Question about deduplication with UMI-tools
4.8 years ago
Chadi Saad

Hello,

I deduplicated a BAM file using UMI-tools.

Before, I had 98,186,207 reads.
After deduplication, I obtained 58,930,293 reads.

So the before/after ratio is 98,186,207 / 58,930,293 ≈ 1.67.
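
As a quick check, the read-count ratio can be computed directly from the two counts above (a small Python sketch; nothing here comes from UMI-tools itself):

```python
# Read counts reported before and after UMI-tools deduplication
reads_before = 98_186_207
reads_after = 58_930_293

ratio = reads_before / reads_after          # how many-fold the read count dropped
fraction_kept = reads_after / reads_before  # share of reads that survived dedup

print(f"before/after ratio: {ratio:.2f}")              # 1.67
print(f"fraction of reads kept: {fraction_kept:.1%}")  # 60.0%
```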

When I calculate the coverage over my BED features before and after deduplication, I get a ratio of 6!

I don't understand this difference between the coverage ratio and the read-count ratio.


You should add exactly how you calculated these numbers (the command lines you used).

4.8 years ago

Depending on how you are doing your deduplication, this could be due to the duplicate reads removed by deduplication being concentrated in your BED features. For example, if this were RNA-seq, you would find that on average only 1/2 to 2/3 of your reads map to annotated transcribed regions, with the rest being intronic reads, transcriptional noise, DNA contamination, etc. The signal is much higher in genes, but the non-genic reads still account for a large fraction of the total. If reads in genes are more highly duplicated, then you will see a bigger change in the genic regions than outside them.
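
A rough worked example of this argument, as a Python sketch. It assumes (this is not stated in the thread) that coverage inside the BED features drops by the same factor as the number of reads falling in them. Under that assumption, a 6-fold drop in the features combined with an overall ~1.67-fold drop puts an upper limit on the fraction of input reads that can sit in the features, and implies the reads outside them were much less duplicated. Both helper functions are hypothetical, written only for this illustration.

```python
def non_feature_ratio(reads_before, reads_after, feature_fraction, feature_ratio):
    """Implied before/after ratio for reads OUTSIDE the features.

    Hypothetical helper: assumes reads in features drop by feature_ratio
    (before/after), and all remaining reads lie outside the features.
    """
    feat_before = feature_fraction * reads_before
    feat_after = feat_before / feature_ratio
    other_before = reads_before - feat_before
    other_after = reads_after - feat_after
    return other_before / other_after


def max_feature_fraction(reads_before, reads_after, feature_ratio):
    """Largest in-feature fraction consistent with dedup never adding reads.

    Deduplication can only remove reads, so other_after <= other_before.
    Solving that inequality for feature_fraction gives this bound.
    """
    kept = reads_after / reads_before
    return (1 - kept) / (1 - 1 / feature_ratio)


reads_before = 98_186_207
reads_after = 58_930_293
feature_ratio = 6.0  # coverage ratio reported in the question

bound = max_feature_fraction(reads_before, reads_after, feature_ratio)
print(f"max fraction of reads in features: {bound:.2f}")  # 0.48

# The more reads sit in the features, the closer to 1 (i.e. almost no
# duplicates removed) the ratio outside the features has to be.
for f in (0.2, 0.3, 0.4):
    print(f, round(non_feature_ratio(reads_before, reads_after, f, feature_ratio), 2))
```

With no reads in the features, the non-feature ratio simply equals the overall 1.67; as the in-feature fraction approaches the bound, the reads outside the features approach no duplication at all.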

Also remember that if your data is paired-end, UMI-tools will report the number of pairs input, but if the two mates of a pair overlap, both mates contribute to the coverage over the overlapping bases, so that pair counts twice there.
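
A toy Python illustration of why overlapping mates inflate coverage relative to a pair count. The intervals are made up, and the function counts each mate independently, roughly how a per-read coverage calculation would treat them (an assumption; the exact behaviour depends on the tool and its options).

```python
from collections import Counter


def per_base_coverage(pairs):
    """Per-base depth when every mate is counted separately.

    Each pair is ((start1, end1), (start2, end2)) with half-open,
    0-based coordinates (hypothetical toy input, not a real BAM).
    """
    depth = Counter()
    for mate1, mate2 in pairs:
        for start, end in (mate1, mate2):
            for pos in range(start, end):
                depth[pos] += 1
    return depth


# Two read pairs: the first pair's mates overlap on bases 50-99,
# the second pair's mates do not overlap.
pairs = [
    ((0, 100), (50, 150)),
    ((200, 300), (400, 500)),
]

depth = per_base_coverage(pairs)
print("pairs counted:", len(pairs))                 # 2
print("depth at base 75:", depth[75])               # 2, from a single pair
print("total aligned bases:", sum(depth.values()))  # 400
```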

