Where to find PCR duplicate reads in a BAM file?
7.9 years ago

Dear All,

I am trying to find the PCR duplicate reads in a BAM/SAM file. If we use Picard, we can mark duplicates in the BAM/SAM file.

How can I see the marked duplicates in the BAM file? I tried checking for the SAM flag "1024", which decodes to "read is PCR or optical duplicate".

$ samtools flagstat Sample_WES01/WES01.clean.dedup.recal.bam
71753231 + 0 in total (QC-passed reads + QC-failed reads)
12384215 + 0 duplicates
71185962 + 0 mapped (99.21%:-nan%)
71753231 + 0 paired in sequencing
35881695 + 0 read1
35871536 + 0 read2
70159253 + 0 properly paired (97.78%:-nan%)
70654767 + 0 with itself and mate mapped
531195 + 0 singletons (0.74%:-nan%)
425872 + 0 with mate mapped to a different chr
287474 + 0 with mate mapped to a different chr (mapQ>=5)
$

I extracted the flag column from the BAM file and tried grep'ing for "1024", but I couldn't find any matches.

Will I be able to see duplicate reads in IGV?

bam dnaseq bwa bowtie2
7.9 years ago

How to see the marked duplicates in the bam file. I tried checking the sam flag "1024" which decodes to "read is PCR or optical duplicate".

$ samtools flagstat Sample_WES01/WES01.clean.dedup.recal.bam

The duplicates are reported on this line:

12384215 (QC-passed reads) + 0 (QC-failed reads) duplicates
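As a quick sanity check, that count can be expressed as a fraction of the total reported on the first flagstat line. A minimal bash sketch, assuming a POSIX awk is available; dup_rate is a hypothetical helper name, not a samtools feature:

```bash
#!/usr/bin/env bash
# Percentage of flagstat records marked as duplicates.
# dup_rate is a hypothetical helper for this sketch only.
dup_rate() {
  local dups=$1 total=$2
  # awk does the floating-point division that bash arithmetic cannot
  awk -v d="$dups" -v t="$total" 'BEGIN { printf "%.2f%%\n", 100 * d / t }'
}

# counts copied from the flagstat output in the question
dup_rate 12384215 71753231   # prints 17.26%
```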

I extracted the flag column from bam file and tried grep'ing "1024". I couldn't see any matches.

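That's because the flag is a bit field (https://en.wikipedia.org/wiki/Bit_field): 1024 is just one bit, so a duplicate read's flag is 1024 plus the values of its other set bits, and the string "1024" usually does not appear in the column at all.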
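To see why grep finds nothing, a flag can be split into the powers of two it is made of. A minimal sketch using only bash arithmetic; decode_flag is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Split a SAM FLAG value into its set bits (powers of two).
# decode_flag is a hypothetical helper for this sketch only.
decode_flag() {
  local flag=$1 bit=1 parts=()
  while (( bit <= flag )); do
    if (( flag & bit )); then
      parts+=("$bit")      # this bit is set in the flag
    fi
    (( bit <<= 1 ))        # move on to the next power of two
  done
  echo "${parts[*]}"
}

decode_flag 1123   # prints: 1 2 32 64 1024
```

The duplicate bit (1024) is present in 1123, but the decimal value stored in the column is 1123, so a plain text search for "1024" does not match it.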

You can get those reads with the -f (required flag) option of samtools view:

samtools view -f 1024 in.bam
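For illustration only, the same bitwise test can be applied to SAM text by hand, which shows what -f 1024 checks on column 2. This is a slow pure-bash sketch, not a replacement for samtools; keep_duplicates is a hypothetical helper that reads SAM records on stdin:

```bash
#!/usr/bin/env bash
# Print only SAM records whose FLAG (column 2) has the 1024 (0x400) bit set.
# keep_duplicates is a hypothetical helper for this sketch only.
keep_duplicates() {
  local line qname flag rest
  while IFS= read -r line; do
    [[ $line == @* ]] && continue                  # skip header lines
    IFS=$'\t' read -r qname flag rest <<< "$line"  # split off QNAME and FLAG
    if (( flag & 1024 )); then
      printf '%s\n' "$line"
    fi
  done
}

# usage sketch: samtools view in.bam | keep_duplicates
```

In practice, samtools view -F 1024 does the opposite (drops duplicates) and adding -c prints a count instead of the records.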

Will I be able to see duplicate reads in IGV?

There is an option in the IGV preferences to show the duplicate reads:

"Filter duplicate reads: Clear to display alignments marked as duplicate reads. In DNA-Seq alignments these PCR or optical duplicates are often marked and filtered. In RNA-Seq alignments considerations differ."

http://www.broadinstitute.org/igv/Preferences


Awesome Pierre :).

I tried "samtools view -f 1024 in.bam" and then extracted the unique flag values, which gave the following 16 flags:

"1089,1097,1105,1107,1121,1123,1137,1145,1153,1161,1169,1171,1185,1187,1201,1209"

I checked all of the above flags, and each one includes "read is PCR or optical duplicate" in addition to other properties.

HISEQ:137:C6W39ACXX:7:1314:15234:3404 1123 chrM 1 15 57S44M = 41 141 TCAGGGCCATAAAG
HISEQ:137:C6W39ACXX:7:1314:15234:3404 1171 chrM 41 60 101M = 1 -141 CTCCATGCATTTGGT
HISEQ:137:C6W39ACXX:7:2102:20584:10431 1187 chrM 1 60 11S90M = 112 212 ACATCACGATGGATCA
HISEQ:137:C6W39ACXX:7:1113:11949:62990 1209 chrM 10 60 101M = 10 0 TCTATCACCCTATTAAC
HISEQ:137:C6W39ACXX:7:2311:11970:3501 1169 chrM 15 60 101M = 16193 16079 CACCCTATTAACCAC
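The check on the 16 flags listed above can be scripted: test each value for the 1024 bit and show what remains once that bit is cleared. A minimal bash sketch; check_dup_bits is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# For each FLAG argument, report whether the duplicate bit (1024) is set
# and what the flag is once that bit is cleared.
# check_dup_bits is a hypothetical helper for this sketch only.
check_dup_bits() {
  local f
  for f in "$@"; do
    if (( f & 1024 )); then
      printf '%d = 1024 + %d\n' "$f" $(( f & ~1024 ))
    else
      printf '%d lacks the duplicate bit (1024)\n' "$f"
    fi
  done
}

check_dup_bits 1089 1097 1105 1107 1121 1123 1137 1145 \
               1153 1161 1169 1171 1185 1187 1201 1209
```

Every value comes out as 1024 plus an ordinary paired-read flag; for example 1123 = 1024 + 99, where 99 is paired (1), properly paired (2), mate reverse strand (32) and first in pair (64).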
