Picard MarkDuplicates Problem
7.0 years ago
xd_d ▴ 110

Hello,

Here is my problem! I ran Picard MarkDuplicates with REMOVE_DUPLICATES=true, which I thought would remove PCR and optical duplicates. But it doesn't work, and I don't know what the problem is. Aren't these duplicates? samtools rmdup removes them. It seems that Picard clusters the duplicates but doesn't remove them.

Is this a bug, or what is wrong?

FCC0WRYACXX:1:1212:8677:81169   99  1   14405   1   100M    =   14489   183 TTCTGCTCAGTTCTTTATTGATTGGTGTGCCGTTTTCTCTGGAAGCCTCTTAAGAACACAGTGGCGCAGGCTGGGTGGAGCCGTCCCCCCATGGAGCACA    @CCFFFFFHHHHDGIIJJJGCDJGDEEHIDGHCAGGIIEFHHCFHHCFHIGHIIJGGEHEHGHJIIGEAB6BCCDCB;<BCBB?BDBDDDDDCC>ACBBD    PG:Z:MarkDuplicates NH:i:4  HI:i:1  nM:i:0  AS:i:197

FCC0WRYACXX:1:1213:5301:23125   419 1   14405   1   100M    =   14492   187 TTCTGCTCAGTTCTTTATTGATTGGTGTGCCGTTTTCTCTGGAAGCCTCTTAAGAACACAGTGGCGCAGGCTGGGTGGAGCCGTCCCCCCATGGAGCACA    @BCFDEFFHHFHFIIJIIJJIIIGGFHHHGIEFHGJJIJGGGGIGHHGJJEFFHEGIGHHIFCFGGIHHFFFEDC5;?BDD@DBDDDDDBB?9>CB?BBC    PG:Z:MarkDuplicates NH:i:4  HI:i:3  nM:i:0  AS:i:198

FCC0WRYACXX:1:1308:4376:65126   99  1   14405   1   100M    =   14491   186 TTCTGCTCAGTTCTTTATTGATTGGTGTGCCGTTTTCTCTGGAAGCCTCTTAAGAACACAGTGGCGCAGGCTGGGTGGAGCCGTCCCCCCATGGAGCCCA    @@@FFFFFHHHHDIHIEFGFHIIIIAHFDHHHHIIJGIIIJIGIGIJJJJJJGIJJJIIIJ@CHGGAEAB<>ABD??B5?CDBB5<>ABD9CCD:<C(<A    PG:Z:MarkDuplicates NH:i:4  HI:i:1  nM:i:1  AS:i:196

Maybe Picard flags unmapped duplicates but doesn't remove them?


If the mate is mapped at a different location, a pair is not a duplicate...


I had read that a mate pair is a duplicate if the reads have the same start position, and these all have the same start position.


The mate position is different, though. So they are not duplicates and you would not want to remove them.
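A minimal Python sketch of the point above, using the FLAG, RNAME, POS, RNEXT and PNEXT columns copied from the three records in the question. It assumes a simplified duplicate key (reference, start and strand for both the read and its mate). Picard's actual implementation works on unclipped 5' ends and also takes the library into account, so treat this as an illustration rather than its algorithm. `pair_key` and `group_by_key` are hypothetical helpers.

```python
# Simplified model of paired-end duplicate grouping (an assumption, not Picard's
# exact algorithm): two pairs only count as duplicates if BOTH ends line up,
# so the key includes the read's own start AND its mate's start.

RECORDS = [
    # (QNAME, FLAG, RNAME, POS, RNEXT, PNEXT) copied from the three records above
    ("FCC0WRYACXX:1:1212:8677:81169", 99, "1", 14405, "=", 14489),
    ("FCC0WRYACXX:1:1213:5301:23125", 419, "1", 14405, "=", 14492),
    ("FCC0WRYACXX:1:1308:4376:65126", 99, "1", 14405, "=", 14491),
]


def pair_key(flag, rname, pos, rnext, pnext):
    """Hypothetical key: reference, start and strand of the read and of its mate.

    PNEXT is the mate's leftmost position; it stands in for the mate's 5' end,
    which would need the mate's CIGAR (not present in these records).
    """
    read_reverse = bool(flag & 0x10)  # bit 0x10: read on reverse strand
    mate_reverse = bool(flag & 0x20)  # bit 0x20: mate on reverse strand
    mate_ref = rname if rnext == "=" else rnext  # "=" means same reference
    return (rname, pos, read_reverse, mate_ref, pnext, mate_reverse)


def group_by_key(records):
    """Collect read names that share the same simplified pair key."""
    groups = {}
    for qname, flag, rname, pos, rnext, pnext in records:
        groups.setdefault(pair_key(flag, rname, pos, rnext, pnext), []).append(qname)
    return groups


groups = group_by_key(RECORDS)
for key, names in groups.items():
    print("start", key[1], "mate start", key[4], names)
print(len(groups), "distinct pair keys")
```

All three reads start at 14405, but their mates start at 14489, 14492 and 14491, so each lands in its own group and none of them is a duplicate of another under this model.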

7.0 years ago
BioinfGuru ★ 1.7k

First: it is better to read the documentation before posting a question.

Second: stick to Picard MarkDuplicates for this, not samtools rmdup.

Third: This is what you are looking for...

https://broadinstitute.github.io/picard/command-line-overview.html

"The program can take either coordinate-sorted or query-sorted inputs, however the behavior is slightly different. When the input is coordinate-sorted, unmapped mates of mapped records and supplementary/secondary alignments are not marked as duplicates. However, when the input is query-sorted (actually query-grouped), then unmapped mates and secondary/supplementary reads are not excluded from the duplication test and can be marked as duplicate reads."
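To connect the quoted behavior to the records in the question, here is a small Python sketch that decodes the FLAG bits (names paraphrased from the SAM specification) and applies the coordinate-sorted exclusion rule as the quote describes it. `decode` and `eligible_when_coordinate_sorted` are hypothetical helpers, not part of Picard. Note that none of the three records carries the duplicate bit (0x400), and flag 419 is a secondary alignment, which the quote says is not marked when the input is coordinate-sorted.

```python
# SAM FLAG bits, in the order the SAM specification lists them.
FLAG_BITS = {
    0x1: "paired", 0x2: "proper pair", 0x4: "unmapped", 0x8: "mate unmapped",
    0x10: "reverse", 0x20: "mate reverse", 0x40: "first in pair",
    0x80: "second in pair", 0x100: "secondary", 0x200: "QC fail",
    0x400: "duplicate", 0x800: "supplementary",
}


def decode(flag):
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


def eligible_when_coordinate_sorted(flag):
    """Hypothetical check following the quoted docs for coordinate-sorted input:
    secondary/supplementary alignments and unmapped mates of mapped records
    are not marked as duplicates."""
    if flag & (0x100 | 0x800):  # secondary or supplementary alignment
        return False
    if flag & 0x1 and flag & 0x4 and not flag & 0x8:  # unmapped read, mate mapped
        return False
    return True


for flag in (99, 419):
    print(flag, decode(flag), eligible_when_coordinate_sorted(flag))
```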

Your result depends on: 1) how you sort and 2) whether you want to keep or remove unmapped mates.

I would suggest sorting by coordinate and removing unmapped mates before running Picard MarkDuplicates.

How to do this? Read about SAM flags and use the command line.
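On the command line this is usually `samtools view -F 4`, which drops every record with the unmapped bit (0x4) set. The same idea as a Python sketch over SAM text lines, assuming plain uncompressed SAM input; `keep_record` and `filter_sam` are hypothetical helpers, and the two toy records below are made up for illustration, not taken from the question.

```python
def keep_record(line, exclude_mask=0x4):
    """Keep header lines; drop alignments whose FLAG has any bit of exclude_mask set
    (the same rule as samtools view -F)."""
    if line.startswith("@"):
        return True
    flag = int(line.split("\t")[1])  # FLAG is the second SAM column
    return not flag & exclude_mask


def filter_sam(lines, exclude_mask=0x4):
    """Return only the lines that pass keep_record."""
    return [line for line in lines if keep_record(line, exclude_mask)]


# Toy pair: r1 is mapped (73 = paired, mate unmapped, first in pair),
# its mate is unmapped (133 = paired, unmapped, second in pair).
toy = [
    "@HD\tVN:1.6\tSO:coordinate\n",
    "r1\t73\t1\t100\t60\t10M\t=\t100\t0\tACGTACGTAC\t*\n",
    "r1\t133\t1\t100\t0\t*\t=\t100\t0\tACGTACGTAC\t*\n",
]
for line in filter_sam(toy):
    print(line, end="")
```

The header and the mapped read survive; the unmapped mate is dropped, while the mapped read keeps its 0x8 (mate unmapped) bit.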
