Do you recommend marking duplicates for WGS or WES?
5.1 years ago
jpuntomarcos ▴ 50

Hi,

The purpose of removing duplicates is to mitigate the effects of PCR amplification bias. However, this step can also remove reads that did not arise from PCR amplification, and so discard genuine information. Some studies suggest that duplicate removal is unnecessary because its impact on variant calling is minimal (see link).

What do you recommend? Do you know of any paper or study that demonstrates the benefits of duplicate removal?

WGS duplicate picard rmdup NGS • 3.4k views

Yes, do it as part of your standard pipeline. PCR bias is exactly that, a bias, and therefore mostly undirected and not reproducible for a given DNA fragment. For this reason I somewhat doubt that a single study such as the one you linked can give comprehensive advice on the matter. PCR bias may or may not be present, depending on the sample prep method and the polymerase used. I always mark duplicates with samblaster, like this: aligner (...) | samblaster --ignoreUnmated | samtools view -o out.bam. You are free to use any tool of your choice.
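For anyone unsure what "marking" means in practice, here is a toy sketch of the general idea, not the algorithm of samblaster or any other specific tool. It assumes single-end reads held in a hypothetical simplified record (a dict with `name`, `flag`, `chrom`, `pos` and `baseq_sum`); real tools also consider mate positions and clipping. Reads sharing chromosome, position and strand are grouped, one read per group is kept unflagged (here the one with the highest summed base quality, which is just one possible choice), and the rest get the SAM duplicate bit 0x400 set. Nothing is deleted.

```python
from collections import defaultdict

DUP_FLAG = 0x400       # SAM flag bit: PCR or optical duplicate
REVERSE_FLAG = 0x10    # SAM flag bit: read maps to the reverse strand
UNMAPPED_FLAG = 0x4    # SAM flag bit: read is unmapped


def mark_duplicates(reads):
    """Toy single-end duplicate marking on simplified read records.

    Returns copies of the input records; in each group of reads with the
    same (chrom, pos, strand), all but the best-scoring read get DUP_FLAG.
    """
    groups = defaultdict(list)
    for idx, read in enumerate(reads):
        if read["flag"] & UNMAPPED_FLAG:
            continue  # unmapped reads have no position to compare
        strand_is_reverse = bool(read["flag"] & REVERSE_FLAG)
        groups[(read["chrom"], read["pos"], strand_is_reverse)].append(idx)

    marked = [dict(read) for read in reads]  # leave the input untouched
    for indices in groups.values():
        # Keep the read with the highest summed base quality unflagged.
        best = max(indices, key=lambda i: reads[i]["baseq_sum"])
        for i in indices:
            if i != best:
                marked[i]["flag"] |= DUP_FLAG
    return marked
```

Because marking only sets a flag bit, the decision to ignore or drop those reads is left to whatever runs downstream.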


Sorry, I have a bunch of .bam files in which duplicates have likely been marked. How can I check whether the duplicates have already been removed or have only been marked, waiting for removal?


You don't need to remove them. Any proper variant caller (or NGS software in general) will ignore reads that are flagged as duplicates. If you still want to see whether they have been removed, you could take a subset of the files and rerun any duplicate detection software. These tools typically output a summary of how many reads were duplicates.
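A lighter check than rerunning a duplicate detector is to look at the flags directly. The sketch below is only an illustration and assumes you feed it SAM text lines including the header (for example the output of `samtools view -h`); the function names are made up for this example. It counts alignments carrying the duplicate bit (0x400) and lists the program IDs recorded in `@PG` header lines.

```python
DUP_FLAG = 0x400  # SAM flag bit: PCR or optical duplicate


def duplicate_summary(sam_lines):
    """Return (total_alignments, flagged_duplicates) for SAM text lines."""
    total = 0
    dups = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        total += 1
        if int(fields[1]) & DUP_FLAG:  # second column is the FLAG field
            dups += 1
    return total, dups


def header_programs(sam_lines):
    """Return the ID values of all @PG header lines, in order."""
    ids = []
    for line in sam_lines:
        if line.startswith("@PG"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("ID:"):
                    ids.append(field[3:])
    return ids
```

If the duplicate count is above zero, duplicates were marked and are still in the file. If it is zero, they were either removed or never marked, and the `@PG` entries may hint at whether a duplicate-marking tool was ever run. `samtools view -c -f 1024 in.bam` gives the same count without any scripting.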


Ideally they are marked with GATK, and that information is then taken into account by any standard variant caller.

5.1 years ago

I expect it to be more important for exomes than for WGS.

For variant calling, marking them may be OK. However, I admit that I usually output a .bam file with duplicates removed, so I know what I am visualizing in IGV.
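If the duplicates are already marked, producing a duplicate-free copy for viewing is just a filter on the flag. A minimal sketch, again assuming SAM text lines as input and using a made-up function name:

```python
DUP_FLAG = 0x400  # SAM flag bit: PCR or optical duplicate


def drop_duplicates(sam_lines):
    """Yield header lines plus alignments whose duplicate bit is not set."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # keep the header unchanged
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        if not (int(fields[1]) & DUP_FLAG):
            yield line
```

On a real BAM file, `samtools view -b -F 1024 in.bam -o nodup.bam` does the same filtering directly.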
