Biostar Beta. Not for public use.
Do you recommend marking duplicates for WGS or WES?
13 months ago
jpuntomarcos • 30

Hi,

The purpose of removing duplicates is to mitigate the effects of PCR amplification bias. However, this step can also remove reads that were not a consequence of PCR amplification, which discards useful information. Some studies suggest that duplicate removal is unnecessary because its impact on variant calling is minimal (see link).

What do you recommend? Do you know of any paper or study that demonstrates the benefits of duplicate removal?


Yes, do it as part of your standard pipeline. PCR bias is exactly that, a bias: it is mostly undirected and not reproducible for a given DNA fragment. For this reason I somewhat doubt that a single study such as the one you linked can give comprehensive advice on the matter. PCR bias may or may not be present, depending on the sample prep method and the polymerase used. I always mark duplicates with samblaster, like this: aligner (...) | samblaster --ignoreUnmated | samtools view -o out.bam. You are free to use any tool of your choice.


Sorry, I have a bunch of .bam files in which duplicates have likely been marked. How can I check whether the duplicates have already been removed or have only been marked and are still awaiting removal?


You don't need to remove them. Any proper variant caller (or NGS software in general) will ignore reads that are marked as duplicates. If you still want to see whether they have been removed, you could take a subset of the files and rerun any duplicate detection software. These tools typically output a summary of how many reads were duplicates.
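As a quicker first look, one can also count the reads that already carry the duplicate flag. The sketch below is only an illustration, not a replacement for rerunning a duplicate detection tool: it assumes the alignments are available as SAM text lines, and the function name `count_duplicates` and the sample records are made up for the example. The only fact it relies on is that the SAM format reserves flag bit 0x400 for PCR or optical duplicates.

```python
FLAG_DUPLICATE = 0x400  # SAM FLAG bit for "PCR or optical duplicate"


def count_duplicates(sam_lines):
    """Return (total_reads, marked_duplicates) for an iterable of SAM text lines."""
    total = 0
    duplicates = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])  # second column is the bitwise FLAG
        total += 1
        if flag & FLAG_DUPLICATE:
            duplicates += 1
    return total, duplicates


def make_record(name, flag):
    """Build a minimal, made-up SAM record with the given read name and FLAG."""
    return "\t".join([name, str(flag), "chr1", "100", "60", "4M",
                      "=", "200", "104", "ACGT", "FFFF"])


# Hypothetical example: two ordinary reads and two flagged as duplicates.
example = [
    "@HD\tVN:1.6\tSO:coordinate",
    make_record("read1", 99),
    make_record("read2", 99 + 0x400),
    make_record("read1", 147),
    make_record("read2", 147 + 0x400),
]
total, duplicates = count_duplicates(example)
print(f"{duplicates} of {total} reads carry the duplicate flag")
```

On a real file, the same function could be fed from `samtools view file.bam` by passing `sys.stdin` as the argument. A non-zero count means duplicates were marked but not removed. A count of zero is ambiguous: the duplicates were either removed or never marked, which is where rerunning a detection tool on a subset helps. `samtools flagstat` also reports a duplicates line with the same information.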


Ideally they are marked with GATK, and that information is already taken into account by a standard variant caller.

11 months ago
Duarte, CA

I expect it to be more important for exomes than for WGS.

For variant calling, marking them may be OK. However, I admit that I would usually output a .bam file with duplicates removed, so that I know what I am visualizing in IGV.
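To make the difference between marking and removing concrete, here is a minimal sketch of what removal amounts to once duplicates are marked. It assumes SAM text lines as input, and the name `drop_marked_duplicates` and the sample records are hypothetical. It simply drops every record whose FLAG has the duplicate bit (0x400) set and keeps everything else, header lines included.

```python
FLAG_DUPLICATE = 0x400  # SAM FLAG bit for "PCR or optical duplicate"


def drop_marked_duplicates(sam_lines):
    """Yield header lines and every alignment not flagged as a duplicate."""
    for line in sam_lines:
        if not line.strip():
            continue  # ignore blank lines
        if line.startswith("@"):
            yield line  # header lines pass through unchanged
            continue
        flag = int(line.split("\t")[1])  # second column is the bitwise FLAG
        if not flag & FLAG_DUPLICATE:
            yield line


# Hypothetical input: one header, one ordinary read, one marked duplicate.
records = [
    "@HD\tVN:1.6\tSO:coordinate",
    "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tFFFF",
    "read2\t1024\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tFFFF",
]
kept = list(drop_marked_duplicates(records))
print(len(kept))
```

In practice the same filter is usually applied directly to the .bam file, for example with the `-F 1024` option of `samtools view`, which excludes reads that have that flag bit set.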
