Maximum Depth Filter
1
0
Entering edit mode
6.7 years ago
ATpoint 81k

I would like to filter VCF files with a maximum depth filter, so removing variants with a coverage more than a threshold (in this case d+3*sqrt(d), d=mean coverage of the file), as suggested in Heng Li's paper from 2014. I was thinking about a custom script, including something like sambamba/samtools depth, but maybe there is an implementation somewhere that I missed which can do this task much more efficiently? Any ideas?

VCF Filter Maximum Depth • 2.5k views
ADD COMMENT
0
Entering edit mode
6.7 years ago

assuming all variants have a DP field get mean DP using http://lindenb.github.io/jvarkit/BioAlcidaeJdk.html

java -jar dist/bioalcidaejdk.jar -e 'double dp=0.0;int n=0;while(iter.hasNext()) {dp+=iter.next().getAttributeAsInt("DP",0);n++;} println(dp/n); ' input.vcf
188.22544283413848

and then filter using vcfilterjdk http://lindenb.github.io/jvarkit/VcfFilterJdk.html

 java -jar dist/vcffilterjdk.jar -e 'final double dp= 188.22544283413848;final double treshold = dp + 3 * Math.sqrt(dp);  return variant.getAttributeAsInt("DP",0) > treshold;'input.vcf
ADD COMMENT
0
Entering edit mode

Thanks! What about mean depth of the entire WGS BAM instead of the mean of the DPs? Any ideas about that?

ADD REPLY
0
Entering edit mode

use whatever you want.

ADD REPLY
0
Entering edit mode

That would've been my next step ^^ Probably fetching a subsample of the WGS file, maybe all promoters, and then running any depth tool on it should be sufficient to get a solid estimate ov average depth.

ADD REPLY

Login before adding your answer.

Traffic: 2176 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6