Biostar Test Site

This is site is used for testing only. Visit: https://www.biostars.org to ask a question.

> 90% samples, DP > 10
2
0
Entering edit mode
12 days ago

Hello,

I have a number of single-sample-VCF-files and also got for each of them a bed file, consisting of the regions with DP > 10.

What I now want to get is a new bed file, which consists only of regions where 90% of the samples have DP > 10 based on those single bed files. So basically I want a bed file consisting of regions where at least 90% of the primary bed files overlap.

I already know how to work with Granges in R, however I do not know, how to solve the "90%"-thing.

Any suggestions?

Andreas

VCF R DP • 173 views
0
Entering edit mode

You may want to explore the filter options in bcftools view - that would mean filtering the VCF files on the command line and not in R.

0
Entering edit mode
12 days ago
seidel 7.8k

If you have GRanges objects representing regions for each sample, you could call coverage on each of these and then since coverage objects can be added together (cov3 = cov1 + cov2) you could simply add the coverages (you could do this in a loop). After adding all the coverages, you can then simply slice() the coverage object at a depth equal to 90% of your samples. For instance, if you have 10 samples, slice at a depth of 9 to find regions covered by at least 9 samples (90% of your samples). In case individual samples have overlapping regions, you could first reduce them prior to taking coverage(), so that any given sample will only produce a depth of 1 on the genome.

0
Entering edit mode
11 days ago

convert each VCF file to bed with bcftools query -f '%CHROM\t%POS0\t%END\n'

use bedtools multiinter to get the overlapping intervals and filter the output with awk for the '90%'.