Biostar Beta. Not for public use.
Extract reads matching variants from VCF file
0
Entering edit mode
13 months ago
sppearce • 10
United Kingdom

I have a vcf file describing a set of SNPs, insertions and deletions. I would like to extract those reads from the bam file which correspond to these variants for some quality control.

For SNPs, I can loop through using samjs and produce one new bam file per SNP, but I don't know how to generalise this for indels (and I want those specific ones that match the vcf file, not just any indel at that base).

Is there any existing tool that will let me do this? Or a way to get samjs to do this.

snp vcf • 145 views
ADD COMMENTlink
1
Entering edit mode

it's the same thing, you 'just' have to test for the cigar elements 'D/N' or 'I'.

ADD REPLYlink
0
Entering edit mode

That presumably would get me any indel that exists covering the start position, rather than the specific indel in the vcf (I realise that is probably pretty niche).

ADD REPLYlink
1
Entering edit mode

CIGAR string does not only contain the operations (D, I, etc) but also the length of the operation. By comparing the length with the indel size in VCF, maybe you would be able to count the reads that support the specific indel in question. I usually use pysam for this kind of CIGAR queries.

https://pysam.readthedocs.io/en/latest/

ADD REPLYlink

Login before adding your answer.

Similar Posts
Loading Similar Posts
Powered by the version 2.1