Extracting Specific Reads From Vcf File
4
2
Entering edit mode
11.8 years ago
Bnfoguy ▴ 70

Hello,

I am trying to extract indels with coverage ~1 and they are in a VCF file. How do I write a python script or a shell command to extract those indels with coverage 1 only?

This is the data:

chr1    1548545 .       T       TAGATCG 22.5    .       INDEL;DP=2;VDB=0.0251;AF1=1;AC1=2;DP4=0,0,1,0;MQ=60;FQ=-37.5    GT:PL:GQ        1/1:60,3,0:6
chr1    1573532 .       C       CT      4.42    .       INDEL;DP=1;AF1=1;AC1=2;DP4=0,0,0,1;MQ=44;FQ=-37.5       GT:PL:GQ        0/1:40,3,0:3
chr1    1577221 .       AAC     AACAAAC 81.2    .       INDEL;DP=2;VDB=0.0201;AF1=1;AC1=2;DP4=0,0,1,1;MQ=60;FQ=-40.5    GT:PL:GQ        1/1:120,6,0:10
chr1    1577223 .       C       CAAAC   81.2    .       INDEL;DP=2;VDB=0.0185;AF1=1;AC1=2;DP4=0,0,1,1;MQ=60;FQ=-40.5    GT:PL:GQ        1/1:120,6,0:10
chr1    1584063 .       T       TT      3.25    .       INDEL;DP=1;AF1=1;AC1=2;DP4=0,0,1,0;MQ=60;FQ=-37.5       GT:PL:GQ        0/1:38,3,0:4
chr1    1635699 .       AA      A       3.66    .       INDEL;DP=2;VDB=0.0231;AF1=1;AC1=2;DP4=0,0,1,1;MQ=20;FQ=-40.5    GT:PL:GQ        1/1:40,6,0:4
chr1    1637760 .       T       TTTCTTT 22.5    .       INDEL;DP=1;AF1=1;AC1=2;DP4=0,0,0,1;MQ=60;FQ=-37.5       GT:PL:GQ        1/1:60,3,0:6

Thank you for your help!

vcf python • 5.1k views
ADD COMMENT
2
Entering edit mode
11.8 years ago

something like

awk -F '      '       '(!($4 ~ /[ATGC]/ && $5 ~/[ATGC]/ ))' < your.vcf | grep "DP=1;"

or better

grep INDEL < your.vcf | grep "DP=1;"
ADD COMMENT
0
Entering edit mode

Thanks Mr. Lindenbaum. You are a lifesaver!

ADD REPLY
0
Entering edit mode
11.7 years ago

Alternatively: awk '/INDEL/ && /DP=1/' your.vcf

ADD COMMENT
0
Entering edit mode
11.7 years ago
bioinfo ▴ 830

I prefer ( I assume the vcf file contains just Indels not SNPs)

more indel.vcf | grep "DP=1" &gt > filtered.indel.vcf

If you have SNPs and indel together in the same vcf file then

more indel.vcf | grep "INDEL" | grep "DP=1" &gt > filtered.indel.vcf
ADD COMMENT
0
Entering edit mode
6.9 years ago
Begali • 0

hi

I am new for tools such as samtools, bcftools, BWA and GATK and I am working on RADSeq I reach for point call variants which include ( INDEL and SNP) now time to filter them I could not do that by GATK with VQSR which reuired resources bundle folder which are not available for my organisms plant ( 5 samples for Cardamine hirsuta and 197 samples for Arabidopsis thaliana ) so only can do Hard-filtering however I need to plot my data to see my original distribution so I can determine thresholds which need to be filtered by considering lower est thresholds ... so I need to grep only INFO col first then how to select DP ,VDB ;AF1;MQ and so one so I can obtain file for each one which will be easier to plot then in R ...

Thanks in advance

ADD COMMENT

Login before adding your answer.

Traffic: 2478 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6