Question

Filtering info column


5.8 years ago

taijc06 • 0

Hi all,

I have a vcf file with an info column like this:

##fileformat=VCFv4.3
##fileDate=20180421
##source=PLINKv2.00
##filedate=20180410
##contig=<ID=10,length=135524727>
##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL|Gene|Feature_type|Feature|BIOTYPE|EXON|INTRON|HGVSc|HGVSp|cDNA_position|CDS_position|Protein_position|Amino_acids|Codons|Existing_variation|DISTANCE|STRAND|FLAGS|VARIANT_CLASS|SYMBOL_SOURCE|HGNC_ID|CANONICAL|TSL|APPRIS|CCDS|ENSP|SWISSPROT|TREMBL|UNIPARC|REFSEQ_MATCH|SOURCE|GENE_PHENO|SIFT|PolyPhen|DOMAINS|miRNA|AF|AFR_AF|AMR_AF|EAS_AF|EUR_AF|SAS_AF|AA_AF|EA_AF|gnomAD_AF|gnomAD_AFR_AF|gnomAD_AMR_AF|gnomAD_ASJ_AF|gnomAD_EAS_AF|gnomAD_FIN_AF|gnomAD_NFE_AF|gnomAD_OTH_AF|gnomAD_SAS_AF|MAX_AF|MAX_AF_POPS|CLIN_SIG|SOMATIC|PHENO|PUBMED|MOTIF_NAME|MOTIF_POS|HIGH_INF_POS|MOTIF_SCORE_CHANGE">
#CHROM  POS     ID      REF     ALT

I would like to extract only the allele frequency (AF) values from this column. However, I am finding this difficult because all the annotations are packed into a single column. Is there a way to do this? Thank you.

sequencing vcf • 2.2k views

ADD COMMENT • link updated 5.8 years ago by finswimmer 16k • written 5.8 years ago by taijc06 • 0

score 0 · Answer 1 · 2018-07-05


5.8 years ago

NB ▴ 960

Hello, you can use bcftools to extract the AF info. The command is something like this:

bcftools query -f '%CHROM %POS %AF\n' input.vcf > output.txt

You can read the manual for more information in case the AF tag is not present in the INFO column of your VCF.

ADD COMMENT • link 5.8 years ago by NB ▴ 960


OP wants the information contained in the VEP INFO/CSQ field, not INFO/AF.
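To illustrate the difference, here is a minimal Python sketch of pulling AF out of INFO/CSQ. It assumes the sub-field order can be read from the "Format:" part of the CSQ header line, and that each transcript entry in CSQ is comma-separated with pipe-separated sub-fields. The shortened header, the record, the gene names, and the helper names `csq_field_names` and `csq_values` are all made up for the example.

```python
import re

def csq_field_names(header_lines):
    """Return the CSQ sub-field names declared in the VCF header."""
    for line in header_lines:
        if line.startswith("##INFO=<ID=CSQ,"):
            # The names follow "Format: " and run up to the closing quote.
            match = re.search(r'Format: ([^"]+)"', line)
            if match:
                return match.group(1).strip().split("|")
    raise ValueError("no CSQ header line with a 'Format:' description found")

def csq_values(info, names, wanted="AF"):
    """Yield the `wanted` sub-field from every transcript entry in INFO/CSQ."""
    for item in info.split(";"):
        if item.startswith("CSQ="):
            for entry in item[len("CSQ="):].split(","):
                parts = entry.split("|")
                yield dict(zip(names, parts)).get(wanted, "")

# Illustrative (made-up) header with only four sub-fields, and one record
# carrying two transcript entries.
header = [
    '##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence '
    'annotations from Ensembl VEP. Format: Allele|Consequence|SYMBOL|AF">'
]
record = ("21\t26960070\t.\tG\tA\t.\t.\t"
          "CSQ=A|missense_variant|GENE1|0.0014,A|intron_variant|GENE2|0.0014")

names = csq_field_names(header)
chrom, pos, _id, ref, _alt, _qual, _filter, info = record.split("\t")[:8]
for af in csq_values(info, names):
    print(chrom, pos, ref, af, sep="\t")
```

Looking the sub-field up by name rather than by a fixed index keeps the sketch independent of which VEP options were used to produce the annotation.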

ADD REPLY • link 5.8 years ago by Pierre Lindenbaum 161k


Sorry, I misunderstood the question; please ignore my answer. I'm not sure then. Maybe generate the VEP output in tab-delimited format to avoid the clustering, and then extract the AF column.

ADD REPLY • link 5.8 years ago by NB ▴ 960

score 0 · Answer 2 · 2018-07-05

Using bioalcidaejdk: http://lindenb.github.io/jvarkit/BioAlcidaeJdk.html

The code has access to a variable 'tools', which itself contains a parser for the VEP output. The output contains duplicated lines if there is more than one transcript per variant.

 java -jar dist/bioalcidaejdk.jar -e 'println("CHROM\tPOS\tREF\tAF");stream().forEach(V->tools.getVepPredictions(V).stream().forEach(P->{println(V.getContig()+"\t"+V.getStart()+"\t"+V.getReference().getDisplayString()+"\t"+P.getByCol("AF"));}));'

CHROM   POS REF AF
21  26960070    G   0.0014
21  26960070    G   0.0014
21  26960070    G   0.0014
21  26965148    G   0.7324
21  26965148    G   0.7324
21  26965148    G   0.7324
21  26965172    T   0.0106
21  26965172    T   0.0106
21  26965172    T   0.0106
21  26965205    T   0.7324
21  26965205    T   0.7324
21  26965205    T   0.7324
21  26976144    A   0.0004
21  26976144    A   0.0004
21  26976144    A   0.0004
(...)
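
If the duplicated per-transcript lines are not wanted, one possible way to collapse them is sketched below in Python. It assumes the output above has been captured as tab-separated text; the helper name `unique_lines` is hypothetical, and the rows are copied from the first two variants shown above.

```python
def unique_lines(lines):
    """Yield each line once, keeping the order of first appearance."""
    seen = set()
    for line in lines:
        if line not in seen:
            seen.add(line)
            yield line

# Rows copied from the output above (first two variants only).
rows = [
    "CHROM\tPOS\tREF\tAF",
    "21\t26960070\tG\t0.0014",
    "21\t26960070\tG\t0.0014",
    "21\t26960070\tG\t0.0014",
    "21\t26965148\tG\t0.7324",
    "21\t26965148\tG\t0.7324",
    "21\t26965148\tG\t0.7324",
]

deduplicated = list(unique_lines(rows))
for line in deduplicated:
    print(line)
```

This only removes exact duplicates, so variants whose transcripts report different AF values would still appear on more than one line.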