Filter variants having no AF information
17 months ago

I have more than 1000 samples and I want to filter variants by minor allele frequency. My input dataset is a VCF file in this format:

CHROM   POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  88  108 139 159 265 350

1   55  .   C   T   40  PASS    DP=6720;EFF=intergenic_region(MODIFIER||||||||||1)  GT:GQ:DP    ./.:.:. 0|0:36:4    0|0:32:9    0|0:30:4    ./.:.:. ./.:.:.

1   56  .   T   A   40  PASS    DP=6785;EFF=intergenic_region(MODIFIER||||||||||1)  GT:GQ:DP    ./.:.:. ./.:.:. 0|0:32:9    0|0:30:4    ./.:.:. ./.:.:.

1   63  .   T   C   40  PASS    DP=7053;EFF=intergenic_region(MODIFIER||||||||||1)  GT:GQ:DP    ./.:.:. 0|0:40:5    0|0:32:9    0|0:38:5    ./.:.:. ./.:.:.

1   73  .   C   A   40  PASS    DP=8169;EFF=intergenic_region(MODIFIER||||||||||1)  GT:GQ:DP    ./.:.:. 0|0:40:5    0|0:40:9    0|0:38:6    ./.:.:. ./.:.:.

How can I keep only SNPs with a minor allele frequency >= 0.05?


I am trying to compile vcffilterjdk, but I am getting this error:

Task :vcffilterjdk FAILED
Downloading http://central.maven.org/maven2/com/github/samtools/htsjdk/2.19.0/htsjdk-2.19.0.jar to /home/waqas/jvarkit/lib/com/github/samtools/htsjdk/2.19.0/htsjdk-2.19.0.jar

FAILURE: Build failed with an exception.

BUILD FAILED in 0s
1 actionable task: 1 executed


I have fixed the proxy settings, and VcfFilterJdk took around 3 hours to complete. However, it only updated the header; it didn't update the INFO field. As you can see from the INFO field (DP=6720;EFF=intergenic_region(MODIFIER||||||||||1)), I don't have AN/AC tags in it. Do I need to add the AN/AC fields first and then apply this script? The current output of VcfFilterJdk is:

##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##FILTER=<ID=q25,Description="Quality below 25">
##FILTER=<ID=q30,Description="Quality below 30">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=FT,Number=.,Type=String,Description="Genotype-level filter">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=EFF,Number=.,Type=String,Description="Predicted effects for this variant.Format: 'Effect ( Effect_Impact | Functional_Class | Codon_Change | Amino_Acid_Change| Amino_Acid_length | Gene_Name | Transcript_BioType | Gene_Coding | Transcript_ID | Exon_Rank  | Genotype_Number [ | ERRORS | WARNINGS ] )'">
##INFO=<ID=LOF,Number=.,Type=String,Description="Predicted loss of function effects for this variant. Format: 'Gene_Name | Gene_ID | Number_of_transcripts_in_gene | Percent_of_transcripts_affected'">
##INFO=<ID=MAF,Number=1,Type=Float,Description="Min Allele Frequency">
##INFO=<ID=NMD,Number=.,Type=String,Description="Predicted nonsense mediated decay effects for this variant. Format: 'Gene_Name | Gene_ID | Number_of_transcripts_in_gene | Percent_of_transcripts_affected'">
##vcffilterjdk.meta=compilation:20190507213457 githash:ca6efffb htsjdk:2.19.0 date:20190507225208 cmd:-e VariantContextBuilder vcb = new VariantContextBuilder(variant); float ac = variant.getAttributeAsInt( AN ,0); if(ac>0) { List<Float> af = variant.getAttributeAsIntList( AC ,0).stream().map(N->N/ac).collect(Collectors.toList());vcb.attribute( AF ,af);vcb.attribute( MAF ,af.stream().mapToDouble(X->X.floatValue()).min().orElse(-1.0) );} return vcb.make();
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  88  108 139
1   55  .   C   T   40  PASS    DP=6720;EFF=intergenic_region(MODIFIER||||||||||1)  GT:DP:GQ    ./. 0|0:4:36    0|0:9:32
1   56  .   T   A   40  PASS    DP=6785;EFF=intergenic_region(MODIFIER||||||||||1)  GT:DP:GQ    ./. ./. 0|0:9:32
1   63  .   T   C   40  PASS    DP=7053;EFF=intergenic_region(MODIFIER||||||||||1)  GT:DP:GQ    ./. 0|0:5:40    0|0:9:32
1   73  .   C   A   40  PASS    DP=8169;EFF=intergenic_region(MODIFIER||||||||||1)  GT:DP:GQ    ./. 0|0:5:40    0|0:9:40
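
The expression recorded in the vcffilterjdk.meta header line reads AN with a default of 0 and only sets AF/MAF when that value is greater than 0. These records have no AN/AC in INFO, so they pass through unchanged, which would explain why only the header was updated. One way around this is to count the alleles from the GT fields first. Below is a minimal Python sketch of that idea, not the jvarkit method. It assumes tab-separated records, biallelic sites, and MAF defined as min(AF, 1 - AF); the function names are made up for illustration.

```python
def allele_counts(vcf_line):
    """Return (AN, [AC per ALT allele]) counted from each sample's GT."""
    fields = vcf_line.rstrip("\n").split("\t")
    alts = fields[4].split(",")
    gt_index = fields[8].split(":").index("GT")
    an = 0
    ac = [0] * len(alts)
    for sample in fields[9:]:
        gt = sample.split(":")[gt_index]
        # Treat phased (|) and unphased (/) genotypes the same way.
        for allele in gt.replace("|", "/").split("/"):
            if allele == ".":
                continue  # missing call: contributes to neither AN nor AC
            an += 1
            if int(allele) > 0:
                ac[int(allele) - 1] += 1
    return an, ac


def minor_allele_frequency(an, ac):
    """Assumed definition for a biallelic site: min(AF, 1 - AF).

    Returns None when no allele was called at all (AN == 0).
    """
    if an == 0:
        return None
    af = ac[0] / an
    return min(af, 1.0 - af)


def keep(vcf_line, threshold=0.05):
    """True if the record's MAF is at least the threshold."""
    an, ac = allele_counts(vcf_line)
    maf = minor_allele_frequency(an, ac)
    return maf is not None and maf >= threshold


if __name__ == "__main__":
    # First record of the output above, with INFO shortened to DP only.
    rec = "\t".join(["1", "55", ".", "C", "T", "40", "PASS", "DP=6720",
                     "GT:DP:GQ", "./.", "0|0:4:36", "0|0:9:32"])
    print(allele_counts(rec), keep(rec))
```

For the record shown, every called genotype is 0|0, so AC is 0 and the site would be dropped by a MAF >= 0.05 filter. For a file with more than 1000 samples, an existing tool is probably the better route: bcftools, for example, has a fill-tags plugin that can add AN/AC/AF/MAF, and `bcftools view` has a `--min-af` option that accepts a `:minor` suffix. Check the manual for your version before relying on either.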