Set Genotype Of Specific Sample/Genotype Comb To Unknown In Multisample Vcf File
10.5 years ago
Maarten • 0

I am running QC on an exome-seq dataset with a family structure. I would like to remove the Mendelian inconsistencies that I found: even with a high genotype quality threshold, they occur more often than expected. I came across many programs that extract these Mendelian inconsistencies, but I would like to set the affected genotypes to zero (well, actually ./.). Does someone have some pointers on where to go from here?

To make it clearer, the file of sample ID/genotype combinations looks like this:

ID2   SNP34
ID3   SNP43
vcf qualitycontrol • 2.9k views
10.5 years ago

I quickly wrote a tool for this: https://github.com/lindenb/jvarkit/wiki/Biostar86363

$ cat reset.txt
20    14370    NA00001
20    1234567    NA00003
20    1110696    NA00002

$ curl "https://raw.github.com/jamescasbon/PyVCF/master/vcf/test/example-4.1.vcf" |\
  java -jar dist/biostar86363.jar -G reset.txt 

##fileformat=VCFv4.1
##FILTER=<ID=q10,Description="Quality below 10">
##FILTER=<ID=s50,Description="Less than 50% of samples have data">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=GR,Number=1,Type=Integer,Description="(1) = Genotype was reset by Biostar86363">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=HQ,Number=2,Type=Integer,Description="Haplotype Quality">
##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral Allele">
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency">
##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP membership, build 129">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=H2,Number=0,Type=Flag,Description="HapMap2 membership">
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
##contig=<ID=20,length=62435964,assembly=B36,md5=f126cdf8a6e0c7f379d618ff66beb2da,species="Homo sapiens",taxonomy="x">
##fileDate=20090805
##phasing=partial
##reference=file:///seq/references/1000GenomesPilot-NCBI36.fasta
##source=myImputationProgramV3.1
#CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    FORMAT    NA00001    NA00002    NA00003
20    14370    rs6054257    G    A    29    PASS    AF=0.5;DB;DP=14;H2;NS=3    GT:DP:GQ:GR:HQ    .|.:1:48:1:51,51    1|0:8:48:0:51,51    1/1:5:43:0
20    17330    .    T    A    3    q10    AF=0.017;DP=11;NS=3    GT:GQ:DP:HQ    0|0:49:3:58,50    0|1:3:5:65,3    0/0:41:3
20    1110696    rs6040355    A    G,T    67    PASS    AA=T;AF=0.333,0.667;DB;DP=10;NS=2    GT:DP:GQ:GR:HQ    1|2:6:21:0:23,27    .|.:0:2:1:18,2    2/2:4:35:0
20    1230237    .    T    .    47    PASS    AA=T;DP=13;NS=3    GT:GQ:DP:HQ    0|0:54:7:56,60    0|0:48:4:51,51    0/0:61:2
20    1234567    microsat1    GTC    G,GTCT    50    PASS    AA=G;DP=9;NS=3    GT:DP:GQ:GR    0/1:4:35:0    0/2:2:17:0    ./.:3:40:1
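
For readers who cannot run the Java tool, the same idea can be sketched in plain Python. This is not the Biostar86363 implementation: the function names `load_targets` and `reset_genotypes` are made up for illustration, the sketch assumes a tab-delimited VCF with GT as the first FORMAT key, and it does not add the GR flag shown in the output above. It only replaces the allele indices of the listed chromosome/position/sample combinations with `.`, keeping the `/` or `|` separator.

```python
import re


def load_targets(lines):
    """Parse whitespace-separated 'CHROM POS SAMPLE' rows into a set of tuples."""
    targets = set()
    for line in lines:
        parts = line.split()
        if len(parts) == 3:
            targets.add((parts[0], int(parts[1]), parts[2]))
    return targets


def reset_genotypes(vcf_lines, targets):
    """Yield VCF lines with GT set to missing for each (chrom, pos, sample) target."""
    samples = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            yield line
            continue
        cols = line.split("\t")
        if line.startswith("#CHROM"):
            # Sample names start at the 10th column of the header line.
            samples = cols[9:]
            yield line
            continue
        # Only touch records whose FORMAT column starts with GT.
        if len(cols) > 9 and cols[8].split(":")[0] == "GT":
            chrom, pos = cols[0], int(cols[1])
            for i, sample in enumerate(samples):
                if (chrom, pos, sample) in targets:
                    fields = cols[9 + i].split(":")
                    # Replace every allele index with '.', keep '/' and '|'.
                    fields[0] = re.sub(r"[^/|]+", ".", fields[0])
                    cols[9 + i] = ":".join(fields)
        yield "\t".join(cols)
```

A possible use, assuming the reset list is in `reset.txt` and the VCF is read from a file handle: `for out in reset_genotypes(open("in.vcf"), load_targets(open("reset.txt"))): print(out)`.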

Awesome that you made this. It is indeed working. The only thing I had to do was convert the accession numbers to positions (join to the rescue!).
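
The conversion step mentioned here was done with `join`; a rough Python equivalent is sketched below. It assumes that the SNP identifiers in the sample/SNP file from the question appear in the ID column of the VCF, which the thread does not confirm. The function names `id_to_position` and `build_reset_list` are hypothetical.

```python
def id_to_position(vcf_lines):
    """Map each variant ID in the VCF ID column to its (chrom, pos) strings."""
    mapping = {}
    for line in vcf_lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 3:
            continue
        # The ID column may hold several semicolon-separated identifiers.
        for variant_id in cols[2].split(";"):
            if variant_id != ".":
                mapping[variant_id] = (cols[0], cols[1])
    return mapping


def build_reset_list(pair_lines, mapping):
    """Turn 'SAMPLE SNP_ID' rows into 'CHROM<TAB>POS<TAB>SAMPLE' rows."""
    rows = []
    for line in pair_lines:
        parts = line.split()
        if len(parts) != 2:
            continue
        sample, snp_id = parts
        # IDs that are absent from the VCF are skipped silently.
        if snp_id in mapping:
            chrom, pos = mapping[snp_id]
            rows.append(f"{chrom}\t{pos}\t{sample}")
    return rows
```

The resulting rows have the same three columns as the `reset.txt` example in the answer, so they could be written to a file and passed to the tool with `-G`.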
