Collapsing duplicate rows in a multi-sample VCF without losing genotypes
3.8 years ago
chariko ▴ 60

I have a VCF like this one:

##fileformat=VCFv4.0
##FILTER=<ID=PASS,Description="All filters passed">
##fileDate=20200612
##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw Depth">
##INFO=<ID=AF,Number=1,Type=Float,Description="Allele Frequency">
##INFO=<ID=SB,Number=1,Type=Integer,Description="Phred-scaled strand bias at this position">
##INFO=<ID=DP4,Number=4,Type=Integer,Description="Counts for ref-forward bases, ref-reverse, alt-forward and alt-reverse bases">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##contig=<ID=NC_045512.2>
#CHROM POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  Sample1 Sample2 Sample3 Sample4
NC_045512.2    71  .   C   T   494 PASS    AF=0.041451;SB=3;DP=3177;DP4=2604,474,77,14 GT  1   .   1   .
NC_045512.2    71  .   C   T   494 PASS    AF=0.041451;SB=3;DP=3177;DP4=2604,474,77,14 GT  .   1   .   .

As you can see, the same variant appears in two separate rows, each carrying genotypes for different samples (the first row for Sample1 and Sample3, the second row for Sample2).

I want to collapse these rows into one while keeping the genotypes from both, like this:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  Sample1 Sample2 Sample3 Sample4
NC_045512.2 71  .   C   T   494 PASS    AF=0.041451;SB=3;DP=3177;DP4=2604,474,77,14 GT  1   1   1   .

I tried removing duplicates with bcftools norm:

bcftools norm -d none -o merg_3nodup.vcf merg_3.vcf

But this does not work: it simply deletes the second row, so the genotype for Sample2 is lost.

I also tried collapsing with bcftools isec (it is meant for multiple VCF files and only lets you run it on a single VCF if you add the --targets option):

bcftools isec -c none --targets "NC_045512.2" merg_3.vcf.gz -o merg_3collapse.vcf

But the output is exactly the same as the input file.

Does anyone have a clue on how to proceed?

bcftools vcf collapse isec norm • 1.2k views

Does anyone have a clue on how to proceed?

Split your VCF per sample and then merge the 4 VCFs?


Thanks for your answer Pierre,

Unfortunately, this VCF was originally created by merging other VCF files (thousands of them, in fact), so splitting it again is not a desirable option.

Any other suggestions?
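
One way to get the collapsed row without splitting the file again is to merge the duplicate rows directly. The following is only a minimal Python sketch, not a bcftools feature. The function name collapse_vcf_lines is made up for illustration, and it assumes that the file is tab-delimited, that FORMAT is GT only (as in the example above), and that duplicate rows agree on everything except the sample columns. Rows are grouped by CHROM, POS, REF and ALT; columns 1-9 are taken from the first row of each group, and for every sample the first non-missing genotype wins.

```python
MISSING = {".", "./.", ".|."}  # genotype values treated as "no call"


def collapse_vcf_lines(lines):
    """Merge data rows that share CHROM, POS, REF and ALT.

    Assumptions (not checked): tab-delimited input, FORMAT is GT only,
    and every data row has the same number of columns.
    """
    header = []
    merged = {}  # (chrom, pos, ref, alt) -> fields; dicts keep insertion order
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue
        if line.startswith("#"):
            header.append(line)  # meta lines and the #CHROM line pass through
            continue
        fields = line.split("\t")
        key = (fields[0], fields[1], fields[3], fields[4])
        if key not in merged:
            merged[key] = fields  # first row of the group supplies columns 1-9
            continue
        kept = merged[key]
        for i in range(9, len(fields)):  # sample columns start at index 9
            # Fill a missing genotype from the later row; never overwrite a call.
            if kept[i] in MISSING and fields[i] not in MISSING:
                kept[i] = fields[i]
    return header + ["\t".join(f) for f in merged.values()]
```

To use it, pass an open file (or any iterable of lines) and write the returned lines back out joined with newlines. Note that INFO values such as AF and DP are copied from the first row and are not recomputed, and if two rows carry conflicting calls for the same sample, the first one is kept.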
