How to merge vcf files
15 months ago

I'm a beginner at working with SNP data. I want to merge 3 VCF files into one VCF file. I used the following command:

/home/LXH/biosoft/bcftools/bcftools-1.9/bcftools merge \
/home/LXH/work/maize_RIL/results/4.SNP_VarDetect/319/319.filted.SNP.vcf.gz \
/home/LXH/work/maize_RIL/results/4.SNP_VarDetect/478/478.filted.SNP.vcf.gz \
/home/LXH/work/maize_RIL/365RIL/03vcf314/Zea_mays.314.vcf.gz > merge_all_vcf

but it didn't work, and I got the following message:

[W::bcf_hdr_merge] Trying to combine "AC" tag definitions of different lengths

I think this is because the VCF headers differ slightly. One kind of VCF file looks like this, with Number=. in the AC (allele count) line:

##INFO=<ID=SF,Number=.,Type=String,Description="Source File (index to so
##INFO=<ID=AC,Number=.,Type=Integer,Description="Allele count in genotyp
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT
562 scaffold_28     233     .       C       T       999.00  .       AC1=315;

The other kind of VCF file looks like this, with Number=A in the AC line:

 ##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotyp
 ##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles
 ##INFO=<ID=DP4,Number=4,Type=Integer,Description="Number of high-quality
 ##INFO=<ID=MQ,Number=1,Type=Integer,Description="Average mapping quality

So I changed "Number=." to "Number=A", and then the 3 VCF files could be merged. However, when I used PLINK to convert the merged VCF file to a PLINK .ped file, I found that the first two samples (resequencing data) were all zeros from the 7th column to the end of the .ped file. All the remaining samples are GBS data.

V1          V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20
319         319  0  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0
478         478  0  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0
 3           3  0  0  0  0  T  T  G   A   G   A   C   C   C   G   C   C   G   G
 5           5  0  0  0  0  T  T  G   G   G   G   C   C   C   G   C   C   G   G
 6           6  0  0  0  0  T  T  G   G   G   G   0   0   0   0   C   C   G   G
 7           7  0  0  0  0  T  T  G   G   G   G   T   T   C   C   C   C   G   G
 8           8  0  0  0  0  T  T  G   A   G   A   C   T   C   G   C   C   G   G
 9           9  0  0  0  0  T  T  G   G   G   G   C   C   C   G   C   C   G   G
 10          10  0  0  0  0  T  T  G   G   G   G   T   T   C   C   C   C   G   A
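
To put a number on the pattern in the table above, here is a minimal awk sketch that counts the missing allele calls ("0") per sample in a .ped file. It assumes the standard PLINK layout of six leading columns followed by allele columns; the function name `ped_missing` is made up for illustration.

```shell
#!/bin/sh
# Report, for each sample in a PLINK .ped file, how many allele calls are
# missing ("0") out of the total. Columns 1-6 are family/sample metadata,
# so allele calls start at column 7.
# Output columns: sample ID, missing allele count, total allele count.
ped_missing() {
    awk '{
        miss = 0
        for (i = 7; i <= NF; i++) if ($i == "0") miss++
        printf "%s\t%d\t%d\n", $2, miss, NF - 6
    }' "$@"
}

# Possible use (plink.ped is a hypothetical file name):
#   ped_missing plink.ped | head
```

If the two resequencing samples turn out to be 100% missing, it may be worth looking at their genotypes in the merged VCF itself, for example with `bcftools query -f '%CHROM\t%POS[\t%SAMPLE=%GT]\n'`. As far as I know, bcftools merge writes a missing genotype for any sample whose input file has no record at a site, so missing calls can come from non-overlapping sites rather than from the header edit.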

So here are my questions. First, was this caused by an error in the merge step? Second, what is the difference between "Number=." and "Number=A", and is it safe to force the change? I hope somebody can help me figure this out, because it has troubled me a lot. Thank you!


What is the version (first line of the header) of those VCF files?

##fileformat=VCFv4.2

Can you re-call all the samples together into a multi-sample VCF file? I have also always had technical problems when attempting to merge VCFs with many different tools.


Can you possibly try some things:

  1. bgzip and tabix-index your files (tabix needs bgzip compression, not plain gzip). This sometimes fixes basic errors in the VCF headers, and produces a vcf.gz with an associated tbi file.
  2. Manually fill your vcf.gz files with tags using BCFtools: A: How to use bcftools to calculate AF INFO field from AC and AN in VCF?

Then, try the merge again.
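
The steps above could be sketched roughly as follows. This is only an outline, assuming bgzip, tabix and bcftools (with the fill-tags plugin) are on your PATH and that you start from uncompressed .vcf files. The file names and the `run`, `prepare_vcf` and `merge_vcfs` helpers are made up for illustration; setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch only: sample1.vcf etc. are hypothetical names, not files from this thread.
# With DRY_RUN=1 each command is printed instead of executed.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Compress, index, and recompute AC/AN/AF for one uncompressed VCF.
# $1 = path to a plain-text .vcf; produces <name>.tagged.vcf.gz and its .tbi.
prepare_vcf() {
    vcf="$1"
    base="${vcf%.vcf}"
    run bgzip -f "$vcf" || return 1            # replaces $vcf with $vcf.gz (BGZF)
    run tabix -p vcf "$vcf.gz" || return 1     # writes $vcf.gz.tbi
    run bcftools +fill-tags "$vcf.gz" -Oz -o "$base.tagged.vcf.gz" -- -t AC,AN,AF || return 1
    run tabix -p vcf "$base.tagged.vcf.gz"
}

# Merge any number of prepared files into one bgzipped VCF.
# $1 = output path, remaining arguments = input files.
merge_vcfs() {
    out="$1"
    shift
    run bcftools merge -Oz -o "$out" "$@"
}

# Possible use:
#   for f in sample1.vcf sample2.vcf sample3.vcf; do prepare_vcf "$f"; done
#   merge_vcfs merged.vcf.gz sample1.tagged.vcf.gz sample2.tagged.vcf.gz sample3.tagged.vcf.gz
```

Note that `bgzip -f` replaces the original .vcf with the .vcf.gz, so keep a copy if you still need the uncompressed file.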


[W::bcf_hdr_merge] Trying to combine "AC" tag definitions of different lengths

I believe that's just a warning, not necessarily a problem.
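
If you would still rather make the AC definitions agree, rewriting the header with bcftools reheader is probably less error-prone than editing the file by hand. In the VCF spec, Number=A means one value per ALT allele and Number=. means the number of values is unknown or varies, so the change should only be reasonable if every AC entry in that file really has one value per ALT allele. A small sketch, where `fix_ac_header` and the file names are made up for illustration:

```shell
#!/bin/sh
# Rewrite only the AC definition in a VCF header read from stdin:
# Number=. (length unknown) becomes Number=A (one value per ALT allele).
# All other header lines pass through unchanged.
fix_ac_header() {
    sed 's/^\(##INFO=<ID=AC,Number=\)\.,/\1A,/'
}

# Possible use with hypothetical file names (in.vcf.gz, fixed.vcf.gz):
#   bcftools view -h in.vcf.gz | fix_ac_header > header.txt
#   bcftools reheader -h header.txt -o fixed.vcf.gz in.vcf.gz
#   tabix -p vcf fixed.vcf.gz
```

This only touches the header; the records themselves are not checked or changed.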
