How to merge vcf files
5.0 years ago

I'm a beginner at working with SNP data. I want to merge 3 VCF files into 1 VCF file. I used the following command:

/home/LXH/biosoft/bcftools/bcftools-1.9/bcftools merge \
  /home/LXH/work/maize_RIL/results/4.SNP_VarDetect/319/319.filted.SNP.vcf.gz \
  /home/LXH/work/maize_RIL/results/4.SNP_VarDetect/478/478.filted.SNP.vcf.gz \
  /home/LXH/work/maize_RIL/365RIL/03vcf314/Zea_mays.314.vcf.gz > merge_all_vcf

but it didn't work, and I got the following message:

[W::bcf_hdr_merge] Trying to combine "AC" tag definitions of different lengths

because the VCF files differ slightly. One kind of VCF file looks like this; in the AC (allele count) line, Number=. :

##INFO=<ID=SF,Number=.,Type=String,Description="Source File (index to so
##INFO=<ID=AC,Number=.,Type=Integer,Description="Allele count in genotyp
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT
562 scaffold_28     233     .       C       T       999.00  .       AC1=315;

The other kind of VCF file looks like this; in the AC line, Number=A :

 ##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotyp
 ##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles
 ##INFO=<ID=DP4,Number=4,Type=Integer,Description="Number of high-quality
 ##INFO=<ID=MQ,Number=1,Type=Integer,Description="Average mapping quality

So I changed "Number=." to "Number=A", and then the 3 VCF files could be combined. However, when I used PLINK to convert the merged VCF file to a plink.ped file, I found that the first two samples (resequencing data) were all zero from the 7th column to the end of the ped file. All the remaining samples were GBS data.

V1          V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20
319         319  0  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0
478         478  0  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0
 3           3  0  0  0  0  T  T  G   A   G   A   C   C   C   G   C   C   G   G
 5           5  0  0  0  0  T  T  G   G   G   G   C   C   C   G   C   C   G   G
 6           6  0  0  0  0  T  T  G   G   G   G   0   0   0   0   C   C   G   G
 7           7  0  0  0  0  T  T  G   G   G   G   T   T   C   C   C   C   G   G
 8           8  0  0  0  0  T  T  G   A   G   A   C   T   C   G   C   C   G   G
 9           9  0  0  0  0  T  T  G   G   G   G   C   C   C   G   C   C   G   G
 10          10  0  0  0  0  T  T  G   G   G   G   T   T   C   C   C   C   G   A

So here are my questions. First, was this caused by an error in the merge step between the VCF files? Second, what is the difference between "Number=." and "Number=A", and can I force the change from one to the other? I hope somebody can help me figure this out, because it has troubled me a lot. Thank you!

SNP bcftools vcf • 3.4k views

What is the version (first line of the header) of those VCF files?

##fileformat=VCFv4.2

Can you re-call all the samples together into a multisample VCF file? I have also always had technical problems when attempting to merge VCFs, with many different tools.


Can you possibly try some things:

  1. bgzip and tabix-index your files - this sometimes fixes basic errors in the VCF headers. It produces a vcf.gz with an associated tbi file
  2. manually fill your vcf.gz files with tags using BCFtools: A: How to use bcftools to calculate AF INFO field from AC and AN in VCF?

Then, try the merge again.
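
The two steps above, followed by the merge, could be sketched roughly as below. This is only an outline under assumptions: the file names a.vcf, b.vcf, c.vcf and merged.vcf.gz are placeholders, bgzip, tabix and a bcftools build with the fill-tags plugin are assumed to be on PATH, and the DRY_RUN switch is added here just so the commands can be inspected before anything is run.

```shell
#!/bin/sh
# Sketch only: a.vcf, b.vcf, c.vcf and merged.vcf.gz are placeholder names.
# With DRY_RUN=1 (the default here) each command is printed, not executed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

prepare_and_merge() {
    out=$1; shift
    merge_inputs=""
    for vcf in "$@"; do
        base=${vcf%.vcf}
        # 1. Compress with bgzip (plain gzip output cannot be tabix-indexed)
        run bgzip -f "$vcf"
        # 2. Recompute AN and AC from the genotypes with the fill-tags plugin
        run bcftools +fill-tags "$base.vcf.gz" -O z -o "$base.tagged.vcf.gz" -- -t AN,AC
        # Index the tagged file; bcftools merge needs indexed inputs
        run tabix -f -p vcf "$base.tagged.vcf.gz"
        merge_inputs="$merge_inputs $base.tagged.vcf.gz"
    done
    # 3. Merge all prepared files (paths containing spaces are not handled here)
    run bcftools merge -O z -o "$out" $merge_inputs
}

plan=$(prepare_and_merge merged.vcf.gz a.vcf b.vcf c.vcf)
echo "$plan"
```

Setting DRY_RUN=0 would execute the same commands instead of printing them. Whether recomputing AC this way resolves the header mismatch depends on the input files.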

ADD REPLY
0
Entering edit mode

[W::bcf_hdr_merge] Trying to combine "AC" tag definitions of different lengths

I believe that's just a warning, not necessarily a problem.
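
To see which files disagree, the AC header lines can be compared directly. In the VCF specification, Number=A declares one value per alternate allele, while Number=. means the number of values is unknown or varies. A minimal sketch on two toy headers follows; the file names one.hdr.txt and two.hdr.txt and the Description strings are made up for illustration, and for real files the header would first be extracted with something like `bcftools view -h`.

```shell
#!/bin/sh
# Toy headers standing in for the output of `bcftools view -h file.vcf.gz`.
# File names and Description strings are invented for this example.
workdir=$(mktemp -d)
printf '%s\n' '##fileformat=VCFv4.2' \
  '##INFO=<ID=AC,Number=.,Type=Integer,Description="toy AC definition">' \
  '##INFO=<ID=AN,Number=1,Type=Integer,Description="toy AN definition">' \
  > "$workdir/one.hdr.txt"
printf '%s\n' '##fileformat=VCFv4.2' \
  '##INFO=<ID=AC,Number=A,Type=Integer,Description="toy AC definition">' \
  '##INFO=<ID=AN,Number=1,Type=Integer,Description="toy AN definition">' \
  > "$workdir/two.hdr.txt"

# Print the Number= value that a header declares for the AC tag
ac_number() {
    sed -n 's/^##INFO=<ID=AC,Number=\([^,]*\),.*/\1/p' "$1"
}

n1=$(ac_number "$workdir/one.hdr.txt")
n2=$(ac_number "$workdir/two.hdr.txt")
echo "one: Number=$n1, two: Number=$n2"

# Harmonise the first header to Number=A; only the AC line is changed
sed 's/^\(##INFO=<ID=AC,Number=\)\.,/\1A,/' "$workdir/one.hdr.txt" \
  > "$workdir/one.fixed.hdr.txt"
n1_fixed=$(ac_number "$workdir/one.fixed.hdr.txt")
echo "after edit: Number=$n1_fixed"
```

An edited header like this could then be applied with `bcftools reheader -h`, which swaps the header without rewriting the records. Note that this only changes the declaration: if the AC values in the records are not actually one per ALT allele, the file would still be inconsistent.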
