VCF file issue when merging
2.9 years ago
SHN

Hello,

I started imputation with IMPUTE2. I know it is an old version, but I have come this far and would like to finish. Now, after converting the IMPUTE2 output to VCF files with SHAPEIT, I need to piece the chromosome chunks back together.

I am using bcftools to merge all the data (my goal is to merge all the VCF files, import them into PLINK for QC, and then build an IBS matrix):

$BCFTOOLS merge --merge all vcf_chr1_chunk1.vcf.gz vcf_chr1_chunk2.vcf.gz vcf_chr1_chunk3.vcf.gz -O v > "$results_merged_vcf/Chromosome1.imputed.vcf"
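A possible alternative, offered as a sketch rather than a verified fix: per the bcftools documentation, `bcftools merge` combines files with non-overlapping sample sets, while `bcftools concat` joins files that share the same samples in the same order but cover different regions, which is what chunks of one chromosome are. The function below assumes the chunk files are bgzip-compressed, follow the `vcf_chr<N>_chunk<K>.vcf.gz` naming from the question, already have a real chromosome name in CHROM, and that chunk numbers increase with position. The name `concat_chunks` is hypothetical.

```shell
# Hypothetical helper (bash): concatenate the chunks of one chromosome.
# Assumes bcftools is on PATH or set in $BCFTOOLS, chunk files are named
# vcf_chr<N>_chunk<K>.vcf.gz, and all chunks share one sample list.
concat_chunks() {
  local chr="$1" outdir="$2"
  local bcftools="${BCFTOOLS:-bcftools}"
  local out="${outdir}/Chromosome${chr}.imputed.vcf.gz"
  local -a chunks
  # Version sort keeps chunk10 after chunk9; without -a, concat expects
  # the inputs to be listed in genomic order.
  mapfile -t chunks < <(printf '%s\n' vcf_chr"${chr}"_chunk*.vcf.gz | sort -V)
  "$bcftools" concat -O z -o "$out" "${chunks[@]}"
  # Build a .tbi index so the result can be queried by region later.
  "$bcftools" index -t "$out"
}
```

Usage would be something like `concat_chunks 1 "$results_merged_vcf"`. Unlike `merge`, plain `concat` does not need the inputs to be indexed; only the output is indexed here.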

However, the program asks for an index file, which is created with

tabix -p vcf file.vcf.gz

I cannot index the file because it needs to be sorted by chromosome. When I try to sort it, the program reports that it cannot parse [--- 45000037].
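One way to give the imputed rows a real chromosome name, sketched under the assumption that each chunk file belongs to a single, known chromosome: `tabix -p vcf` needs a bgzip-compressed, position-sorted file, and a literal `---` in CHROM gets in the way. The function replaces `---` in the CHROM column with the chromosome number you pass in and leaves header lines untouched. The name `fix_chrom` and the `.fixed` output naming are hypothetical.

```shell
# Hypothetical helper (bash + awk): replace a literal "---" CHROM value
# with the given chromosome. Reads an uncompressed VCF on stdin, writes stdout.
fix_chrom() {
  local chr="$1"
  awk -v chr="$chr" 'BEGIN { FS = OFS = "\t" }
    /^#/        { print; next }   # keep meta-information and header lines
    $1 == "---" { $1 = chr }      # imputed rows: fill in the chromosome
                { print }'
}

# Example for one chunk (assumes bgzip and tabix from htslib are installed):
#   gzip -dc vcf_chr1_chunk1.vcf.gz | fix_chrom 1 | bgzip > vcf_chr1_chunk1.fixed.vcf.gz
#   tabix -p vcf vcf_chr1_chunk1.fixed.vcf.gz
```

Looping this over the chunks of each chromosome avoids editing any file by hand.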

In my VCF files (after converting the IMPUTE2 output to VCF), the imputed variants have the format below:

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 1610 2318 2421
--- 45000037 rs140328665:45000037:G:A G A . PASS . GT 0|0
--- 45002242 1:45002242:G:A G A . PASS . GT 0|0

So here are my questions:

  1. Should I write code to convert every '---' to the chromosome number for each chunk?
  2. For cases like the one below:

    --- 45002242 1:45002242:G:A G A . PASS . GT 0|0

Should I manually reformat the ID column so that it contains just the variant ID?
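If the goal is to keep the rsID wherever the reference panel supplied one, a minimal awk sketch can rewrite the ID column instead of editing it manually: IDs of the form `rs<number>:pos:ref:alt` are cut back to the rsID, and `chr:pos:ref:alt` IDs without an rsID are left unchanged. This assumes every row follows the ID layout shown above; the name `shorten_ids` is hypothetical.

```shell
# Hypothetical helper (bash + awk): shorten IDs like "rs140328665:45000037:G:A"
# to "rs140328665"; IDs without an rs prefix (e.g. "1:45002242:G:A") are kept.
# Reads an uncompressed VCF on stdin, writes stdout.
shorten_ids() {
  awk 'BEGIN { FS = OFS = "\t" }
    /^#/              { print; next }        # header lines pass through
    $3 ~ /^rs[0-9]+:/ { sub(/:.*/, "", $3) } # drop the ":pos:ref:alt" suffix
                      { print }'
}

# Example (output file name is hypothetical):
#   gzip -dc vcf_chr1_chunk1.vcf.gz | shorten_ids | bgzip > vcf_chr1_chunk1.ids.vcf.gz
```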

Should I loop over all the files and remove them, or is there a way to avoid them from the beginning?

Is there any software that can do this step for me, to save time?

Thank you in advance for your help.

bcftools ImputeV2 VCF

How did you convert the IMPUTE2 output to VCF? It seems odd that the conversion would simply discard the chromosome identifier. My suggestion would be to use a different conversion tool (plink / qctool) that retains the chromosome identifier when converting to VCF. That way you don't have to write any tools of your own to fix it.


I used SHAPEIT to convert from IMPUTE2 format to VCF. The chromosome number was absent from the beginning, because these lines are imputed. This is the IMPUTE2 output format:

--- 1:35000209:G:C 35000209 G C 0 0 0

--- rs75886048:35000218:C:A 35000218 C A 1 0

--- 1:35000252:T:G 35000252 T G 0 0 0 0
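One possible explanation, worth checking against the SHAPEIT2 documentation: SHAPEIT2's `.haps` format expects the chromosome number in the first column, whereas IMPUTE2 writes its SNP ID column there, with `---` for imputed variants. If that is what happens here, filling in column 1 before conversion would avoid the `---` CHROM from the start. The sketch below assumes the file passed to SHAPEIT has the whitespace-separated layout shown above and holds a single chromosome; `fill_haps_chrom` is a hypothetical name.

```shell
# Hypothetical helper (bash + awk): write the chromosome number into
# column 1 of an IMPUTE2-style file before converting it with SHAPEIT.
# Reads stdin, writes stdout; output columns are single-space separated.
fill_haps_chrom() {
  local chr="$1"
  # Every row gets the chromosome, since the whole chunk is one chromosome.
  awk -v chr="$chr" '{ $1 = chr; print }'
}
```

The fixed file would then go through the same SHAPEIT conversion as before.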

and this is the output format in the VCF:

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT

--- 45000037 rs140328665:45000037:G:A G A . PASS . GT 0|0

--- 45002242 1:45002242:G:A G A . PASS . GT 0|0

QCTOOL doesn't read the '---' format, so it doesn't work. Do you mean this VCF output is not what SHAPEIT2 should produce?

Thanks

23 months ago
yeyidi9341

Managing multiple VCFs can sometimes be very difficult, so it is preferable to keep all your data in a single VCF file. The manual method is lengthy and requires technical knowledge, so I suggest using Merge VCF, which lets you combine multiple VCF files without duplicates.

Visit: https://www.wholeclear.com/merge/vcard/

