Inaccurate/Undesired results from bcftools version 1.1 convert --tsv2vcf command
9.3 years ago

Hey,

I am working with a tab-separated file of SNPs (MUMmer output) and want to convert it to Variant Call Format (VCF). I am using bcftools version 1.1 with the subcommand convert --tsv2vcf. The command runs without errors and prints a VCF header and conversion statistics, but the VCF output contains no records; I expected one record for every line in the tab-separated file. Instead, every row is being skipped (output below), with no indication of why. What am I doing wrong, and how can I fix this so that every line of the initial TSV file is included?

Below are the command I executed, the output it produced, and a portion of the input text file. Any help would be appreciated.

Thanks, Taylor

Input File (TSV):

C       4875    scaffold5-3     .
C       12221   scaffold5-3     .
G       17413   scaffold5-3     .
C       17422   scaffold5-3     .

Command Used:

bcftools convert -c AA,POS,CHROM,ID  -f ../OAntigen_NAg_3528-08.fasta --tsv2vcf tempFile.txt  -O v -s OAntigen_3566-08_v2.fasta

Output Example:

##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##contig=<ID=3528-08_OAntigen_prev_NODE_11_&_NODE_49_Jul_18,length=103905>
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  OAntigen_3566-08_v2.fasta
Rows total:     377
Rows skipped:   377
Missing GTs:    0
Hom RR:         0
Het RA:         83
Hom AA:         0
Het AA:         294
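
One thing that can be read directly from the output above: the only contig in the VCF header is 3528-08_OAntigen_prev_NODE_11_&_NODE_49_Jul_18, while the CHROM column of the TSV says scaffold5-3. Whether that mismatch is the reason the rows are skipped is an assumption and is not confirmed anywhere in this thread, but it is cheap to check. The sketch below is a hypothetical helper (the function names and the default column index are made up for illustration) that lists TSV chromosome names missing from the reference FASTA headers.

```python
def fasta_ids(fasta_lines):
    """Collect sequence IDs: the first whitespace-delimited token after '>'."""
    ids = set()
    for line in fasta_lines:
        if line.startswith(">") and line[1:].strip():
            ids.add(line[1:].split()[0])
    return ids


def unmatched_chroms(tsv_lines, ids, chrom_col=2):
    """Return CHROM values from the TSV that are not FASTA sequence IDs.

    chrom_col=2 matches the column order AA,POS,CHROM,ID used in the
    command above (0-based index); adjust it for other layouts.
    """
    missing = set()
    for line in tsv_lines:
        fields = line.split()
        if len(fields) > chrom_col and fields[chrom_col] not in ids:
            missing.add(fields[chrom_col])
    return missing


# Example with the values shown in this post (sequence shortened to a stub).
fasta = [">3528-08_OAntigen_prev_NODE_11_&_NODE_49_Jul_18", "ACGT"]
tsv = [
    "C       4875    scaffold5-3     .",
    "C       12221   scaffold5-3     .",
]
missing_names = unmatched_chroms(tsv, fasta_ids(fasta))
print(missing_names)
```

If the set is non-empty, renaming the CHROM column to match the FASTA header (or the other way around) would be the first thing to try before looking for other causes.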
SNP VCF bcftools TSV • 4.5k views

Hey Taylor, were you able to sort this out? I'm facing the same problem when converting 23andMe files.

9.3 years ago
Lee Katz ★ 3.1k

I'm hoping someone has a bcftools answer, but I made a script that might be helpful to anyone else who needs to do this.

https://github.com/lskatz/lskScripts/blob/master/mummerToVcf.pl

7.6 years ago
liangjiao.xue ▴ 100

This is a late response, but I think it is worth posting because I spent hours resolving this problem.

Originally, I thought converting from MUMmer/snps to VCF would be a very easy case. However, it is not that easy to get a correct solution.

Some traps:

  1. You need to check the reference sequence to rebuild insertions and deletions. Instead of reading the original reference fasta file, I used show-snps -x 1, so that the surrounding nucleotides are also reported.
  2. For insertions, if the query sequence is mapped to the reference in reverse orientation, the nucleotides of the query sequence are reported in reverse order. So they need to be concatenated in reverse order.
  3. The coordinates of insertions and deletions. For insertions, the coordinate in MUMmer/snps is that of the nucleotide before the insertion, and it should be kept unchanged in the VCF file. For deletions, the coordinates in MUMmer/snps are those of the deleted nucleotides; the coordinate in the VCF should be: first_position_of_deletion_block - 1.
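
The coordinate rules in points 2 and 3 can be sketched in a few lines. This is a minimal illustration, not the code from the linked script: the function names and arguments are hypothetical, the anchor base (the reference nucleotide just before the indel) is assumed to have been obtained already, for example from the show-snps -x 1 context, and any strand complementing is assumed to be handled elsewhere.

```python
def insertion_to_vcf(anchor_pos, anchor_base, inserted_bases, reverse_mapped=False):
    """Build (POS, REF, ALT) for an insertion.

    anchor_pos     -- reference coordinate reported by MUMmer/snps, i.e. the
                      nucleotide before the insertion; kept unchanged.
    inserted_bases -- inserted query nucleotides in the order they were reported.
    reverse_mapped -- True if the query aligns in reverse orientation, in which
                      case the reported order is reversed before joining.
    """
    bases = list(inserted_bases)
    if reverse_mapped:
        bases.reverse()
    return anchor_pos, anchor_base, anchor_base + "".join(bases)


def deletion_to_vcf(first_deleted_pos, anchor_base, deleted_bases):
    """Build (POS, REF, ALT) for a block of consecutive deleted nucleotides.

    first_deleted_pos -- reference coordinate of the first deleted nucleotide.
    The VCF position is shifted to the nucleotide before the block.
    """
    pos = first_deleted_pos - 1
    return pos, anchor_base + "".join(deleted_bases), anchor_base


# Hypothetical rows: an insertion reported as T then G on a reverse-mapped
# query after position 100, and a deletion of C,G at positions 101-102.
ins_record = insertion_to_vcf(100, "A", ["T", "G"], reverse_mapped=True)
del_record = deletion_to_vcf(101, "A", ["C", "G"])
print(ins_record)
print(del_record)
```

Both records end up anchored at the nucleotide before the event, which is why the insertion coordinate is left alone while the deletion coordinate is shifted by one.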

Here is my python code to fix the problems: https://github.com/liangjiaoxue/PythonNGSTools/blob/master/MUMmerSNPs2VCF.py
