Create VCF File From A Multiple Sequence Alignment
11.8 years ago
Whetting ★ 1.6k

Dear Biostars,
I have a question about creating VCF (Variant Call Format) files.
Does anyone know of a tool that would allow me to turn a multiple sequence alignment (containing a reference and several variants) into a VCF file?
Thanks!

EDIT:
I have a multiple sequence alignment of several cloned papillomaviruses. We know that the sequence of each individual genome is correct, i.e. all variations between the reference and these additional sequences represent naturally occurring SNPs (and not sequencing errors). I would like to extract the SNPs (and indels) from this alignment and create a VCF file. I hope this clarifies the problem! Thanks again.

alignment vcf variant snp • 16k views

This was asked 2.6 years ago. :p


Search the website for "SNP calling".


SNP calling is a little bit different from what I am looking for. Calling implies a certain threshold before something is considered a SNP and returns a level of confidence for each identified SNP. The sequences I am using are confirmed variants, i.e. I know that each variation is real. I would like to "simply" create a VCF file containing all differences between the sequences in the alignment.


Did you figure out a tool that does this? Also do you mean that any multiple sequence alignments using assembled sequences (assuming the assembly is correct) do not have to go through a "variant calling" approach? What about alignment errors?


What format is your data in? We need more information to understand what you are trying to do. What and Why = Best answer.

2.8 years ago
Patrick ▴ 60

I know this is a very old question, but since it came up as the first hit on Google for me today, I will add this answer for anyone else who might need it: there is now a pretty nice tool called "snp-sites" that does this.

11.8 years ago

There isn't a program I am aware of that does what you want. However, here are the steps I would take:

  1. Import your MSA (multiple sequence alignment) into a program that can output variant sites only. PAUP* can do this.
  2. Output the matrix and map the gene position to the genomic position.
  3. Write a script that will convert these data to VCF format.
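
For step 3, here is a minimal sketch of what such a script could look like in Python. It is only an illustration under several assumptions that are not from the thread: the alignment is an aligned FASTA file, the first record is the reference, every sample is haploid, and only single-base substitutions are reported. Columns where the reference has a gap (insertions) are skipped, and a gap or ambiguous base in a sample is written as a missing genotype, so indels would still need separate handling. The names `read_fasta`, `msa_to_vcf` and the toy sequences are made up for the example.

```python
import io


def read_fasta(handle):
    """Parse aligned FASTA into an ordered list of (name, sequence) pairs."""
    records, name, chunks = [], None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks).upper()))
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks).upper()))
    return records


def msa_to_vcf(records, chrom=None):
    """Return VCF text with SNPs only; the first record is the reference."""
    (ref_name, ref_seq), samples = records[0], records[1:]
    if any(len(seq) != len(ref_seq) for _, seq in samples):
        raise ValueError("all sequences must have the same aligned length")
    chrom = chrom or ref_name
    header = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT"]
    lines = [
        "##fileformat=VCFv4.2",
        '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
        "\t".join(header + [name for name, _ in samples]),
    ]
    ref_pos = 0  # 1-based position in the ungapped reference
    for col, ref_base in enumerate(ref_seq):
        if ref_base == "-":
            continue  # insertion relative to the reference: not handled here
        ref_pos += 1
        if ref_base not in "ACGT":
            continue  # ambiguous reference base: skip the site
        alts, genotypes = [], []
        for _, seq in samples:
            base = seq[col]
            if base == ref_base:
                genotypes.append("0")
            elif base in "ACGT":
                if base not in alts:
                    alts.append(base)
                genotypes.append(str(alts.index(base) + 1))
            else:
                genotypes.append(".")  # gap or ambiguity code: missing call
        if alts:
            fields = [chrom, str(ref_pos), ".", ref_base, ",".join(alts), ".", ".", ".", "GT"]
            lines.append("\t".join(fields + genotypes))
    return "\n".join(lines) + "\n"


# Toy alignment: the reference has one gap column, so later positions shift by one.
example = """>ref
AC-GTACGT
>s1
ACTGTACGA
>s2
AC-GAACGC
"""
vcf_text = msa_to_vcf(read_fasta(io.StringIO(example)))
print(vcf_text, end="")
```

Positions are counted on the ungapped reference, so step 2 (mapping to genomic coordinates) would still have to be applied on top of this if the alignment covers only part of the genome.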