How to convert .txt to .vcf?
2.9 years ago
Julia_W • 0

I have a txt file, shown below, but it is not formatted according to the VCF standard. Is there an efficient way to convert it to .vcf other than editing it manually?

enter image description here

vcf txt • 3.1k views

I'm not sure. Your file lacks much of the information a VCF requires, starting with chromosomal coordinates. Did you look at the VCF specs? You may have a file that contains chromosomal coordinates for your markers, and that file might be a better starting point.


What if I have this genotyping txt file and another txt file that contains the chromosomal coordinates and physical positions of the markers, as shown below? Is it possible to convert this information to a VCF file by tools or something? By the way, QUAL, FILTER and INFO in the VCF format are ignored.

enter image description here enter image description here


by tools[?]

Based on my incomplete knowledge and the limited information you provide, I fear I'm no help. Both files look a bit like a roll-your-own format to me, so ready-made tools are probably out of the question.

or something?

That's the good ol' bioinformatics way: connect a roll-your-own format to a standard format using a roll-your-own solution. I recommend Python dictionaries and Biopython as a versatile solution; I could also see this being possible via table joins using R's dplyr.
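As a rough illustration of the dictionary approach, here is a minimal sketch that joins a genotype table to a coordinate table on the marker name. The actual files are only visible as images in this thread, so the tab-delimited layouts, the column names (`marker`, `chrom`, `pos`, `sample`) and the helper names `load_coords` and `load_genotypes` are all assumptions made for the example.

```python
import csv
import io

# Hypothetical layouts: the real files are only shown as images in the thread.
COORDS_TXT = "marker\tchrom\tpos\nMarker1\t1\t1500\nMarker2\t2\t300\n"
GENO_TXT = (
    "sample\tMarker1\tMarker1\tMarker2\tMarker2\n"
    "S1\tA\tG\tC\tC\n"
    "S2\tA\tA\tC\tT\n"
)

def load_coords(handle):
    """Map marker name -> (chromosome, position)."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["marker"]: (row["chrom"], int(row["pos"])) for row in reader}

def load_genotypes(handle):
    """Map marker name -> {sample: (allele1, allele2)}.

    Assumes each marker occupies two adjacent columns with the same header,
    one column per allele of a diploid genotype.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    genotypes = {}
    for row in reader:
        sample = row[0]
        # Walk the columns in pairs: (1, 2), (3, 4), ...
        for i in range(1, len(header), 2):
            genotypes.setdefault(header[i], {})[sample] = (row[i], row[i + 1])
    return genotypes

coords = load_coords(io.StringIO(COORDS_TXT))
genotypes = load_genotypes(io.StringIO(GENO_TXT))

# Join the two tables on the marker name; markers without coordinates are skipped.
joined = {
    marker: {"chrom": coords[marker][0], "pos": coords[marker][1], "calls": calls}
    for marker, calls in genotypes.items()
    if marker in coords
}
```

With real files, the `io.StringIO` objects would be replaced by `open(path, newline="")`. Note that `csv.reader` is used for the genotype table rather than `csv.DictReader`, because duplicated column headers would overwrite each other in a dictionary row.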

Quite frankly, I don't even understand how the duplicated marker column header is supposed to connect to your coordinates file. Is Marker1 with value 20 at the same coordinate as Marker1 with value 24? If these are just counts, you'll end up with a file looking much more like the second one, just pivoted by markers. All the information would have to end up in the INFO field, which is then ignored. And finally, you lack the nucleotides at the coordinates. There are just too many open questions...


I'm guessing that the marker column is duplicated because it's trying to represent a diploid genome. But yeah, I don't think the VCF format can handle alleles abstracted to numbers. It expects exact nucleotide sequences.


Thanks for your recommendation. I will try it. The duplicated marker column, as swbarnes2 said, represents a diploid genotype. You can think of these values as A, T, C and G. The INFO field, as far as I know, can be set to the missing value, which is specified with a dot ("."). And the FORMAT field will be GT.
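To make that encoding concrete, here is a hedged sketch of how one marker could be written as a VCF line with QUAL, FILTER and INFO set to "." and FORMAT set to GT. It assumes the allele values have already been mapped to nucleotides and that the reference allele for each position is known from the reference genome; the genotype table alone does not provide it. The function names `vcf_header` and `vcf_record` and the sample data are hypothetical.

```python
def vcf_header(samples):
    """Return minimal VCF header lines declaring only the GT format field."""
    columns = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT"]
    return "\n".join([
        "##fileformat=VCFv4.2",
        '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
        "\t".join(columns + samples),
    ])

def vcf_record(chrom, pos, marker, calls, samples, ref):
    """Build one VCF data line from per-sample allele pairs.

    `ref` is an assumption supplied by the caller (it should come from the
    reference genome); it is not inferred from the genotype calls.
    """
    # ALT alleles: every observed non-reference allele, in first-seen order.
    alts = []
    for sample in samples:
        for allele in calls[sample]:
            if allele != ref and allele not in alts:
                alts.append(allele)
    # GT uses allele indices: 0 for REF, 1..n for the ALT alleles.
    index = {ref: "0"}
    index.update({allele: str(i + 1) for i, allele in enumerate(alts)})
    # "/" marks unphased genotypes, since the table carries no phase information.
    gts = ["/".join(index[a] for a in calls[sample]) for sample in samples]
    fields = [chrom, str(pos), marker, ref, ",".join(alts) or ".", ".", ".", ".", "GT"]
    return "\t".join(fields + gts)

samples = ["S1", "S2"]
calls = {"S1": ("A", "G"), "S2": ("A", "A")}
header = vcf_header(samples)
record = vcf_record("1", 1500, "Marker1", calls, samples, ref="A")
print(header)
print(record)
```

For this toy input the data line has REF `A`, ALT `G`, and genotypes `0/1` and `0/0`. Records should be written sorted by chromosome and position, and missing calls would need to be handled separately (for example as `./.`).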
