Merging and converting multiple vcf files into a SNP array
5.8 years ago
Arko ▴ 30

I have 50 VCF files, each corresponding to one patient sample. I want to merge them, extract the genotype information by chromosome position or SNP ID, and convert it into a 012 matrix as quickly and efficiently as possible. VCFtools and BCFtools can do this, but I want to automate the process, so I am trying to script it in Python or possibly R.

I also don't want SNPs duplicated across samples (files). The idea is to get a SNP array with sample IDs (extracted from the file names) as column names and chromosome positions or SNP IDs as row names.
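
For illustration, the 012 encoding described above could be sketched in plain Python as follows. This is only a minimal sketch: it assumes an already merged, uncompressed multi-sample VCF read as text lines, the function names `gt_to_012` and `vcf_to_012` are made up for this example, and missing calls are coded as -1 (the convention vcftools uses for its `--012` output). Multi-allelic sites are simply counted as "number of non-reference alleles", which may not be what you want.

```python
def gt_to_012(gt):
    """Encode a GT string as the number of non-reference alleles.

    '0/0' -> 0, '0/1' -> 1, '1/1' -> 2; any missing allele ('.') -> -1.
    Phased ('|') and unphased ('/') separators are treated the same.
    """
    alleles = gt.replace("|", "/").split("/")
    if any(a == "." for a in alleles):
        return -1
    return sum(1 for a in alleles if a != "0")


def vcf_to_012(lines):
    """Return (sample_ids, matrix) from the text lines of a multi-sample VCF.

    matrix maps a row name (SNP ID, or 'chrom:pos' when the ID is '.')
    to a list of 0/1/2/-1 codes, one per sample. Because it is a dict,
    a site that appears twice keeps only its last occurrence.
    """
    samples = []
    matrix = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and metadata headers
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns start after FORMAT
            continue
        chrom, pos, snp_id = fields[0], fields[1], fields[2]
        key = snp_id if snp_id != "." else f"{chrom}:{pos}"
        gt_index = fields[8].split(":").index("GT")
        matrix[key] = [gt_to_012(s.split(":")[gt_index]) for s in fields[9:]]
    return samples, matrix
```

The resulting dictionary has sites as rows and samples as columns, so it can be handed to something like a pandas DataFrame if a labelled table is needed.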

VCF Python R BCF • 3.0k views

What you want just sounds like a multisample VCF file without the metadata headers. Why not just call the necessary vcftools command from within Python or R?
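
Something along these lines might work as a starting point. It is a rough sketch, not a tested pipeline: it assumes `bcftools` and `vcftools` are on your PATH, that the per-sample files are bgzip-compressed and indexed (which `bcftools merge` requires), and the file names `merged.vcf.gz` and `genotypes` are placeholders. The helper names `build_commands` and `run_pipeline` are made up for this example.

```python
import shutil
import subprocess
from pathlib import Path


def build_commands(vcf_paths, merged="merged.vcf.gz", out_prefix="genotypes"):
    """Return the two command lines as argument lists, without running them."""
    # bcftools merge: combine the single-sample files into one multi-sample VCF
    merge_cmd = ["bcftools", "merge", "-O", "z", "-o", merged]
    merge_cmd += [str(p) for p in vcf_paths]
    # vcftools --012: write the genotype matrix next to the given prefix
    matrix_cmd = ["vcftools", "--gzvcf", merged, "--012", "--out", out_prefix]
    return merge_cmd, matrix_cmd


def run_pipeline(vcf_dir):
    """Merge every *.vcf.gz in vcf_dir, then export the 012 matrix."""
    vcf_paths = sorted(Path(vcf_dir).glob("*.vcf.gz"))
    for cmd in build_commands(vcf_paths):
        if shutil.which(cmd[0]) is None:
            raise RuntimeError(f"{cmd[0]} not found on PATH")
        subprocess.run(cmd, check=True)  # raises if the tool exits non-zero
```

With the default prefix, vcftools should write `genotypes.012` plus the accompanying `.012.indv` (sample) and `.012.pos` (site) files. Note that the sample names come from the VCF headers, not from the file names.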


A "SNP array" is usually an oligonucleotide microarray for calling millions of SNPs. Probably not the same as what you have in mind, but confusing nonetheless.


All things considered, what would be the fastest way to merge gVCF and VCF files? BCFtools is faster than VCFtools, but it doesn't work with gVCF files.


Please use ADD COMMENT or ADD REPLY to respond to earlier posts, so that the thread remains logically structured and easy to follow. I have now moved your post, but as you can see the result is not optimal. Adding an answer should only be used to provide a solution to the question asked.


The CombineGVCFs walker from GATK (https://software.broadinstitute.org/gatk/documentation/tooldocs/3.8-0/org_broadinstitute_gatk_tools_walkers_variantutils_CombineGVCFs.php) can combine gVCFs.

PS: please move this post to a comment on the OP or make it a new post.


Unfortunately, GATK doesn't allow merging VCF and gVCF files. My aim is to obtain a single VCF file from the entire set.


Did you try bcftools merge with the -g option?


I tried it, but on merge BCFtools treats the <NON_REF> allele as a literal allele call instead of ignoring it, so <NON_REF> contributes to the genotype call.
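
One possible workaround, if the end goal is a 012 matrix, would be to handle the symbolic allele at the encoding step instead of at merge time. The sketch below is only an illustration of that idea under my own assumptions: the function name `gt_to_012_ignoring_non_ref` is hypothetical, and treating any call that points at `<NON_REF>` (or `<*>`) as missing (-1) is a policy choice, not something BCFtools or GATK does for you.

```python
def gt_to_012_ignoring_non_ref(gt, alt):
    """Count non-reference alleles, but never count a symbolic gVCF allele.

    gt  -- GT string from the sample column, e.g. '0/1'
    alt -- ALT column of the same record, e.g. 'G,<NON_REF>'
    Returns 0, 1 or 2, or -1 when the call is missing or refers to
    the symbolic <NON_REF> / <*> allele (assumed policy: treat as missing).
    """
    alts = alt.split(",")
    # GT indices are 1-based over ALT; collect those naming a symbolic allele
    symbolic = {
        str(i + 1) for i, a in enumerate(alts) if a in ("<NON_REF>", "<*>")
    }
    alleles = gt.replace("|", "/").split("/")
    if any(a == "." or a in symbolic for a in alleles):
        return -1
    return sum(1 for a in alleles if a != "0")
```

For example, with ALT `G,<NON_REF>` a `0/1` call is encoded as 1, whereas `0/2` (which points at `<NON_REF>`) is reported as missing.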
