How to get VCF files into data matrix form for machine learning? (new to VCF files)
7.0 years ago
jespinoz ▴ 20

Right now I am running HISAT2 against the Homo sapiens hg38 SNP index from ftp://ftp.ccb.jhu.edu/pub/infphilo/hisat2/data/grch38_snp.tar.gz. This will produce 88 individual *.sam files (one per sample, I have 88 samples), which I will then use to create VCF files.

I want to get these VCF files into a form that I can use in some of my downstream pipelines. My question is: how can I get these VCF files into an (n = samples, m = SNPs) data matrix, preferably in Python or with vcftools, though I am open to other tools or to writing my own method? I have seen the term "genotyping matrix" in my Google searches; is this what I am trying to create? Apologies if this question is naive. I planned to build my own using pandas in Python, but did not want to reinvent the wheel if a solution already exists.
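To make the target shape concrete, here is a minimal sketch of what I mean, assuming the 88 per-sample files have already been merged into one multi-sample VCF (the merge step is not shown). The function name `vcf_to_matrix`, the `chrom:pos:ref>alt` column labels, and the demo records are hypothetical and only for illustration. Each cell holds the number of non-reference alleles (0, 1, or 2 for a diploid call), with `None` for a missing genotype.

```python
import io


def vcf_to_matrix(handle):
    """Parse a multi-sample VCF from an open text handle.

    Returns (samples, snp_ids, matrix), where matrix[i][j] is the count of
    non-reference alleles for sample i at site j, or None if the call is missing.
    """
    samples, snp_ids, columns = [], [], []
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        chrom, pos, _id, ref, alt = fields[:5]
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue  # no genotype calls at this site
        gt_index = fmt.index("GT")
        column = []
        for sample_field in fields[9:]:
            gt = sample_field.split(":")[gt_index]
            alleles = gt.replace("|", "/").split("/")  # treat phased like unphased
            if "." in alleles:
                column.append(None)  # missing call
            else:
                # Any ALT allele counts as 1, so multi-allelic sites are collapsed.
                column.append(sum(a != "0" for a in alleles))
        snp_ids.append(f"{chrom}:{pos}:{ref}>{alt}")
        columns.append(column)
    # VCF stores one site per row; transpose to samples x SNPs.
    matrix = [list(row) for row in zip(*columns)]
    return samples, snp_ids, matrix


# Made-up demo records, not real data.
demo = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\n"
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/0:12\t0/1:9\t1/1:15\n"
    "chr1\t250\t.\tC\tT\t40\tPASS\t.\tGT\t0|1\t./.\t0/0\n"
)
samples, snp_ids, matrix = vcf_to_matrix(demo)
for name, row in zip(samples, matrix):
    print(name, row)
```

The result could then be wrapped as `pandas.DataFrame(matrix, index=samples, columns=snp_ids)`. One caveat with this sketch: if the per-sample VCFs only list variant sites, a site absent from one sample's file cannot be told apart from a missing call after merging, so the `None` entries may need separate handling.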

I'm using Python 3.6.1 on OSX.

vcf machine-learning genotype snps • 3.0k views