Difference between a VCF file and a "genotype matrix"?
8.8 years ago
stevenlang123 ▴ 210

I'm using NGS data to run a program that asks for a "genotype matrix" of samples and SNPs. Is this just the same as a VCF file?

NGS sequencing SNP • 5.3k views
8.8 years ago

Usually it is a matrix built from the per-sample GT calls in the VCF file, with each call coded as the number of alternate alleles at that position (0: homozygous ref, 1: heterozygous, 2: homozygous alt):

snp                       sample1     sample2     sample3
1:2348932A>C                    0           1           2
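
To make the coding concrete, here is a minimal base-R sketch of how GT strings could be turned into 0/1/2 counts. The helper name gt_to_count is made up for illustration, and it assumes biallelic, diploid calls; anything else (missing calls, multi-allelic sites) is returned as NA rather than guessed.

```r
# Convert VCF-style GT strings to alternate-allele counts (0, 1, 2).
# Assumption: biallelic, diploid calls; anything else becomes NA.
# gt_to_count is a hypothetical helper, not part of any package.
gt_to_count <- function(gt) {
  # Split on either unphased ("/") or phased ("|") separators
  alleles <- strsplit(gt, "[/|]")
  vapply(alleles, function(a) {
    # Reject missing calls ("./.") and alleles other than 0 or 1
    if (length(a) != 2 || !all(a %in% c("0", "1"))) return(NA_integer_)
    # Count how many of the two alleles are the alternate allele
    sum(a == "1")
  }, integer(1))
}

gt <- c(sample1 = "0/0", sample2 = "0/1", sample3 = "1/1")
counts <- gt_to_count(gt)
print(counts)
```

With the three example calls this should reproduce the row shown above, one count per sample.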

VCF is the right starting point. In R:

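library("VariantAnnotation")
# Example VCF shipped with the package
fl <- system.file("extdata", "ex2.vcf", package="VariantAnnotation")
vcf <- readVcf(fl, "hg19")
# Returns a list holding a SnpMatrix ($genotypes) and a variant map ($map)
mat <- genotypeToSnpMatrix(vcf)
# SNPs in rows, samples in columns; calls are shown as "A/A", "A/B", "B/B"
# (coercing to "numeric" instead of "character" gives the 0/1/2 coding)
t(as(mat$genotypes, "character"))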

I see. Do you know of any tools that can create such a file from a set of .BAMs? The exact specification for the file I need to create is the following: 1st column: gene name; 2nd column: snp name; 3rd-end columns: A matrix of genotypes for each subject (class: data.frame). The order of 3rd-end columns should match id. Coded as 0, 1, 2 and no missing.
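
For the layout described above, a rough base-R sketch might look like the following. It covers only the final formatting step and assumes a SNP-by-sample matrix of 0/1/2 counts already exists (getting from BAMs to that matrix still needs a variant-calling step, which is not shown). The second SNP, the gene names, and the id vector are hypothetical placeholders.

```r
# Hypothetical inputs: a SNP-by-sample matrix of 0/1/2 counts and a
# SNP-to-gene lookup (in practice this would come from an annotation step).
geno <- matrix(c(0L, 1L, 2L,
                 1L, NA, 0L),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("1:2348932A>C", "2:1000T>G"),
                               c("sample1", "sample2", "sample3")))
gene_of <- c("1:2348932A>C" = "GENE_A", "2:1000T>G" = "GENE_B")
id <- c("sample3", "sample1", "sample2")  # required subject order

# Drop SNPs with any missing call, since the format allows none
complete <- geno[stats::complete.cases(geno), , drop = FALSE]

# Reorder sample columns to match id, then prepend gene and SNP names
out <- data.frame(gene = unname(gene_of[rownames(complete)]),
                  snp = rownames(complete),
                  complete[, id, drop = FALSE],
                  check.names = FALSE)
rownames(out) <- NULL
print(out)
```

Dropping incomplete SNPs is only one way to satisfy "no missing"; imputing the missing calls would be another, depending on what the downstream program expects.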

Thanks in advance


Awesome! Thank you very much for your help
