TCGA SNP arrays
8.4 years ago

I am interested in using the TCGA SNP array data. What I would like to get is a list of the SNPs per sample with their chromosome, position, and reference and alternate alleles. This would be something like:

sample_id - chromosome - position - reference - alternate

Looking at the TCGA data portal I have found a series of files called genotype.dat (there's one per sample) that contain the following information:

Composite_element_ref  chromosome  Physical position  Genotype
SNP_A-8575115          2           533321             AB

*Fake data

I have assumed that the first column is some kind of ID, the second is the chromosome and the third is the position. However, I am not sure about the meaning of the fourth column.

The possible values in that column are AA, AB, BB or NC. Do these mean homozygous, heterozygous and not computed? How could I map these SNPs to the actual nucleotides that are being changed (for example C -> T)?
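To make the mapping concrete, here is a minimal Python sketch of what is being asked for. It rests on several assumptions that are not confirmed anywhere above: that genotype.dat is tab-delimited with a single header row, that the A and B letters stand for two array-defined alleles per probe (which need not be the reference and alternate alleles), and that NC marks a probe without a usable call. The `ALLELES` lookup and the function names are hypothetical, and the C/T pair is made up to match the fake row above; in practice the lookup would have to come from the array's annotation file.

```python
import csv
import io

# Hypothetical lookup: probe ID -> (allele A, allele B).
# The C/T pair is invented for the fake example row; real values would
# have to be taken from the array annotation (assumption).
ALLELES = {"SNP_A-8575115": ("C", "T")}


def decode_genotype(call, allele_a, allele_b):
    """Translate an AA/AB/BB call into a two-letter nucleotide string.

    Any other value (for example NC) returns None.
    """
    mapping = {"A": allele_a, "B": allele_b}
    if len(call) != 2 or any(letter not in mapping for letter in call):
        return None
    return mapping[call[0]] + mapping[call[1]]


def read_genotypes(handle, alleles):
    """Yield (probe_id, chromosome, position, call, nucleotides) per row.

    Assumes tab-delimited input with one header line and four columns.
    """
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # skip the header row
    for row in reader:
        if len(row) < 4:
            continue  # ignore blank or truncated lines
        probe_id, chromosome, position, call = row[:4]
        pair = alleles.get(probe_id)
        nucleotides = decode_genotype(call, *pair) if pair else None
        yield probe_id, chromosome, int(position), call, nucleotides


if __name__ == "__main__":
    # Same fake row as in the example above.
    fake_file = io.StringIO(
        "Composite_element_ref\tchromosome\tPhysical position\tGenotype\n"
        "SNP_A-8575115\t2\t533321\tAB\n"
    )
    for record in read_genotypes(fake_file, ALLELES):
        print(record)
```

Keeping the allele lookup separate from the parser means the same reader works whichever annotation source turns out to be the right one.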

Thanks a lot in advance,

Joan
