What does TCGA use as a reference for making SNP annotations?
16 months ago
Cuba/Havana/Faculty of Biology

Hi, as my question probably shows, I'm a newbie at dealing with the many sequence databases out there. The situation: I have some .maf files with mutations for a given cancer, let's say breast cancer, and a given gene, TP53. The .maf clearly says where each mutation starts and ends. (I'm only interested in point mutations.)


The point is that I want to construct a mutated sequence from this mutation data, using the reference sequence as a template. There are so many different transcripts, though, so basically I just want to know which one TCGA uses as its reference. Is it the whole gene? Or just the exons?

Thanks in advance


If you want to see the effect of a mutation on the protein, you have to choose the exon regions of a transcript (either the directly spliced one or one derived from alternative splicing) and check whether the mutations fall inside them. Branch points and intron-exon donor and acceptor sites are also crucial, since mutations in these regions can affect splicing. Read this: A: How to analysis mutations effects bioinformatically? and this: A: Allele frequency visualization


This does not answer the original question, pltbiotech_tkarthi

13 months ago
i.sudbery 4.7k
Sheffield, UK

TCGA has its own, very slightly customized genome reference, which basically consists of the hg38 analysis set plus some decoy and virus sequences.

You can read about it and download it here: GDC Reference Files
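
Since the coordinates in the .maf refer to the whole-genome reference, building a mutated sequence comes down to substituting bases at those positions. Here is a minimal sketch of that idea, not a definitive implementation. It assumes the usual MAF column names (Hugo_Symbol, Start_Position, Variant_Type, Reference_Allele, Tumor_Seq_Allele2) and 1-based positions. The helper names read_snps and apply_point_mutations, the toy sequence and the toy coordinates are all made up for illustration; in practice you would first extract the region of interest from the reference FASTA.

```python
import csv

def read_snps(maf_text, gene):
    """Collect (position, ref, alt) for the point mutations of one gene.

    Assumes a tab-separated MAF with the usual column names; lines
    starting with '#' are treated as comments and skipped.
    """
    lines = [ln for ln in maf_text.splitlines() if ln and not ln.startswith("#")]
    snps = []
    for row in csv.DictReader(lines, delimiter="\t"):
        if row["Hugo_Symbol"] == gene and row["Variant_Type"] == "SNP":
            snps.append((int(row["Start_Position"]),
                         row["Reference_Allele"],
                         row["Tumor_Seq_Allele2"]))
    return snps

def apply_point_mutations(region_seq, region_start, mutations):
    """Return region_seq with each single-base substitution applied.

    region_start is the 1-based genomic coordinate of region_seq[0].
    Raises ValueError if a position is outside the region or the
    reference allele does not match the template (wrong genome build?).
    """
    bases = list(region_seq.upper())
    for pos, ref, alt in mutations:
        idx = pos - region_start
        if not 0 <= idx < len(bases):
            raise ValueError(f"position {pos} is outside the region")
        if bases[idx] != ref.upper():
            raise ValueError(f"reference mismatch at {pos}: "
                             f"template has {bases[idx]}, MAF says {ref}")
        bases[idx] = alt.upper()
    return "".join(bases)

# Toy data only: a 10-base "reference" starting at coordinate 101,
# and a tiny MAF with one SNP and one deletion (which gets ignored).
TOY_REGION = "ACGTACGTAC"
TOY_REGION_START = 101
HEADER = ["Hugo_Symbol", "Chromosome", "Start_Position", "End_Position",
          "Variant_Type", "Reference_Allele", "Tumor_Seq_Allele2"]
TOY_MAF = "\n".join([
    "#toy example, not real TCGA data",
    "\t".join(HEADER),
    "\t".join(["TP53", "chr17", "104", "104", "SNP", "T", "G"]),
    "\t".join(["TP53", "chr17", "107", "108", "DEL", "GT", "-"]),
])

mutated = apply_point_mutations(TOY_REGION, TOY_REGION_START,
                                read_snps(TOY_MAF, "TP53"))
print(mutated)
```

The reference-allele check is there on purpose: if it fails, the template is probably from a different genome build than the .maf. Two caveats: MAF alleles are given on the forward genomic strand, and TP53 sits on the reverse strand, so a transcript-oriented sequence would still need reverse complementing; and a real .maf mixes mutations from many samples, so you would normally filter by sample before applying them to one sequence.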


Just to add, this file is the reference sequence for the whole genome, which, I believe, is what TCGA uses. Warning: this is a big file.

