List of genes sequenced in genome sequencing data from TCGA
9.4 years ago

Hi friends,

I am having some difficulty understanding the patient mutation data provided by TCGA. The Mutation Annotation Format (MAF) files list only the genes that were found to be mutated in each patient, so I cannot tell how many genes were sequenced per patient. Can we say something like "for 500 patients, 20,000 genes were sequenced and only 400 genes were found to be mutated; thus the remaining genes were normal/wild type"?

Simply put, how many genes were sequenced for each patient?

cancer tcga mutation • 4.1k views

You have to provide information like: Which cancer type are you looking at? Which mutation data are you talking about? Be specific.

9.4 years ago

The three TCGA Genome Sequencing Centers used several different exome capture kits. The Methods section of each tissue-specific marker paper lists the kits used for that project. NimbleGen SeqCap EZ and Agilent SureSelect were the two most common, but watch for different versions of each kit, such as SeqCap EZ v2 and SeqCap EZ v3. Once you know which kits a TCGA project used, a little googling will point you to the design files (usually in BED format) that list all the regions targeted by the hybridization probes, plus gene IDs where available, which you can deduplicate and count. For example, SeqCap EZ v2 BED files are downloadable here.
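
The deduplicate-and-count step can be sketched roughly as below. This is only a minimal illustration, not the layout of any particular vendor file: it assumes the gene ID sits in the 4th BED column and that several IDs on one target are separated by `;` or `,`. The function name `count_target_genes` and the example intervals are hypothetical, so check the real design file before relying on it.

```python
import io
import re

def count_target_genes(bed_lines):
    """Return the set of unique gene IDs named in a capture-target BED."""
    genes = set()
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # target interval without a name column
        # Assumption: multiple genes on one target are separated by ; or ,
        for name in re.split(r"[;,]", fields[3]):
            name = name.strip()
            if name and name != ".":
                genes.add(name)
    return genes

# Made-up example data standing in for a real design file.
example_bed = io.StringIO(
    "track name=hypothetical_targets\n"
    "chr1\t100\t200\tGENE_A\n"
    "chr1\t300\t400\tGENE_A\n"
    "chr2\t50\t150\tGENE_B;GENE_C\n"
    "chr3\t10\t20\t.\n"
)
target_genes = count_target_genes(example_bed)
print(len(target_genes), sorted(target_genes))
```

For a real file, pass an open file handle instead of the `io.StringIO` object. The count you get is the number of genes targeted by the kit, which is an upper bound on the genes that could be called in any one sample.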

One caveat: sequencing coverage is not uniform, despite the guarantees of the capture kit manufacturers. To be certain that a gene is wildtype, you have to confirm that it was "sufficiently covered for variant calling". I explain this in more detail over here, and provide BED files per TCGA sample listing the regions that had sufficient coverage for variant calling. You can intersect these BED files with your annotation space (a GTF or BED) to find out how many genes were covered well enough to be confidently called mutated or wildtype, down to the per-sample level.
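
As a rough sketch of that intersection for one sample, the snippet below computes what fraction of each gene's exonic bases falls inside the sufficiently covered regions. Everything here is an assumption for illustration: the function names, the example coordinates, and the 0.9 cutoff are hypothetical, and all intervals are taken to be BED-style (0-based, half-open), so GTF coordinates would need converting first.

```python
from collections import defaultdict

def merge_intervals(intervals):
    """Merge overlapping or adjacent half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged

def covered_fraction_per_gene(gene_exons, covered):
    """gene_exons: {gene: [(chrom, start, end), ...]}
    covered: [(chrom, start, end), ...] for one sample."""
    cov_by_chrom = defaultdict(list)
    for chrom, start, end in covered:
        cov_by_chrom[chrom].append((start, end))
    # Merging makes covered intervals disjoint, so bases are not double counted.
    cov_by_chrom = {c: merge_intervals(iv) for c, iv in cov_by_chrom.items()}

    fractions = {}
    for gene, exons in gene_exons.items():
        by_chrom = defaultdict(list)
        for chrom, start, end in exons:
            by_chrom[chrom].append((start, end))
        total = hit = 0
        for chrom, ivs in by_chrom.items():
            for start, end in merge_intervals(ivs):
                total += end - start
                for c_start, c_end in cov_by_chrom.get(chrom, []):
                    hit += max(0, min(end, c_end) - max(start, c_start))
        fractions[gene] = hit / total if total else 0.0
    return fractions

# Made-up annotation and per-sample coverage.
gene_exons = {
    "GENE_A": [("chr1", 100, 200), ("chr1", 300, 400)],
    "GENE_B": [("chr2", 50, 150)],
}
covered = [("chr1", 90, 210), ("chr1", 300, 400), ("chr2", 50, 70)]

MIN_FRACTION = 0.9  # hypothetical cutoff, not a TCGA standard
fractions = covered_fraction_per_gene(gene_exons, covered)
callable_genes = [g for g, f in fractions.items() if f >= MIN_FRACTION]
print(fractions, callable_genes)
```

The nested loop is fine for a toy example but slow on whole exomes; a dedicated interval tool would be the practical choice for real data. Where you set the cutoff is a judgment call that depends on how strict you want the wildtype claim to be.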


Do you know if there is a file that indicates what capture kit was used per patient for the TCGA Breast data?

Thank you!
