Forum:TCGA: How to tell if the data is exome or WGS?
3.4 years ago
Les Ander • 110
United States

I am trying to obtain variants identified from whole genome sequencing (not exome sequencing) for various tumors sequenced by the TCGA consortium. I looked at the data access matrix, but there does not appear to be a clear way to do this: https://tcga-data.nci.nih.gov/tcga/dataAccessMatrix.htm?mode=ApplyFilter

If you are using TCGA MAF files (or the ones from Broad Firehose) as your variant source, look at the Sequence_Source column: for exome sequencing the value is 'WXS'; for whole genome sequencing it is 'WGS'.

MAF specification here.
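For anyone who wants to check this on a whole file rather than by eye, here is a minimal sketch that tallies the Sequence_Source values in a MAF. It assumes the file is tab-delimited with a header row and that comment/version lines start with '#'; the function name and the toy rows in the demo are made up for illustration and are not real TCGA data.

```python
import csv
import io
from collections import Counter


def count_sequence_sources(maf_handle):
    """Tally Sequence_Source values (e.g. WXS, WGS) from an open MAF file handle."""
    # Skip comment/version lines such as "#version 2.4" (assumed to start with '#').
    rows = (line for line in maf_handle if not line.startswith("#"))
    reader = csv.DictReader(rows, delimiter="\t")
    # An empty or absent column is counted under the label "missing".
    return Counter(
        (row.get("Sequence_Source") or "").strip() or "missing" for row in reader
    )


if __name__ == "__main__":
    # Toy in-memory MAF with only two columns, purely for demonstration.
    toy_maf = (
        "#version 2.4\n"
        "Hugo_Symbol\tSequence_Source\n"
        "TP53\tWXS\n"
        "KRAS\tWGS\n"
        "EGFR\tWXS\n"
        "BRAF\t\n"
    )
    print(count_sequence_sources(io.StringIO(toy_maf)))
```

For a real file you would pass an open handle instead, e.g. `with open(path) as fh: count_sequence_sources(fh)`. A large "missing" count would indicate that the column was left empty in that file.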

@poisonAlien Awesome, thanks!

As far as I can see, this column is not filled in for files from the harmonized portal. Does anybody have any idea why this is the case? And how can I find out whether the variants are from WXS or WGS?

16 months ago
Ying W ♦ 3.9k
South San Francisco, CA

You can try using cgquery to identify the samples you are interested in. The library_strategy field will tell you whether it's WGS or WXS. You might need to specify your key to use this tool, though (it's the same tool that you would use to download controlled data).
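If you save the cgquery results as XML, something like the following rough sketch could pull out the library_strategy per analysis. Note the assumptions: it supposes that each record is a `Result` element with `analysis_id` and `library_strategy` child elements, which may not match the actual output exactly, and the IDs in the demo are placeholders rather than real analyses.

```python
import xml.etree.ElementTree as ET


def strategies_by_analysis(xml_text):
    """Map each analysis_id to its library_strategy (e.g. WGS or WXS)."""
    root = ET.fromstring(xml_text)
    strategies = {}
    # Assumed layout: one <Result> element per analysis record.
    for result in root.iter("Result"):
        analysis_id = result.findtext("analysis_id", default="unknown")
        strategy = result.findtext("library_strategy", default="missing")
        strategies[analysis_id] = strategy
    return strategies


if __name__ == "__main__":
    # Toy XML with placeholder IDs, purely for demonstration.
    toy_xml = """
    <ResultSet>
      <Result><analysis_id>a1</analysis_id><library_strategy>WGS</library_strategy></Result>
      <Result><analysis_id>a2</analysis_id><library_strategy>WXS</library_strategy></Result>
    </ResultSet>
    """
    wgs_only = [k for k, v in strategies_by_analysis(toy_xml).items() if v == "WGS"]
    print(wgs_only)
```

Filtering the resulting dictionary for "WGS" gives the list of analyses to follow up on.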

Thank you so much. It is good to know I can get this from UCSC.

However, is there a way to simply get this from TCGA (https://tcga-data.nci.nih.gov/tcga/dataAccessMatrix.htm) or Broad Firehose (http://gdac.broadinstitute.org/)?

It seems like this information should be present somewhere in TCGA or Broad Firehose.