Biostar Beta. Not for public use.
Somatic allele frequency from TCGA in non-coding DNA
0
13 months ago
sacha ♦ 1.7k
France

I am looking for a dataset of somatic allele frequencies computed across many tumors.

For example, something like:
chromosome pos ref alt AF

On the TCGA website, it seems they only provide data per sample. Any idea where I can find this kind of data?

EDIT

I want to get data for non-coding DNA, from any location in the genome.
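
For reference, the table described above can be built from per-sample calls by counting, for each variant, the fraction of tumors that carry it. Here is a minimal Python sketch, assuming "AF" means that cohort-level fraction (not the read-level allele fraction within a single tumor). The function name and the input tuple layout are hypothetical, chosen only for illustration:

```python
from collections import defaultdict


def cohort_allele_frequency(calls, n_samples):
    """Collapse per-sample somatic calls into one row per variant.

    calls: iterable of (sample_id, chrom, pos, ref, alt) tuples (assumed layout).
    n_samples: total number of tumors in the cohort, including those
               with no call at a given site.
    Returns a sorted list of (chrom, pos, ref, alt, af) rows.
    """
    carriers = defaultdict(set)
    for sample_id, chrom, pos, ref, alt in calls:
        # A set ensures a sample is counted once per variant,
        # even if the same call is listed twice.
        carriers[(chrom, pos, ref, alt)].add(sample_id)
    rows = [
        (chrom, pos, ref, alt, len(samples) / n_samples)
        for (chrom, pos, ref, alt), samples in carriers.items()
    ]
    return sorted(rows)
```

Note that the denominator is the whole cohort size, so the result is only meaningful if every tumor was actually assayed at that position.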

3
18 months ago
Amitm ♦ 1.6k
UK

hi,

Do you mean the mutation frequency for a gene X across all samples studied in a given cancer? This is available in cBioPortal. Just select the cancer of interest and put in your gene name, or alternatively click the Summary tab to get the mutation frequency for the top mutated genes.

If you are looking for the mutated allele frequency, i.e. how many reads supported the ref. and alt. alleles in a particular sample, this too is available from cBioPortal. When you input your gene X and get a multi-tabbed page, go to the "Mutation" tab and choose the appropriate column from the "Show/ Hide Col" menu.

Here is an e.g. gene in a cancer type -

There are APIs available to access this data programmatically (though I have not used them). But if your query size is small, the website is good enough.
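
To make the per-sample allele frequency mentioned above concrete: it is just the share of reads supporting the alt. allele. A minimal Python sketch, assuming the ref. and alt. read counts are already available as two integers (the function name is hypothetical):

```python
def variant_allele_fraction(ref_reads, alt_reads):
    """Fraction of reads supporting the alt allele in one sample.

    Returns None when there is no coverage, since the fraction
    is undefined in that case.
    """
    depth = ref_reads + alt_reads
    if depth == 0:
        return None
    return alt_reads / depth
```

For instance, 10 alt. reads and 30 ref. reads give a fraction of 0.25.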

0

Thanks!
It seems to be exactly what I need!

0

And is there a way to download all the data for the complete human genome?

It seems I can only download data for defined genes. What about non-coding DNA?

0

OK, cBioPortal is great! But I want the same data for non-coding DNA. That means I give a location and get the AF, the same way as for coding genes.

0

I have posted an answer in case you want (somatic) mutations that affect non-protein-coding regions of the transcriptome.

0
15 months ago
Washington University in St. Louis, MO

The MAF files from TCGA only contain coding mutations. If you're looking for the rest of the genome, you'll need to get the VCFs, which are, in most cases, protected data (for which you'll have to apply for access). That's because they're less stringently filtered and may contain germline mutations, which raises patient privacy concerns.

Once you get access, you'll find lots of non-coding mutations from the wingspans of the exomes, as well as some WGS cases. This is not universally true, but for AML, you can find all validated somatic mutations (regardless of coding status) in the supplementary tables from the publication. That's available here: https://tcga-data.nci.nih.gov/docs/publications/laml_2013/

0

Thanks,

I already have an authorized key. Let me look! Thanks

0
18 months ago
Amitm ♦ 1.6k
UK

hi Sacha,

You can download all Somatic mutations from the TCGA portal as pointed out by Chris Miller.

I would disagree with what Chris said, though. The TCGA MAFs, as far as I understand, contain all somatic mutations, and that would include any non-coding somatic mutations as well. What you won't get without a licence are the germline calls.

As an e.g. here is a screenshot of the types of mut. present in the MAF file for melanoma (SKCM) samples -

So, ALL somatic mutations, including non-coding ones. Though I have not worked with the AML cohort, I presume that this (both coding and non-coding somatic mutations present) is the situation for most of the MAFs available from TCGA, if not all.
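
A quick way to check which mutation types a given MAF actually contains is to tally its `Variant_Classification` column. A minimal Python sketch, assuming a tab-separated MAF with a header row and optional `#` comment lines at the top (the function name is hypothetical):

```python
import csv
from collections import Counter


def count_variant_classes(maf_text):
    """Count rows per Variant_Classification value in MAF-formatted text."""
    # Drop blank lines and '#' comment lines (e.g. a version line)
    # so the first remaining line is the column header.
    lines = [
        line for line in maf_text.splitlines()
        if line and not line.startswith("#")
    ]
    reader = csv.DictReader(lines, delimiter="\t")
    return Counter(row["Variant_Classification"] for row in reader)
```

If classes such as `Intron`, `3'UTR`, `5'Flank` or `IGR` show up with non-zero counts, the file is not restricted to protein-coding changes. Whether they lie outside the exome capture region is a separate question.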

0

hi Sacha and Chris,

I think I may be mistaken. Sacha, if you are looking for mutations in non-transcribed regions (non-coding DNA) in the TCGA data, then you will have to look into WGS data from TCGA. Most samples, though, have only had their exome sequenced, and many of those that have WGS were actually sequenced at low pass for calling copy number. There could be samples in TCGA that have been whole-genome sequenced deeply enough to call somatic mutations. Sorry, I am not aware of any such samples in TCGA apart from what Chris has suggested above in the supplementary data.

0

In your screenshot, it seems you only have coding gene regions (exon + intron). I think those data come from exome sequencing and not from whole-genome sequencing. Or maybe I am wrong, so please attach a MAF file which contains non-coding DNA (like lncRNA).

1

hi,

You are right. Most data, as I already mentioned, are from exome sequencing, and the WGS data are mostly low pass. That said, my experience is with SKCM samples, so I am not sure of the status of other cancer cohorts hosted on TCGA. To clarify again, the MAF that you get (from WES) would contain any somatic mutation that fell within the coordinates of the target capture method used for the WES samples' sequencing library prep.

SKCM, at present on TCGA, doesn't have mut. calls from WGS. Only WES.
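
To see whether a given call falls inside the capture target, one can compare its position against the kit's interval list. A minimal Python sketch, assuming the targets are given as BED-style (chrom, start, end) tuples with 0-based, half-open coordinates and the mutation position is 1-based as in a MAF. The function name is hypothetical, and the linear scan is chosen for clarity, not speed:

```python
def in_capture_targets(chrom, pos, targets):
    """Return True if a 1-based position lies in any BED-style interval.

    targets: iterable of (chrom, start, end), 0-based half-open,
             so an interval covers 1-based positions start+1 .. end.
    """
    return any(
        t_chrom == chrom and start < pos <= end
        for t_chrom, start, end in targets
    )
```

Calls for which this returns False were picked up from off-target reads (or from flanking sequence, if the intervals were not padded).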

1

That may be true in some places, but I promise that for many cancer types, only protein-coding mutations were reported in the MAF files, and those non-coding mutations that fell within the wingspan of exome probes were excluded from reporting in the MAF files. The VCFs should provide a more comprehensive list, but are often kind of a mess, due to the strange requirements placed upon the centers by the consortium (reporting all calls, even those filtered out, using separate columns for multiple callers, etc.).
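
Since those VCFs also list calls that were filtered out, a first cleanup step is to keep only records whose FILTER column is `PASS`. A minimal Python sketch, assuming standard tab-separated VCF lines where FILTER is the seventh column. The function name is hypothetical, and records with `.` (no filter applied) are dropped here, which is a choice rather than a rule:

```python
def passing_records(vcf_lines):
    """Yield VCF data lines whose FILTER column is exactly PASS."""
    for line in vcf_lines:
        # Skip meta-information (##...) and the #CHROM header line.
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # FILTER is column 7 (index 6); ignore malformed short lines.
        if len(fields) > 6 and fields[6] == "PASS":
            yield line
```

This does not deal with the per-caller columns at all; it only removes records that the file itself marks as failing a filter.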

0

Thanks, Chris, for the insight. I have only looked into the SKCM MAF, and it had everything somatic (at least I hope so, as I see silent and RNA mutations as well).

It's a pity, though, that there are licensing requirements and, on top of that, strict rules that must be complied with for the server that would hold the licensed data.

1

Silent and RNA mutations can both be considered coding, depending on your precise definition, which is why they're included. It's not licensing that's the issue, it's genetic privacy. Given enough of the germline mutations that slip through the somatic filters, people could be identifiable, and that raises all sorts of potential issues. I'm generally a proponent of wide sharing of genetic data, but these people (and their families) have not consented to having their germline mutations shared, in most cases.
