Hi,
I'm confused by the TCGA mutation data available from cBioPortal (http://www.cbioportal.org/) and from the GDC portal (http://gdc-portal.nci.nih.gov/), and I have the feeling I'm missing something. My problem is that I find different numbers of mutations, although the source should be the same.
For example, take TP53 mutations in ovarian cancer:
- From cBioPortal, I select Ovarian Serous Cystadenocarcinoma (TCGA, Nature 2011), mutations only, and TP53: I obtain ~300 mutations (I downloaded the results and checked that these are indeed mutations in ~300 samples).
- From the GDC portal, I download the MAF file for the same ovarian cancer cohort (TCGA.OV.mutect.9579c7c5-e170-4674-97ab-5dbfe73f78d3.somatic.maf.gz; I also looked at the files from callers other than MuTect). I filter it for TP53 and obtain ~70 mutations.
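For reference, the filtering step on the GDC side can be sketched roughly as below. This is only a minimal sketch, assuming the MAF is tab-separated, has leading comment lines starting with "#", and uses the standard Hugo_Symbol and Tumor_Sample_Barcode columns. The function names and the tiny example table are made up for illustration and are not real TCGA data.

```python
import csv
import gzip
import io


def count_gene_mutations(lines, gene="TP53"):
    """Return (mutation rows, distinct samples) for one gene in a MAF."""
    # Skip the '#' comment lines that precede the column header.
    data = (line for line in lines if not line.startswith("#"))
    # QUOTE_NONE: treat quote characters in annotation fields literally.
    reader = csv.DictReader(data, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = 0
    samples = set()
    for rec in reader:
        if rec["Hugo_Symbol"] == gene:
            rows += 1
            samples.add(rec["Tumor_Sample_Barcode"])
    return rows, len(samples)


def count_in_maf_gz(path, gene="TP53"):
    """Same count, reading a gzip-compressed MAF file from disk."""
    with gzip.open(path, "rt") as fh:
        return count_gene_mutations(fh, gene)


# Made-up MAF excerpt (hypothetical sample names), just to show the call.
example = io.StringIO(
    "# header comment\n"
    "Hugo_Symbol\tTumor_Sample_Barcode\tVariant_Classification\n"
    "TP53\tSAMPLE-A\tMissense_Mutation\n"
    "TP53\tSAMPLE-A\tNonsense_Mutation\n"
    "BRCA1\tSAMPLE-B\tFrame_Shift_Del\n"
    "TP53\tSAMPLE-C\tMissense_Mutation\n"
)
print(count_gene_mutations(example))  # (3, 2)
```

Counting both rows and distinct samples matters here, because the two portals can only be compared if "number of mutations" means the same thing on both sides.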
I imagine that cBioPortal processes the mutations differently, but I didn't find much documentation. Does anyone have a clue about it?
thanks,
Arnaud
Thanks, I found the post you mention, "Anyone knows mutation pipeline for cbioportal?". My question is therefore some kind of duplicate. I had also seen the documentation in the cBioPortal FAQ, but unfortunately it doesn't say much about how the data is processed.
My guess is that cBioPortal shows all identified mutations, without applying any particular tool to extract only driver mutations.