I have questions about some observations regarding COSMIC Mutation IDs in the GRCh37 version of COSMIC v72.
Is there a reliable way to validate a COSMIC Mutation ID using the downloadable data at http://grch37-cancer.sanger.ac.uk/cosmic/download?
One option would be to send a request to the COSMIC website to check, but I would like to avoid that if possible.
COSMIC Mutation IDs in the downloadable data are not always searchable on the COSMIC website, and they seem inconsistent.
Looking at CosmicCompleteExport.tsv.gz and VCF/CosmicCodingMuts.vcf.gz, I am not sure how to interpret the following:
- Some COSM IDs in VCF/CosmicCodingMuts.vcf.gz are not found in CosmicCompleteExport.tsv.gz.
- Some COSM IDs found in both VCF/CosmicCodingMuts.vcf.gz and CosmicCompleteExport.tsv.gz are not found on the website.
  - Example: COSM330384 is found in both files but not on the COSMIC website: http://grch37-cancer.sanger.ac.uk/cosmic/mutation/overview?id=330384
    $ zcat cosmic/grch37/cosmic/v72/CosmicMutantExport.tsv.gz | grep -P "COSM330384\t"
    SLC4A11_ENST00000380059 ENST00000380059 2757 SCC-9 2296303 2161906 upper_aerodigestive_tract head_neck carcinoma squamous_cell_carcinoma y COSM330384 c.77C>G p.P26R Substitution - Missense 37 20:3218634-3218634 - y PASSENGER/OTHER Reported in another cancer sample as somatic 25275298 cell-line NS 25
    # ...(many records more)
    $ zcat cosmic/grch37/cosmic/v72/VCF/CosmicCodingMuts.vcf.gz | grep -P "COSM330384\t"
    20 3218634 COSM330384 G C . . GENE=SLC4A11_ENST00000380059;STRAND=-;SNP;GENE=SLC4A11_ENST00000380059;STRAND=-;CDS=c.77C>G;AA=p.P26R;CNT=10
- Some variants have multiple IDs assigned.
  - Example:
    $ zcat cosmic/grch37/cosmic/v72/VCF/CosmicCodingMuts.vcf.gz | grep -P "108175462\t"
    11 108175462 COSM3736031 G A . . GENE=ATM_ENST00000278616;STRAND=+;SNP;GENE=ATM_ENST00000278616;STRAND=+;CDS=c.5557G>A;AA=p.D1853N;CNT=2
    11 108175462 COSM41596 G A . . GENE=ATM;STRAND=+;SNP;GENE=ATM;STRAND=+;CDS=c.5557G>A;AA=p.D1853N;CNT=12
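To find every such case rather than grepping one position at a time, something like the following might work. It is only a sketch: `variants_with_multiple_ids` is a name I made up, and it assumes the standard VCF column order (CHROM, POS, ID, REF, ALT). I use `gzip -dc` instead of `zcat` only because it behaves the same on Linux and macOS.

```shell
# Hypothetical helper: print each chrom/pos/ref/alt that carries more than
# one ID in a gzipped VCF, followed by a comma-separated list of those IDs.
# Assumes standard VCF columns: 1=CHROM, 2=POS, 3=ID, 4=REF, 5=ALT.
variants_with_multiple_ids() {
    gzip -dc "$1" | awk -F'\t' '
        /^#/ { next }                      # skip VCF header lines
        {
            key = $1 "\t" $2 "\t" $4 "\t" $5
            if (!(key in ids)) {           # first time this variant is seen
                order[++n] = key
                ids[key] = $3
                count[key] = 1
            } else {                       # same variant, another ID
                ids[key] = ids[key] "," $3
                count[key]++
            }
        }
        END {
            for (i = 1; i <= n; i++)
                if (count[order[i]] > 1)
                    print order[i] "\t" ids[order[i]]
        }'
}
```

It would be called as `variants_with_multiple_ids cosmic/grch37/cosmic/v72/VCF/CosmicCodingMuts.vcf.gz`. Note that it keeps one entry per distinct variant in memory, which may be slow on the full file.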
I would appreciate any advice.
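For the first observation (IDs in the VCF that are missing from the export), this is roughly how the two downloads could be compared locally. It is a sketch under assumptions: `cosm_ids_only_in_vcf` is a hypothetical name, the VCF ID is taken from column 3, and the TSV is assumed to have a header column literally named "Mutation ID", which should be checked against the header of the actual release.

```shell
# Hypothetical helper: list COSM IDs that occur in the coding-mutation VCF
# but not in the export TSV.
# Assumptions: VCF ID is column 3; the TSV header has a column named
# "Mutation ID". If that column is not found, no TSV IDs are collected and
# every VCF ID is reported.
cosm_ids_only_in_vcf() {
    vcf_gz=$1
    tsv_gz=$2
    tsv_ids=$(mktemp)
    # Locate the "Mutation ID" column in the header, then print it per row.
    gzip -dc "$tsv_gz" | awk -F'\t' '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "Mutation ID") col = i; next }
        col     { print $col }' | sort -u > "$tsv_ids"
    # Skip VCF header lines, take the ID column, keep IDs absent from the TSV.
    gzip -dc "$vcf_gz" | grep -v '^#' | cut -f3 | sort -u | comm -23 - "$tsv_ids"
    rm -f "$tsv_ids"
}
```

With the paths from my commands above, the call would be `cosm_ids_only_in_vcf cosmic/grch37/cosmic/v72/VCF/CosmicCodingMuts.vcf.gz cosmic/grch37/cosmic/v72/CosmicMutantExport.tsv.gz`. This only compares the two files with each other; it cannot tell whether an ID is searchable on the website.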
Thank you, Julio. That helped me a lot. I investigated a little further: COSMIC sometimes assigns different COSM IDs to different samples for the same variant on the same gene model.
For example:
CosmicCompleteExport
VCF