Question: Mutations Not Recognized in MuSiC
3

Hi,

I am trying to use MuSiC to analyse mutation rates in novel, non-coding genes. I can run the relevant MuSiC commands successfully and the coverage statistics look correct, but the results show no mutations in any gene (which I know isn't true). My guess is that a formatting issue in the .maf file containing the somatic mutations is causing the output of "bmr calc-bmr" to be inaccurate.

Here are the first few lines of my .maf file:

#version 2.3

Hugo_Symbol Entrez_Gene_Id Center NCBI_Build Chromosome Start_Position End_Position Strand Variant_Classification Variant_Type Reference_Allele Tumor_Seq_Allele1 Tumor_Seq_Allele2 dbSNP_RS dbSNP_Val_Status Tumor_Sample_Barcode Matched_Norm_Sample_Barcode Match_Norm_Seq_Allele1 Match_Norm_Seq_Allele2 Tumor_Validation_Allele1 Tumor_Validation_Allele2 Match_Norm_Validation_Allele1 Match_Norm_Validation_Allele2 Verification_Status Validation_Status Mutation_Status Sequencing_Phase Sequence_Source Validation_Method Score BAM_File Sequencer Tumor_Sample_UUID Matched_Norm_Sample_UUID
Unknown 0 genome.wustl.edu GRCh37-lite 1 322115 322115 + Targeted_Region SNP G A G NA NA TCGA-E2-A15K TCGA-E2-A15K G G NA NA NA NA Unknown Unknown Somatic PhaseI WGS No NA NA Illumina f289e8b7-68db-48b9-8dcc-1349269eb54b c24945be-a051-4797-b7e6-09b32396f354
Unknown 0 genome.wustl.edu GRCh37-lite 1 328193 328193 + Targeted_Region SNP A A G NA NA TCGA-E2-A15K TCGA-E2-A15K A A NA NA NA NA Unknown Unknown Somatic PhaseI WGS No NA NA Illumina f289e8b7-68db-48b9-8dcc-1349269eb54b c24945be-a051-4797-b7e6-09b32396f354
Unknown 0 genome.wustl.edu GRCh37-lite 1 384901 384901 + Targeted_Region SNP G A G NA NA TCGA-E2-A15K TCGA-E2-A15K G G NA NA NA NA Unknown Unknown Somatic PhaseI WGS No NA NA Illumina f289e8b7-68db-48b9-8dcc-1349269eb54b c24945be-a051-4797-b7e6-09b32396f354
Unknown 0 genome.wustl.edu GRCh37-lite 1 390657 390657 + Targeted_Region SNP A A G NA NA TCGA-E2-A15K TCGA-E2-A15K A A NA NA NA NA Unknown Unknown Somatic PhaseI WGS No NA NA Illumina f289e8b7-68db-48b9-8dcc-1349269eb54b c24945be-a051-4797-b7e6-09b32396f354
Unknown 0 genome.wustl.edu GRCh37-lite 1 404577 404577 + Targeted_Region SNP G A G NA NA TCGA-E2-A15K TCGA-E2-A15K G G NA NA NA NA Unknown Unknown Somatic PhaseI WGS No NA NA Illumina f289e8b7-68db-48b9-8dcc-1349269eb54b c24945be-a051-4797-b7e6-09b32396f354

Here are the MuSiC commands that I am using:

 genome music bmr calc-covg --bam-list /path/to/bam.list --output-dir /path/to/output_folder --reference-sequence /path/to/GRCh37-lite.fa --roi-file /path/to/gene_coordinates.bed

 genome music bmr calc-bmr --bam-list /tcga/users/cdwarden/wgs/BRCA/MuSiC/bam.list --maf-file /path/to/somatic.maf --output-dir /path/to/output_folder --reference-sequence /path/to/GRCh37-lite.fa --roi-file /path/to/gene_coordinates.bed

 genome music smg --gene-mr-file /path/to/gene_mrs --output-file /path/to/smgs

I have also tried adding the transcript ID to the first mutation in the .maf file (so that I would expect to see one mutation in the "smgs_detailed" file), but that gene is still reported to have 0 mutations.

Can you please help me troubleshoot this issue?

Thanks,

Charles

0

I think it's because the Hugo_Symbol values are Unknown in your MAF file.

4.4 years ago
poisonAlien
♦ 2.8k
0

I changed the transcript ID for the first mutation to match the corresponding gene, and that gene was still reported to not have any mutations. Also, I used "Unknown" (instead of NA, etc.) because that is what I thought the .maf format required for such genes.

Is there something else that should be changed besides "Unknown"?

4.4 years ago
Charles Warden
6.8k
0

I used this program a while back. As I understand it, the gene names in the MAF file must match the gene names in the ROI file that you pass to calc-covg. It will also skip silent variants (based on the Variant_Classification column) unless you tell it not to skip them. In your example, I see that most of the variants have Variant_Classification set to Unknown, which might be one reason.
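A quick way to test the name-matching point is to list every Hugo_Symbol in the MAF that has no identically named region in the ROI file. This is only a minimal sketch: it assumes both files are tab-delimited, that the gene name is the fourth column of the ROI file, and that the gene name `lncRNA_2` and the toy coordinates are made up for illustration.

```python
import csv
import io

def roi_gene_names(roi_handle):
    """Collect gene names from column 4 of a tab-delimited ROI file (assumed layout)."""
    names = set()
    for line in roi_handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 4 and not line.startswith("#"):
            names.add(fields[3])
    return names

def unmatched_symbols(maf_handle, roi_names):
    """Return Hugo_Symbol values in the MAF that match no ROI gene name."""
    # Skip comment lines such as "#version 2.3" that precede the header row.
    data_lines = (line for line in maf_handle if not line.startswith("#"))
    rows = csv.DictReader(data_lines, delimiter="\t")
    return {row["Hugo_Symbol"] for row in rows} - roi_names

# Toy data; the gene names are hypothetical.
roi = io.StringIO("1\t322000\t323000\tlncRNA_1\n1\t384000\t385000\tlncRNA_2\n")
maf = io.StringIO(
    "#version 2.3\n"
    "Hugo_Symbol\tChromosome\tStart_Position\n"
    "Unknown\t1\t322115\n"
    "lncRNA_2\t1\t384901\n"
)
missing = unmatched_symbols(maf, roi_gene_names(roi))
print(missing)
```

With real files, replace the `io.StringIO` objects with `open(path)` handles. Any symbol this reports cannot be matched back to a region by name.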

4.4 years ago
poisonAlien
♦ 2.8k
0

This is correct. The Hugo_Symbol needs to be properly defined. These calls seem to be annotated incorrectly as Targeted_Region, which is something that MuSiC skips as intergenic. Considering that the MAF says WGS, these might be legitimately intergenic calls. Check in a genome browser.
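To see how the calls are annotated before running calc-bmr, it can help to tally the Variant_Classification column. The sketch below assumes a tab-delimited MAF with an optional "#version" line, and the rows are toy data. It only counts values and makes no claim about which classes MuSiC skips.

```python
import csv
import io
from collections import Counter

def classification_counts(maf_handle):
    """Count how often each Variant_Classification value occurs in a MAF."""
    # Comment lines such as "#version 2.3" are dropped before parsing the header.
    data_lines = (line for line in maf_handle if not line.startswith("#"))
    rows = csv.DictReader(data_lines, delimiter="\t")
    return Counter(row["Variant_Classification"] for row in rows)

# Toy MAF with only the columns needed for this check.
maf = io.StringIO(
    "#version 2.3\n"
    "Hugo_Symbol\tVariant_Classification\n"
    "Unknown\tTargeted_Region\n"
    "Unknown\tTargeted_Region\n"
    "Unknown\tRNA\n"
)
counts = classification_counts(maf)
for label, n in counts.most_common():
    print(label, n)
```

If nearly everything falls under Targeted_Region, that is consistent with the explanation above.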

4.4 years ago
Cyriac Kandoth
5.3k
0

Yes - I want to characterize mutation rates in ncRNAs (most of which will not be covered in exome designs, and many of which are novel).

What would you recommend for the Variant_Classification and Variant_Type, in this situation?

4.4 years ago
Charles Warden
6.8k
1

You can refer to the documentation here. When you run music bmr calc-bmr, enable the option --noskip-non-coding. You'll still need to annotate each variant with a symbol that it can match back to a region in your ROI file. MAF format is not as detailed in distinguishing between ncRNA types. Variant_Classification will always say RNA. But name the genes differently using annotators like VEP, and you should be fine. Have you tried the maf2maf tool?

4.4 years ago
Cyriac Kandoth
5.3k
0

Thank you very much!

4.4 years ago
Chirag Nepal
♦ 2.2k
0

I am also wondering how to prioritize such intergenic/intronic SNVs.

4.4 years ago
Chirag Nepal
♦ 2.2k
4

Thanks to Cyriac, I found the solution is as follows:

1) Set Variant_Classification to RNA

2) Use the "--noskip-non-coding" option when running music bmr calc-bmr
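For anyone who needs to make these edits in bulk, here is a rough sketch of step 1 combined with the gene-naming advice from the comments: each variant that overlaps a region in the ROI file gets that region's name as its Hugo_Symbol and RNA as its Variant_Classification. It assumes tab-delimited files, an ROI layout of chromosome, start, stop, gene name, and 1-based inclusive coordinates in both files. The gene name `lncRNA_1` is hypothetical. An annotator such as VEP is likely the more robust route for real data.

```python
import csv
import io

def load_rois(roi_handle):
    """Read (chrom, start, stop, name) tuples from a tab-delimited ROI file."""
    rois = []
    for line in roi_handle:
        if line.startswith("#") or not line.strip():
            continue
        chrom, start, stop, name = line.rstrip("\n").split("\t")[:4]
        rois.append((chrom, int(start), int(stop), name))
    return rois

def annotate_maf(maf_handle, rois, out_handle):
    """Relabel variants that overlap an ROI; return how many rows were changed."""
    body = []
    for line in maf_handle:
        if line.startswith("#"):
            out_handle.write(line)  # keep comment lines such as "#version 2.3"
        else:
            body.append(line)
    reader = csv.DictReader(body, delimiter="\t")
    writer = csv.DictWriter(out_handle, fieldnames=reader.fieldnames,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    matched = 0
    for row in reader:
        v_start, v_end = int(row["Start_Position"]), int(row["End_Position"])
        for chrom, start, stop, name in rois:
            # Assumes 1-based inclusive coordinates in both files.
            if row["Chromosome"] == chrom and v_start <= stop and v_end >= start:
                row["Hugo_Symbol"] = name
                row["Variant_Classification"] = "RNA"
                matched += 1
                break  # the first overlapping ROI wins
        writer.writerow(row)  # rows outside every ROI are written unchanged
    return matched

# Toy data; the ROI name and coordinates are made up.
rois = load_rois(io.StringIO("1\t322000\t323000\tlncRNA_1\n"))
maf = io.StringIO(
    "#version 2.3\n"
    "Hugo_Symbol\tChromosome\tStart_Position\tEnd_Position\tVariant_Classification\n"
    "Unknown\t1\t322115\t322115\tTargeted_Region\n"
    "Unknown\t1\t328193\t328193\tTargeted_Region\n"
)
out = io.StringIO()
n_changed = annotate_maf(maf, rois, out)
print(out.getvalue())
```

The linear scan over the ROIs is fine for a small ROI list. For whole-genome call sets an interval index or a dedicated annotator would scale better.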

4.4 years ago Charles Warden 6.8k
