Mutations Not Recognized in MuSiC
15 months ago
Duarte, CA

Hi,

I am trying to use MuSiC to analyse mutation rates in novel, non-coding genes. I can run the relevant MuSiC commands successfully and the coverage statistics look correct, but the results show no mutations in any gene (which I know isn't true). My guess is that there is a formatting issue in the .maf file containing the somatic mutations, which is causing the output of "bmr calc-bmr" to be inaccurate.

Here are the first few lines of my .maf file:

version 2.3

Hugo_Symbol Entrez_Gene_Id Center NCBI_Build Chromosome Start_Position End_Position Strand Variant_Classification Variant_Type Reference_Allele Tumor_Seq_Allele1 Tumor_Seq_Allele2 dbSNP_RS dbSNP_Val_Status Tumor_Sample_Barcode Matched_Norm_Sample_Barcode Match_Norm_Seq_Allele1 Match_Norm_Seq_Allele2 Tumor_Validation_Allele1 Tumor_Validation_Allele2 Match_Norm_Validation_Allele1 Match_Norm_Validation_Allele2 Verification_Status Validation_Status Mutation_Status Sequencing_Phase Sequence_Source Validation_Method Score BAM_File Sequencer Tumor_Sample_UUID Matched_Norm_Sample_UUID
Unknown 0 genome.wustl.edu GRCh37-lite 1 322115 322115 + Targeted_Region SNP G A G NA NA TCGA-E2-A15K TCGA-E2-A15K G G NA NA NA NA Unknown Unknown Somatic PhaseI WGS No NA NA Illumina f289e8b7-68db-48b9-8dcc-1349269eb54b c24945be-a051-4797-b7e6-09b32396f354
Unknown 0 genome.wustl.edu GRCh37-lite 1 328193 328193 + Targeted_Region SNP A A G NA NA TCGA-E2-A15K TCGA-E2-A15K A A NA NA NA NA Unknown Unknown Somatic PhaseI WGS No NA NA Illumina f289e8b7-68db-48b9-8dcc-1349269eb54b c24945be-a051-4797-b7e6-09b32396f354
Unknown 0 genome.wustl.edu GRCh37-lite 1 384901 384901 + Targeted_Region SNP G A G NA NA TCGA-E2-A15K TCGA-E2-A15K G G NA NA NA NA Unknown Unknown Somatic PhaseI WGS No NA NA Illumina f289e8b7-68db-48b9-8dcc-1349269eb54b c24945be-a051-4797-b7e6-09b32396f354
Unknown 0 genome.wustl.edu GRCh37-lite 1 390657 390657 + Targeted_Region SNP A A G NA NA TCGA-E2-A15K TCGA-E2-A15K A A NA NA NA NA Unknown Unknown Somatic PhaseI WGS No NA NA Illumina f289e8b7-68db-48b9-8dcc-1349269eb54b c24945be-a051-4797-b7e6-09b32396f354
Unknown 0 genome.wustl.edu GRCh37-lite 1 404577 404577 + Targeted_Region SNP G A G NA NA TCGA-E2-A15K TCGA-E2-A15K G G NA NA NA NA Unknown Unknown Somatic PhaseI WGS No NA NA Illumina f289e8b7-68db-48b9-8dcc-1349269eb54b c24945be-a051-4797-b7e6-09b32396f354

Here are the music commands that I am using:

 genome music bmr calc-covg --bam-list /path/to/bam.list --output-dir /path/to/output_folder --reference-sequence /path/to/GRCh37-lite.fa --roi-file /path/to/gene_coordinates.bed

 genome music bmr calc-bmr --bam-list /tcga/users/cdwarden/wgs/BRCA/MuSiC/bam.list --maf-file /path/to/somatic.maf --output-dir /path/to/output_folder --reference-sequence /path/to/GRCh37-lite.fa --roi-file /path/to/gene_coordinates.bed

 genome music smg --gene-mr-file /path/to/gene_mrs --output-file /path/to/smgs

I have also tried adding the transcript ID to the first mutation in the .maf file (so that I would expect to see one mutation in the "smgs_detailed" file), but that gene is still reported to have 0 mutations.

Can you please help me troubleshoot this issue?

Thanks,

Charles


I think it's because the Hugo_Symbol values are "Unknown" in your MAF file.


I changed the transcript ID for the first mutation to match the corresponding gene, and that gene was still reported to not have any mutations. Also, I used "Unknown" (instead of NA, etc.) because that is what I thought the .maf format required for such genes.

Is there something else that should be changed besides "Unknown"?


I used this programme a while back, and as I understand it, the gene names in the MAF file must match the gene names in the ROI file that you pass to calc-covg. Also, it skips silent variants (as labelled in the Variant_Classification column) unless you tell it not to. In your example, I see that most of the variants have Variant_Classification set to Unknown, which might be one reason.
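
A quick way to test the name-matching point is to list every Hugo_Symbol in the MAF that has no identically named region in the ROI file. The sketch below is only illustrative: it assumes a tab-delimited MAF, an ROI file with four tab-separated columns (chromosome, start, stop, gene name) and no header lines, and the gene names `lncRNA_A` and `lncRNA_B` are made up for the example.

```python
def roi_gene_names(roi_lines):
    """Collect gene names from ROI lines (assumed columns: chr, start, stop, gene)."""
    names = set()
    for line in roi_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 4:
            names.add(fields[3])
    return names


def maf_rows(maf_lines):
    """Yield one dict per MAF record, skipping anything before the column header."""
    header = None
    for line in maf_lines:
        fields = line.rstrip("\n").split("\t")
        if header is None:
            # Lines such as a version banner come before the header row
            if fields[0] == "Hugo_Symbol":
                header = fields
            continue
        if line.strip():
            yield dict(zip(header, fields))


def unmatched_symbols(maf_lines, roi_names):
    """Return the Hugo_Symbol values that do not name any ROI."""
    return {row["Hugo_Symbol"] for row in maf_rows(maf_lines)} - roi_names


# Tiny in-memory example; the gene names are hypothetical
roi = ["1\t322000\t323000\tlncRNA_A\n", "1\t384000\t385000\tlncRNA_B\n"]
maf = [
    "version 2.3\n",
    "Hugo_Symbol\tChromosome\tStart_Position\n",
    "Unknown\t1\t322115\n",
    "lncRNA_B\t1\t384901\n",
]
unmatched = unmatched_symbols(maf, roi_gene_names(roi))
print(sorted(unmatched))
```

With real files, pass open file handles instead of the lists. Any symbol this reports is one that cannot be matched back to a region by name.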


This is correct. The Hugo_Symbol needs to be properly defined. These calls seem to be annotated incorrectly as Targeted_Region, which is something that MuSiC skips as intergenic. Considering that the MAF says WGS, these might be legitimately intergenic calls. Check in a genome browser.


Yes - I want to characterize mutation rates in ncRNAs (most of which will not be covered in exome designs, and many of which are novel).

What would you recommend for the Variant_Classification and Variant_Type, in this situation?


You can refer to the documentation here. When you run music bmr calc-bmr, enable the option --noskip-non-coding. You'll still need to annotate each variant with a symbol that it can match back to a region in your ROI file. The MAF format is not very detailed in distinguishing between ncRNA types: Variant_Classification will always say RNA. But if you name the genes distinctly, using an annotator like VEP, you should be fine. Have you tried the maf2maf tool?
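
For novel regions that an annotator does not know about, one possible stopgap is to assign each variant the name of the ROI it overlaps. This is a rough sketch under assumptions, not a substitute for VEP or maf2maf: it assumes the ROI file has four tab-separated columns (chromosome, start, stop, gene name) with inclusive coordinates in the same system as the MAF positions and no header lines, and the gene names are hypothetical.

```python
def load_rois(roi_lines):
    """Parse ROI lines into (chrom, start, stop, gene) tuples."""
    rois = []
    for line in roi_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 4:
            chrom, start, stop, gene = fields[:4]
            rois.append((chrom, int(start), int(stop), gene))
    return rois


def symbol_for(chrom, start, end, rois, default="Unknown"):
    """Return the gene of the first ROI overlapping [start, end], else `default`."""
    for r_chrom, r_start, r_stop, gene in rois:
        # Two inclusive intervals overlap when each starts before the other ends
        if r_chrom == chrom and start <= r_stop and end >= r_start:
            return gene
    return default


# Hypothetical regions; the positions are taken from the MAF lines in the question
rois = load_rois([
    "1\t322000\t323000\tlncRNA_A\n",
    "1\t384000\t385000\tlncRNA_B\n",
])
print(symbol_for("1", 322115, 322115, rois))
print(symbol_for("1", 500000, 500000, rois))
```

The linear scan is fine for a few thousand regions; for a whole-genome call set, sorting the regions per chromosome and using a binary search or an interval tree would scale better.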


Thank you very much!


I have also been wondering how to prioritize such intergenic/intronic SNVs.

15 months ago
Duarte, CA

Thanks to Cyriac, I found the solution is as follows:

1) Set Variant_Classification to RNA

2) Use the "--noskip-non-coding" option when running music bmr calc-bmr
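
The two steps above could be scripted roughly as follows. This is a sketch under assumptions rather than a tested pipeline: it assumes a tab-delimited MAF in which every variant falls in an ncRNA region (so a blanket rewrite to RNA is appropriate), and the output file name `somatic.rna.maf` is made up. The command-line flags are the ones already used in this thread; the command is only printed, not executed.

```python
import shlex


def set_classification(maf_lines, value="RNA"):
    """Yield MAF lines with the Variant_Classification column set to `value`."""
    col = None
    for line in maf_lines:
        fields = line.rstrip("\n").split("\t")
        if col is None:
            # Pass through everything up to and including the column header
            if fields[0] == "Hugo_Symbol":
                col = fields.index("Variant_Classification")
            yield line
            continue
        if len(fields) > col:
            fields[col] = value
            yield "\t".join(fields) + "\n"
        else:
            # Blank or truncated line: leave it untouched
            yield line


def calc_bmr_command(bam_list, maf_file, output_dir, reference, roi_file):
    """Build the calc-bmr command line with non-coding variants retained."""
    args = [
        "genome", "music", "bmr", "calc-bmr",
        "--bam-list", bam_list,
        "--maf-file", maf_file,
        "--output-dir", output_dir,
        "--reference-sequence", reference,
        "--roi-file", roi_file,
        "--noskip-non-coding",
    ]
    return " ".join(shlex.quote(a) for a in args)


maf = [
    "Hugo_Symbol\tVariant_Classification\tVariant_Type\n",
    "lncRNA_A\tTargeted_Region\tSNP\n",  # hypothetical gene name
]
rewritten = list(set_classification(maf))
cmd = calc_bmr_command(
    "/path/to/bam.list",
    "/path/to/somatic.rna.maf",  # hypothetical name for the rewritten MAF
    "/path/to/output_folder",
    "/path/to/GRCh37-lite.fa",
    "/path/to/gene_coordinates.bed",
)
print(cmd)
```

In practice you would write the rewritten lines to a new file and keep the original MAF, so the change is easy to undo.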
