I'm a beginner in bioinformatics.
In the TCGA data portal there are 5 TCGA.BRCA MAF files (https://portal.gdc.cancer.gov/repository/, *not the legacy archive).
I am interested in these files
(all of the MAF files contain the same samples):
TCGA.BRCA.varscan.6c93f518-1956-4435-9806-37185266d248.DR-10.0.somatic.maf.gz
TCGA.BRCA.mutect.053f01ed-3154-4aea-9e7f-932c435034b3.DR-10.0.protected.maf.gz
TCGA.BRCA.muse.b8ca5856-9819-459c-87c5-94e91aca4032.DR-10.0.somatic.maf.gz
TCGA.BRCA.mutect.995c0111-d90b-4140-bee7-3845436c3b42.DR-10.0.somatic.maf.gz
TCGA.BRCA.somaticsniper.7dd592e3-5950-4438-96d5-3c718aca3f13.DR-10.0.somatic.maf.gz
I want to merge/combine the 5 MAF files into a single MAF file rather than choose only one.
I have two cases in mind:
1- keep only mutations reported by at least 2 (or 3) MAF files (callers)
2- keep all mutations but de-duplicate them (the same mutation reported for the same sample by more than one caller)
Is there any tool I can use in this situation, or any documentation?
Thank you.
Hi,
I don't know the .maf file format, but if you have VCF files available you can try something like what is described here: the MAF option in vcftools.
Best
No, this relates to the Mutation Annotation Format files that are listed in the TCGA - it does not relate to minor allele frequency.
One issue with using this open-access mutation data from the TCGA is that a different tool was used to call the variants listed in each MAF file, which already biases the results. You can row-bind the files together and then create a unique key to identify each mutation (for example sample barcode, chromosome, position and alleles), but it's not easy, I admit.
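As a rough sketch of that row-bind-and-key idea, something like the following could work in plain Python. It is only an illustration under some assumptions: the files use the standard GDC MAF column names shown in `KEY_COLUMNS`, header comment lines start with `#`, and the function names `read_maf` and `merge_mafs` are my own, not part of any existing tool. Note also that callers can represent the same indel differently, so an exact-match key like this may miss some overlaps.

```python
import csv
import gzip

# Columns assumed to identify one mutation in one sample.
# These are standard GDC MAF column names; adjust if your files differ.
KEY_COLUMNS = (
    "Tumor_Sample_Barcode", "Chromosome", "Start_Position",
    "End_Position", "Reference_Allele", "Tumor_Seq_Allele2",
)


def read_maf(path):
    """Yield each record of a gzipped MAF as a dict, skipping '#' comment lines."""
    with gzip.open(path, "rt", newline="") as handle:
        lines = (line for line in handle if not line.startswith("#"))
        yield from csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)


def merge_mafs(rows_by_caller, min_callers=1):
    """Row-bind records from several callers and de-duplicate on KEY_COLUMNS.

    rows_by_caller maps a caller name to an iterable of MAF records (dicts).
    The first record seen for a key is kept, and the callers that reported it
    are stored in the added 'callers' and 'n_callers' fields.
    min_callers=1 keeps every mutation once (case 2);
    min_callers=2 or 3 keeps only mutations seen by that many callers (case 1).
    """
    merged = {}  # key -> (first record seen, set of caller names)
    for caller, rows in rows_by_caller.items():
        for row in rows:
            key = tuple(row[col] for col in KEY_COLUMNS)
            if key not in merged:
                merged[key] = (dict(row), set())
            merged[key][1].add(caller)

    result = []
    for record, callers in merged.values():
        if len(callers) >= min_callers:
            record["callers"] = ";".join(sorted(callers))
            record["n_callers"] = len(callers)
            result.append(record)
    return result
```

You would then call, for example, `merge_mafs({"varscan": read_maf("TCGA.BRCA.varscan.6c93f518-1956-4435-9806-37185266d248.DR-10.0.somatic.maf.gz"), "muse": read_maf("TCGA.BRCA.muse.b8ca5856-9819-459c-87c5-94e91aca4032.DR-10.0.somatic.maf.gz")}, min_callers=2)` and add the other callers in the same way. Since two of the five files listed in the question come from MuTect (one protected, one somatic), you would probably want to use only one of them, so that the same caller is not counted twice.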
If at all possible, get the protected VCF files and use just a single somatic variant caller on all of them. Otherwise, if you're not confident in managing the MAF files, then use something like cBioPortal to obtain lists of mutations - there is also an R interface for this.