Hi everyone,
I am pretty sure most of you can come up with a quick solution to this, so sorry for the simple question. Here is the issue: I have multiple files of SNV counts, each corresponding to one sample. I want to merge them all into one data frame, but I would still like to keep a unique ID for each file, other than the file names themselves (those are too long).
The example:
> file_list <- list.files()
> file_list
[1] "TCGA-A1-A0TT-01.hg19.oncotator.hugo_entrez_remapped.maf.txt"
[2] "TCGA-A1-A0TG-01.hg19.oncotator.hugo_entrez_remapped.maf.txt"
[3] "TCGA-A1-A0SH-01.hg19.oncotator.hugo_entrez_remapped.maf.txt"
[4] "TCGA-A1-A0BF-01.hg19.oncotator.hugo_entrez_remapped.maf.txt"
[5] "TCGA-A1-A0XB-01.hg19.oncotator.hugo_entrez_remapped.maf.txt"
[6] "TCGA-A1-A0YI-01.hg19.oncotator.hugo_entrez_remapped.maf.txt"
I am using this line to merge them all:
> datamerge <- do.call("rbind", lapply(file_list, FUN=function(files) {
    read.table(files, header=TRUE, sep="\t", fill=TRUE)
}))
I get this:
> datamerge
   Chrom     start       end ref alt   Symbol                   Patient_id     variant_class
 1    10 116247760 116247760   T   C   ABLIM1 TCGA-A1-A0TT-01A-11D-A142-09 Missense_Mutation
 2    12  43944926  43944926   T   C ADAMTS20 TCGA-A1-A0TT-01A-11D-A142-09 Missense_Mutation
 3     3  85932472  85932472   C   T    CADM2 TCGA-A1-A0TG-01A-11D-A142-09            Silent
 4     2  25678299  25678299   C   T     DTNB TCGA-A1-A0TG-01A-11D-A142-09 Missense_Mutation
 5    17  40272381  40272381   G   A    KAT2A TCGA-A1-A0TG-01A-11D-A142-09            Silent
 6     5  80024722  80024722   T   -     MSH3 TCGA-A1-A0TG-01A-11D-A142-09   Frame_Shift_Del
 7     6 135507043 135507044   -   A      MYB TCGA-A1-A0SH-01A-11D-A142-09   Frame_Shift_Ins
 8    16  74425902  74425902   T   C  NPIPB15 TCGA-A1-A0BF-01A-11D-A142-09 Missense_Mutation
 9    22  16449539  16449539   A   G   OR11H1 TCGA-A1-A0BF-01A-11D-A142-09 Missense_Mutation
10    20  16730581  16730581   G   A     OTOR TCGA-A1-A0BF-01A-11D-A142-09 Missense_Mutation
11     X  78216689  78216689   C   T   P2RY10 TCGA-A1-A0XB-01A-11D-A142-09            Silent
12    16  88790292  88790292   T   C   PIEZO1 TCGA-A1-A0XB-01A-11D-A142-09 Missense_Mutation
13     1  44476442  44476442   C   T   SLC6A9 TCGA-A1-A0XB-01A-11D-A142-09 Missense_Mutation
14    17   7491739   7491739   T   G    SOX15 TCGA-A1-A0YI-01A-11D-A142-09 Missense_Mutation
Note that each file also has its own "Patient_id" in the merged table (the same prefix as the file name, plus some extra strings).
Now, as I said at the beginning, I would like to keep "Patient_id" unique to each merged file, but with its values renamed to something shorter. Say, from the example here, all my samples are lung cancer (LUAD), so I would like something like this:
   Chrom     start       end ref alt   Symbol Patient_id     variant_class
 1    10 116247760 116247760   T   C   ABLIM1     LUNG_1 Missense_Mutation
 2    12  43944926  43944926   T   C ADAMTS20     LUNG_1 Missense_Mutation
 3     3  85932472  85932472   C   T    CADM2     LUNG_2            Silent
 4     2  25678299  25678299   C   T     DTNB     LUNG_2 Missense_Mutation
 5    17  40272381  40272381   G   A    KAT2A     LUNG_2            Silent
 6     5  80024722  80024722   T   -     MSH3     LUNG_2   Frame_Shift_Del
 7     6 135507043 135507044   -   A      MYB     LUNG_3   Frame_Shift_Ins
 8    16  74425902  74425902   T   C  NPIPB15     LUNG_4 Missense_Mutation
 9    22  16449539  16449539   A   G   OR11H1     LUNG_4 Missense_Mutation
10    20  16730581  16730581   G   A     OTOR     LUNG_4 Missense_Mutation
11     X  78216689  78216689   C   T   P2RY10     LUNG_5            Silent
12    16  88790292  88790292   T   C   PIEZO1     LUNG_5 Missense_Mutation
13     1  44476442  44476442   C   T   SLC6A9     LUNG_5 Missense_Mutation
14    17   7491739   7491739   T   G    SOX15     LUNG_6 Missense_Mutation
Would you have any workaround for this? The new name could be added either while merging the files or afterwards; it doesn't really matter to me.
Any help is very much appreciated! (:
Thanks a lot, benformatics! It was extremely helpful! Amazing!
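For anyone landing here later, below is a minimal sketch of two ways to get the LUNG_n labels. It is not necessarily the answer benformatics gave. The helper name read_with_label(), the lookup table id_key and the two tiny demo files are my own assumptions for illustration; with real data, file_list would come from list.files() as in the question. The second approach assumes every file carries exactly one distinct Patient_id.

```r
# Sketch only: read_with_label(), id_key and the demo files are illustrative
# assumptions, not part of the original question.

# Demo input: two tiny tab-separated files standing in for the real MAF files.
demo_dir <- file.path(tempdir(), "maf_demo")
dir.create(demo_dir, showWarnings = FALSE)
header <- "Chrom\tstart\tSymbol\tPatient_id"
writeLines(c(header,
             "10\t116247760\tABLIM1\tTCGA-A1-A0TT-01A-11D-A142-09",
             "12\t43944926\tADAMTS20\tTCGA-A1-A0TT-01A-11D-A142-09"),
           file.path(demo_dir, "TCGA-A1-A0TT-01.demo.maf.txt"))
writeLines(c(header,
             "3\t85932472\tCADM2\tTCGA-A1-A0TG-01A-11D-A142-09"),
           file.path(demo_dir, "TCGA-A1-A0TG-01.demo.maf.txt"))

# Explicit order here so the numbering is predictable; list.files() would
# return the names sorted alphabetically instead.
file_list <- file.path(demo_dir, c("TCGA-A1-A0TT-01.demo.maf.txt",
                                   "TCGA-A1-A0TG-01.demo.maf.txt"))

# Option 1: overwrite Patient_id while reading, using the file's position.
read_with_label <- function(i, files, prefix = "LUNG") {
  d <- read.table(files[i], header = TRUE, sep = "\t", fill = TRUE,
                  stringsAsFactors = FALSE)
  d$Patient_id <- paste0(prefix, "_", i)
  d
}
datamerge <- do.call(rbind, lapply(seq_along(file_list), read_with_label,
                                   files = file_list))

# Keep a lookup table so each short label can be traced back to its file.
id_key <- data.frame(label = paste0("LUNG_", seq_along(file_list)),
                     file = basename(file_list),
                     stringsAsFactors = FALSE)

# Option 2: merge first, then renumber by order of first appearance.
# Assumes one distinct Patient_id per file.
merged <- do.call(rbind, lapply(file_list, read.table, header = TRUE,
                                sep = "\t", fill = TRUE,
                                stringsAsFactors = FALSE))
original_ids <- unique(merged$Patient_id)
merged$Patient_id <- paste0("LUNG_", match(merged$Patient_id, original_ids))
```

Both options give the same labels as long as each file has a single Patient_id. Option 1 is the safer choice if that might not hold, because it numbers by file rather than by ID.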