Hi everyone,
I am processing oncotated MAFs downloaded from Firehose, and I am particularly interested in extracting dbNSFP information for each missense mutation.
However, I am confused by the format. Here is a typical example: for a missense mutation in ABLIM1, the i_dbNSFP_Ensembl_transcriptid column lists the following transcripts that overlap with the mutation:
ENST00000336585;ENST00000369252;ENST00000392952;ENST00000369257;ENST00000369267;ENST00000533213;ENST00000369262;ENST00000369263;ENST00000369266;ENST00000369256;ENST00000369260;ENST00000277895;ENST00000369253;ENST00000428430|ENST00000392955
But when I look at the columns that should show the functional impact for each transcript, e.g. the i_dbNSFP_Polyphen2_HVAR_score column, I find the following:
0.861;0.603;0.279;0.893;0.887;0.999;0.992;0.873;0.595|.
There are 15 transcripts but only 10 predictions (counting the "." after the "|"). I see no way of working out which prediction score belongs to which transcript, and several scores appear to be missing altogether.
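To make the mismatch concrete, here is a minimal Python sketch that counts entries per group in both columns. It assumes that "|" separates groups and ";" separates entries within a group in both fields, and that "." is a missing-value placeholder; `split_dbnsfp_field` and `compare_groups` are hypothetical helper names, not part of Oncotator or dbNSFP.

```python
# Minimal sketch (assumption): in both columns, '|' separates groups and
# ';' separates entries within a group; '.' is counted as an entry.

def split_dbnsfp_field(value: str) -> list[list[str]]:
    """Hypothetical helper: split a field into '|'-groups of ';'-entries."""
    # strip() is defensive, in case of stray whitespace around entries
    return [[item.strip() for item in group.split(";")] for group in value.split("|")]


def compare_groups(transcript_field: str, score_field: str) -> list[tuple[int, int]]:
    """Hypothetical helper: (n_transcripts, n_scores) for each aligned group."""
    t_groups = split_dbnsfp_field(transcript_field)
    s_groups = split_dbnsfp_field(score_field)
    return [(len(t), len(s)) for t, s in zip(t_groups, s_groups)]


# Values copied from the ABLIM1 example above
transcripts = (
    "ENST00000336585;ENST00000369252;ENST00000392952;ENST00000369257;"
    "ENST00000369267;ENST00000533213;ENST00000369262;ENST00000369263;"
    "ENST00000369266;ENST00000369256;ENST00000369260;ENST00000277895;"
    "ENST00000369253;ENST00000428430|ENST00000392955"
)
scores = "0.861;0.603;0.279;0.893;0.887;0.999;0.992;0.873;0.595|."

for i, (n_t, n_s) in enumerate(compare_groups(transcripts, scores)):
    status = "aligned" if n_t == n_s else "MISMATCH"
    print(f"group {i}: {n_t} transcripts, {n_s} scores ({status})")
```

Counting per "|" group rather than overall shows that the second group lines up (one transcript, one "."), so the unexplained gap is entirely in the first group: 14 transcripts against 9 scores.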
This is just one of many such cases. Am I misunderstanding how this information is structured?
I would appreciate any help!
Thanks so much. Kamila