Hi,
I am making a heatmap to check the expression of some marker genes of different cell types in zebrafish.
I have used biomaRt to convert the Ensembl IDs to ZFIN IDs so the heatmap is easier to interpret. However, the object that biomaRt returned contains many duplicated Ensembl IDs, i.e. some Ensembl IDs map to more than one ZFIN ID.
I was wondering how exactly to interpret this (is it maybe because some Ensembl genes have multiple transcripts with the same Ensembl gene ID but different ZFIN IDs?) and how I should choose between ZFIN IDs when there is more than one.
I started from an Excel spreadsheet containing the counts, so maybe it has something to do with the upstream pipeline (aligning, counting, etc.)? I am planning to eventually rerun from the raw data onwards, so if it's something I can fix upstream, that would also be helpful to know.
This doesn't actually affect my current heatmap, as none of the genes I am looking at are involved, but I would like to know what is causing this and what best practice is, for future reference.
Thanks in advance,
Liam
I've just realised you were asking about the opposite problem to the one I've answered here. Can you send us your biomaRt query, please? BioMart is centred on the Ensembl gene objects, so it should not be giving more than one ZFIN ID per gene.
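In the meantime, here is a rough sketch of how you could check which Ensembl IDs have more than one ZFIN ID in the table you got back. It uses base R only and a made-up data frame in place of your real result: the column names `ensembl_gene_id` and `zfin_id` and all the IDs are placeholders, so swap in whatever attributes your query actually requested.

```r
# Toy stand-in for a biomaRt result. All IDs and column names are hypothetical.
mapping <- data.frame(
  ensembl_gene_id = c("ENSDARG_A", "ENSDARG_A", "ENSDARG_B",
                      "ENSDARG_C", "ENSDARG_C"),
  zfin_id         = c("ZDB-GENE-1", "ZDB-GENE-2", "ZDB-GENE-3",
                      "ZDB-GENE-4", "ZDB-GENE-4"),
  stringsAsFactors = FALSE
)

# Drop rows that are exact repeats of another row.
mapping <- unique(mapping)

# Count the distinct ZFIN IDs left for each Ensembl gene ID.
n_zfin <- table(mapping$ensembl_gene_id)
multi_ids <- names(n_zfin)[n_zfin > 1]

# Rows worth inspecting by hand: genuine one-to-many mappings.
ambiguous <- mapping[mapping$ensembl_gene_id %in% multi_ids, ]
print(ambiguous)

# One row per Ensembl ID, keeping every ZFIN ID in a single string
# so that nothing is silently dropped.
collapsed <- aggregate(
  zfin_id ~ ensembl_gene_id,
  data = mapping,
  FUN  = function(x) paste(x, collapse = ";")
)
print(collapsed)
```

Collapsing to one row per Ensembl ID is only one option. It keeps the row count equal to the number of genes, which is convenient for labelling a heatmap, but you may prefer to look at the ambiguous rows and pick an ID manually.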
Hi Emily,
Thanks for the reply. Sorry, I meant to add my R script to my original question but then completely forgot. The code I used is as follows:
I saw similar code posted on a forum by someone else, so I'm not sure it's exactly how I'm meant to do it.
Liam