HUGO and Ensembl ids
2.2 years ago
rsafavi • 40
@rsafavi45165

If we have genes with the same HUGO gene symbol but different Ensembl IDs, does it make sense to add up their raw counts (for RNA expression or single-cell analysis)? Does it make sense to treat them as isoforms?

Maybe an example helps. I have human RNA-seq samples, and the differential expression results for the gene NDUFA6 look like this:

ENS ID             Gene Name    logFC
ENSG00000272765    NDUFA6       -0.6
ENSG00000281013    NDUFA6        0.8
ENSG00000184983    NDUFA6       -0.6


As you can see, there are three different ENS IDs for the same gene name: two alternative sequence alignments, and the last one, which is the reference gene on the Ensembl website. Usually I do not get such different fold changes for alternative versions of the same gene, but here it gets tricky. Should I merge all alignments of the same gene name into one gene expression value (for all such cases) and rerun the DE analysis? Or how should I interpret these results?

2.2 years ago
EagleEye 6.4k
@EagleEye12958

Hi,

Only the last one is from the chromosome 22 assembly. The first two are from the exception contigs (haplotype variant contigs), so there is a reason they are assigned different 'ENSG' IDs. I would say always use 'ENSG' IDs for reference/indexing purposes when you are working with Ensembl annotation. In my opinion, for gene-level analysis you should always summarize by 'ENSG' ID. For transcript-level/isoform-level analysis, always summarize by 'ENST' ID.
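To make the difference concrete, here is a minimal Python sketch of summarizing the same records at three levels: by 'ENSG' ID, by gene name, and by transcript ID. The three 'ENSG' IDs are the ones from the question. The transcript IDs (`ENST_toy_*`), the counts, and the `summarize` helper are made up for illustration and do not come from any real dataset or library.

```python
from collections import defaultdict

# Toy records: (transcript_id, gene_id, gene_name, raw_count).
# The ENSG IDs are from the question; the transcript IDs and counts
# are invented placeholders, not real annotation or expression data.
records = [
    ("ENST_toy_1", "ENSG00000184983", "NDUFA6", 120),
    ("ENST_toy_2", "ENSG00000184983", "NDUFA6", 30),
    ("ENST_toy_3", "ENSG00000272765", "NDUFA6", 8),
    ("ENST_toy_4", "ENSG00000281013", "NDUFA6", 5),
]


def summarize(rows, key_index):
    """Sum raw counts over the column at key_index (hypothetical helper)."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key_index]] += row[3]
    return dict(totals)


# Gene-level summary keyed by ENSG ID: the three loci stay separate.
by_gene_id = summarize(records, 1)

# Keyed by gene name: the reference locus and both haplotype-contig
# loci collapse into a single NDUFA6 entry.
by_gene_name = summarize(records, 2)

# Transcript-level summary keyed by ENST ID.
by_transcript = summarize(records, 0)

print(by_gene_id)
print(by_gene_name)
print(by_transcript)
```

Keying by gene name silently adds the haplotype-contig loci to the reference gene, which is exactly the merge being asked about. Keying by 'ENSG' ID keeps them as separate rows.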

Note:

ENSG: Genes; ENST: Transcript variants/isoforms; ENSE: Exons; ENSP: Proteins
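As a small illustration of the prefix scheme in the note, the following Python sketch classifies an ID by its prefix and separates an optional version suffix. It assumes human-style IDs (prefix, then 11 digits, then an optional `.version`). IDs for other species carry an extra species code (for example `ENSMUSG` for mouse) and are not handled here. The function name `parse_ensembl_id` and the version number in the usage example are hypothetical.

```python
import re

# Prefix letter -> feature type, following the note above.
FEATURE_TYPES = {"G": "gene", "T": "transcript", "E": "exon", "P": "protein"}

# Assumption: human-style IDs only, e.g. ENSG00000184983 or ENSG00000184983.5
_ID_PATTERN = re.compile(r"^ENS([GTEP])(\d{11})(?:\.(\d+))?$")


def parse_ensembl_id(identifier):
    """Split an ID into stable ID, feature type and optional version."""
    match = _ID_PATTERN.match(identifier.strip())
    if match is None:
        raise ValueError(f"not a recognized human Ensembl ID: {identifier!r}")
    letter, digits, version = match.groups()
    return {
        "stable_id": f"ENS{letter}{digits}",
        "feature": FEATURE_TYPES[letter],
        "version": int(version) if version is not None else None,
    }


# The version suffix ".5" is made up for the example.
print(parse_ensembl_id("ENSG00000184983.5"))
```

Stripping the version before grouping or joining tables helps avoid the same gene appearing under two keys when one file carries versioned IDs and another does not.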