Gene name and ensembl ids
5.2 years ago

Why does a single gene name, say MAPK3, have multiple Ensembl IDs and multiple FASTA sequences? Isn't there supposed to be a single FASTA sequence for each gene name?

gene sequence • 1.2k views

Hi, please google isoforms and alternative splicing.


Not necessarily. A gene can have more than one transcript variant.


In fact, checking the latest GENCODE release for human, there are 58381 annotated genes. Of these, 36076 genes have more than one annotated transcript. Summary statistics (transcripts per gene):

Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.000   1.000   1.000   3.491   4.000 192.000

and quantiles:

10% 20% 30% 40% 50% 60% 70% 80% 90% 95% 99% 
  1   1   1   1   1   1   3   5  10  14  24

Note that this of course includes many single-exon genes, such as small RNA species, and the picture probably changes for classical protein-coding genes.
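For anyone wanting to reproduce this kind of count, here is a minimal Python sketch. It assumes a GENCODE-style GTF in which each transcript is a row with feature type "transcript" and carries gene_id and gene_type attributes (Ensembl's own GTF files use gene_biotype instead). The function name, the helper and the tiny example rows are made up for illustration; this is not the code used to produce the numbers above.

```python
import re
import statistics
from collections import defaultdict

# Matches GTF attributes of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def transcripts_per_gene(gtf_lines, gene_type=None):
    """Count 'transcript' rows per gene_id, optionally for one gene_type."""
    counts = defaultdict(int)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue  # only transcript rows are counted
        attrs = dict(ATTR_RE.findall(fields[8]))
        if gene_type is not None and attrs.get("gene_type") != gene_type:
            continue
        counts[attrs["gene_id"]] += 1
    return dict(counts)


def _row(feature, attrs):
    # Hypothetical helper that builds one tab-separated GTF row.
    return "\t".join(["chr1", "EXAMPLE", feature, "1", "100", ".", "+", ".", attrs])


# Made-up records (G1..G3, T1..T5 are placeholders, not real identifiers).
EXAMPLE_LINES = [
    "##description: made-up example",
    _row("gene", 'gene_id "G1"; gene_type "protein_coding";'),
    _row("transcript", 'gene_id "G1"; transcript_id "T1"; gene_type "protein_coding";'),
    _row("transcript", 'gene_id "G1"; transcript_id "T2"; gene_type "protein_coding";'),
    _row("transcript", 'gene_id "G1"; transcript_id "T3"; gene_type "protein_coding";'),
    _row("transcript", 'gene_id "G2"; transcript_id "T4"; gene_type "protein_coding";'),
    _row("transcript", 'gene_id "G3"; transcript_id "T5"; gene_type "miRNA";'),
]

if __name__ == "__main__":
    all_counts = transcripts_per_gene(EXAMPLE_LINES)
    coding = transcripts_per_gene(EXAMPLE_LINES, gene_type="protein_coding")
    print("all genes, median:", statistics.median(all_counts.values()))
    print("protein-coding, median:", statistics.median(coding.values()))
```

For a real annotation you would pass an open file handle for the GTF instead of EXAMPLE_LINES and then summarise the counts however you prefer.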


I think this would be even higher if you limited it to the ~20,000 protein-coding genes. Very few protein-coding genes have only one transcript.


...in most eukaryotes...


Quite right... Sorry, mammal focused again!

This is possibly not even true for most eukaryotes, just most mammals. I don't think (for example) most Arabidopsis genes have multiple transcripts annotated. Last time I checked, it might not even have been true for flies (although that was a while ago).


True. Restricting to protein-coding genes only:

 Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.000   2.000   5.000   7.377  10.000 192.000 

10% 20% 30% 40% 50% 60% 70% 80% 90% 95% 99% 
  1   2   3   4   5   7   9  11  16  20  31

There are two sorts of Ensembl ID. The first is the gene ID. In human, the gene MAPK3 maps to a single Ensembl gene ID, ENSG00000102882. The other sort is the Ensembl transcript ID. As MAPK3 has several transcripts, it has several Ensembl transcript IDs.

Note that there are cases where a single gene symbol has more than one Ensembl gene ID. This is because HUGO (which decides gene symbols) and Ensembl (which assigns Ensembl IDs) don't necessarily agree on which gene is which. For example, the gene IGF2 has two IDs: ENSG00000129965 and ENSG00000167244. This is because there is a read-through transcript that incorporates parts of both the classic IGF2 ORF and the adjacent INS ORF. Ensembl has decided this represents two different genes (IGF2 and INS-IGF2), whereas HUGO only allocates a single symbol (IGF2).
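As a rough illustration of both points, the Python sketch below tells gene IDs from transcript IDs by their prefix and lists symbols that map to more than one gene ID. The regular expression is a simplification that assumes the usual human prefixes (ENSG, ENST, ENSP), an optional three-letter species code and an optional version suffix. The function names are hypothetical, and the ENST value in the example is a made-up placeholder rather than a real MAPK3 transcript ID.

```python
import re
from collections import defaultdict

# ENSG = gene, ENST = transcript, ENSP = protein (human prefixes).
# Assumption: other species add a 3-letter code (e.g. ENSMUSG for mouse),
# and an optional ".N" version suffix may follow the 11 digits.
ID_RE = re.compile(r"^ENS([A-Z]{3})?([GTP])\d{11}(\.\d+)?$")
KIND = {"G": "gene", "T": "transcript", "P": "protein"}


def ensembl_id_kind(stable_id):
    """Return 'gene', 'transcript' or 'protein', or None if unrecognised."""
    match = ID_RE.match(stable_id)
    return KIND[match.group(2)] if match else None


def symbols_with_multiple_gene_ids(pairs):
    """Given (symbol, Ensembl ID) pairs, report symbols with >1 gene ID."""
    by_symbol = defaultdict(set)
    for symbol, stable_id in pairs:
        if ensembl_id_kind(stable_id) == "gene":
            # Drop any version suffix so ENSG....1 and ENSG....2 are one ID.
            by_symbol[symbol].add(stable_id.split(".")[0])
    return {s: sorted(ids) for s, ids in by_symbol.items() if len(ids) > 1}


# Gene IDs are the ones quoted in the answer above; the ENST is a placeholder.
EXAMPLE_PAIRS = [
    ("MAPK3", "ENSG00000102882"),
    ("MAPK3", "ENST00000000001"),  # made-up transcript ID, ignored for genes
    ("IGF2", "ENSG00000129965"),
    ("IGF2", "ENSG00000167244"),
]

if __name__ == "__main__":
    print(ensembl_id_kind("ENSG00000102882"))
    print(symbols_with_multiple_gene_ids(EXAMPLE_PAIRS))
```

In practice the (symbol, ID) pairs would come from whatever annotation table you are working with, for example the gene_name and gene_id attributes of a GTF.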
