Question

Why is the sum of all exons so much longer than the CDS?

1

5.1 years ago

Jeremy Leipzig 22k

Is this because of non-coding RNA? I thought these would have been at least comparable.

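library(GenomicFeatures)
txdb <- makeTxDbFromEnsembl("Homo Sapiens", server="useastdb.ensembl.org")

# Total CDS length after merging overlapping ranges
gr <- cds(txdb)
sum(width(reduce(gr)))
[1] 41901692

# Total exon length after merging overlapping ranges
gr <- exons(txdb)
sum(width(reduce(gr)))
[1] 153094341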

exons cds • 1.8k views

ADD COMMENT • link updated 5.1 years ago by swbarnes2 14k • written 5.1 years ago by Jeremy Leipzig 22k

1

Yes, there are a bunch of non-coding RNA families (rRNA, tRNA, miRNA, snRNA, lncRNA, ...) that can be in your exon list but not in your CDS.

ADD REPLY • link 5.1 years ago by JC 13k
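One way to see how much exonic sequence lies outside the CDS is to subtract the merged CDS ranges from the merged exon ranges. The following is only a minimal sketch on toy coordinates invented for illustration (they are not real annotation); `exon_gr` and `cds_gr` are hypothetical stand-ins for `exons(txdb)` and `cds(txdb)`.

```r
library(GenomicRanges)

# Toy coordinates, invented for illustration only.
# The third exon has no CDS at all, like an exon of a non-coding transcript.
exon_gr <- GRanges("chr1",
                   IRanges(start = c(100, 300, 1000), end = c(199, 399, 1099)),
                   strand = "+")
cds_gr <- GRanges("chr1",
                  IRanges(start = c(150, 300), end = c(199, 349)),
                  strand = "+")

# Exonic bases not covered by any CDS: UTRs plus non-coding exons.
noncoding_gr <- setdiff(reduce(exon_gr), reduce(cds_gr))

exon_bp <- sum(width(reduce(exon_gr)))
cds_bp <- sum(width(reduce(cds_gr)))
noncoding_bp <- sum(width(noncoding_gr))

# In this toy case the CDS lies entirely inside exons,
# so the two parts add up to the exon total.
exon_bp == cds_bp + noncoding_bp
```

With a real TxDb, the same `setdiff()` call on `exons(txdb)` and `cds(txdb)` should give the combined UTR and non-coding footprint, although it will not say which of the two a given base belongs to.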

score 5 · Answer 1 · 2019-03-05

5

5.1 years ago

Eric Lim ★ 2.1k

The 3' and 5' UTRs, as well as non-coding RNA species, are part of the exons but not of the CDS.

2

how about that!

sum(width(unlist(fiveUTRsByTranscript(txdb))))
[1] 21299710
sum(width(unlist(threeUTRsByTranscript(txdb))))
[1] 86927764

ADD REPLY • link 5.1 years ago by Jeremy Leipzig 22k
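As a rough sanity check, the four totals reported in this thread can be compared directly. This is plain arithmetic on the numbers above, with one caveat: the UTR sums were taken per transcript without `reduce()`, so they may count bases shared between transcripts more than once and are not strictly comparable with the reduced exon and CDS totals.

```r
# Totals copied from the outputs reported in this thread (in bp).
cds_bp <- 41901692    # sum(width(reduce(cds(txdb))))
exon_bp <- 153094341  # sum(width(reduce(exons(txdb))))
utr5_bp <- 21299710   # 5' UTRs, summed per transcript, not reduced
utr3_bp <- 86927764   # 3' UTRs, summed per transcript, not reduced

# Exons are roughly 3.65x the CDS length.
exon_to_cds <- exon_bp / cds_bp

# 3' UTR sequence is roughly 4.08x the 5' UTR sequence.
utr3_to_utr5 <- utr3_bp / utr5_bp

# CDS + UTRs lands within about 3 Mb of the exon total.
parts_bp <- cds_bp + utr5_bp + utr3_bp
gap_bp <- exon_bp - parts_bp
```

The near match is suggestive rather than exact, since the unreduced UTR sums can overstate the UTR footprint while non-coding exons are not counted in either term.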

1

Yeah, the mean 3' UTR is around 40% of the length of its transcript, and a substantial number of UTRs make up more than 75% of the transcript.

ADD REPLY • link 5.1 years ago by i.sudbery 19k

0

A 4-5x difference between 5' and 3' UTR length is right in the neighborhood of the published averages. :)

Table 1 in https://www.ncbi.nlm.nih.gov/pmc/articles/PMC139023/

ADD REPLY • link 5.1 years ago by Eric Lim ★ 2.1k

score 0 · Answer 2 · 2019-03-05

0

5.1 years ago

swbarnes2 14k

Also, Ensembl will count an exon as unique even if it overlaps with other exons, so a lot of sequence is being counted over and over again because it belongs to multiple slightly different exons.
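
The effect of overlapping exon records on a length total can be illustrated with a small sketch. The coordinates below are toy values invented for illustration. Note that the totals in the question were computed after `reduce()`, which merges overlapping ranges, so this kind of double counting would mainly affect sums taken without it.

```r
library(GenomicRanges)

# Three toy exon records; the first two overlap,
# as with alternative 3' ends of the same exon.
ex <- GRanges("chr1",
              IRanges(start = c(100, 100, 500), end = c(199, 249, 599)),
              strand = "+")

# Summing raw widths counts the shared 100 bp twice.
raw_bp <- sum(width(ex))

# reduce() merges overlapping ranges first, so each base is counted once.
merged_bp <- sum(width(reduce(ex)))
```

Here `raw_bp` is 350 and `merged_bp` is 250, the difference being the 100 bp shared by the first two records.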