Why is the sum of all exons so much longer than the CDS?
13 months ago
Philadelphia, PA

Is this because of non-coding RNA? I thought the two totals would be at least comparable.

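library(GenomicFeatures)
# Build a TxDb from the Ensembl database server (US East mirror)
txdb <- makeTxDbFromEnsembl("Homo Sapiens", server = "useastdb.ensembl.org")

# Total coding sequence, with overlapping ranges merged by reduce()
gr <- cds(txdb)
sum(width(reduce(gr)))
[1] 41901692

# Total exonic sequence, with overlapping ranges merged by reduce()
gr <- exons(txdb)
sum(width(reduce(gr)))
[1] 153094341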
exons cds • 306 views

Yes, there are a number of non-coding RNA families (rRNA, tRNA, miRNA, snRNA, lncRNA, ...) whose exons are in your exon list but which have no CDS.

12 months ago
Eric Lim ♦ 1.4k
Boston

The exons also include the 3' and 5' UTRs, which are not part of the CDS, as well as non-coding species.
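A minimal sketch of how the non-CDS part of the exonic footprint could be isolated, using made-up toy coordinates rather than the real annotation (the ranges and the names `exons_gr`, `cds_gr` are hypothetical). It assumes the GenomicRanges package is installed; `setdiff()` on reduced ranges leaves the UTR and non-coding exon sequence.

```r
library(GenomicRanges)

# Hypothetical toy annotation: two exons of a coding gene (with UTRs)
# and one exon of a non-coding gene, all on the + strand of chr1
exons_gr <- GRanges("chr1",
                    IRanges(start = c(100, 500, 2000), end = c(299, 899, 2399)),
                    strand = "+")
# The CDS covers only part of the two coding exons
cds_gr <- GRanges("chr1",
                  IRanges(start = c(200, 500), end = c(299, 699)),
                  strand = "+")

exonic <- reduce(exons_gr)
coding <- reduce(cds_gr)
# Exonic sequence that is not coding: UTRs plus non-coding exons
noncoding_exonic <- setdiff(exonic, coding)

exonic_bp <- sum(width(exonic))
coding_bp <- sum(width(coding))
noncoding_exonic_bp <- sum(width(noncoding_exonic))
```

On the real data the same pattern would presumably be `setdiff(reduce(exons(txdb)), reduce(cds(txdb)))`.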


how about that!

sum(width(unlist(fiveUTRsByTranscript(txdb))))
[1] 21299710
sum(width(unlist(threeUTRsByTranscript(txdb))))
[1] 86927764
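One caveat: these two sums are per transcript and not passed through reduce, so a UTR shared by several transcripts is counted once per transcript, and the totals are not directly comparable to the reduced exon and CDS numbers. A small sketch with hypothetical toy ranges (the list `utr_by_tx` is made up for illustration) shows the difference:

```r
library(GenomicRanges)

# Hypothetical toy data: two transcripts of one gene sharing the same 3' UTR
utr_by_tx <- GRangesList(
  tx1 = GRanges("chr1", IRanges(start = 1000, end = 1999), strand = "+"),
  tx2 = GRanges("chr1", IRanges(start = 1000, end = 1999), strand = "+")
)

# Per-transcript sum: the shared UTR is counted twice
per_transcript_bp <- sum(width(unlist(utr_by_tx)))

# Genomic footprint: identical or overlapping ranges are merged first
footprint_bp <- sum(width(reduce(unlist(utr_by_tx))))
```

With the txdb from the question, the comparable figure would be something like `sum(width(reduce(unlist(threeUTRsByTranscript(txdb)))))`.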

Yeah, the mean 3' UTR is around 40% of the length of a transcript, and a substantial number of UTRs make up more than 75% of the transcript.


A 4-5x ratio of 3' to 5' UTR length is right in the neighborhood of the published averages for 5' and 3' UTRs. :)

See Table 1 in https://www.ncbi.nlm.nih.gov/pmc/articles/PMC139023/
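For reference, the ratio can be checked from the two UTR totals posted earlier in this thread. This is only a rough figure, since those totals are per-transcript sums without reduce; the variable names below are just for illustration.

```r
# UTR totals as posted earlier in the thread (per-transcript sums, in bp)
five_utr_bp <- 21299710
three_utr_bp <- 86927764

# Ratio of total 3' UTR length to total 5' UTR length
ratio <- three_utr_bp / five_utr_bp
round(ratio, 2)
```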

15 months ago
swbarnes2 5.7k
United States

Also, Ensembl counts an exon as a distinct feature even if it overlaps other exons, so a lot of sequence gets counted over and over again because it belongs to multiple, slightly different exons.


That’s why I used reduce, which merges overlapping ranges before the widths are summed.
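A quick sketch of what reduce does to overlapping exons, on hypothetical toy ranges (`exons_gr` here is made up, not taken from the annotation). One thing worth knowing is that reduce is strand-aware by default, so bases covered by exons on both strands are still counted twice unless `ignore.strand = TRUE` is set.

```r
library(GenomicRanges)

# Hypothetical toy exons: two overlapping on the + strand, and one on the
# - strand that overlaps both of them in genomic coordinates
exons_gr <- GRanges(
  "chr1",
  IRanges(start = c(100, 150, 180), end = c(199, 249, 279)),
  strand = c("+", "+", "-")
)

# No merging: every exon counted in full
raw_bp <- sum(width(exons_gr))

# Default reduce(): overlaps merged within each strand separately
stranded_bp <- sum(width(reduce(exons_gr)))

# Strand ignored: each genomic base counted at most once
unstranded_bp <- sum(width(reduce(exons_gr, ignore.strand = TRUE)))
```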
