Transcript Length
12.9 years ago
alittleboy ▴ 220

Hi All:

I have a question about transcript length: what is a reasonable range, in base pairs, for transcript length? When I use the getlength() function in the goseq Bioconductor package, which takes its lengths from the UCSC Genome Browser for each combination of genome and ID, I find a range from ~300 bp to ~80,000 bp. Is a transcript that long reasonable?

Sorry, I don't know much about the underlying biology, but from a statistical perspective I would be inclined to treat such a value as an outlier. Is that justified? Thank you very much!

transcript length • 7.1k views
12.9 years ago
Spitshine ▴ 660

It depends on what you call an outlier. The human genome codes for titin, a protein with > 30,000 amino acids, hence its mRNA should be > 90,000 bp. So 80,000 bp is in fact a little short of the maximum, but the number of protein-coding transcripts that long is small.
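The arithmetic behind that estimate can be sketched in a few lines of Python. This is only an illustration: the function name `min_coding_length` and its `include_stop` parameter are made up here, and the result is a lower bound on the coding sequence alone, since a real mRNA also carries untranslated regions.

```python
CODON_LENGTH = 3  # nucleotides per codon


def min_coding_length(n_amino_acids: int, include_stop: bool = True) -> int:
    """Minimum coding-sequence length (nt) for a protein of the given size.

    Hypothetical helper for illustration only; UTRs are not counted,
    so the full mRNA is longer than this.
    """
    codons = n_amino_acids + (1 if include_stop else 0)
    return codons * CODON_LENGTH


# A protein of 30,000 amino acids needs at least this many nucleotides.
print(min_coding_length(30_000))                      # 90003
print(min_coding_length(30_000, include_stop=False))  # 90000
```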

12.9 years ago

You should not be overly concerned with outliers when using RefSeq, Ensembl or Havana annotations to define gene and transcript coordinates, and hence their lengths. Titin is a great example (+1): isoform NM_133378 is 101,520 bp long. Note the RefSeq accession. Keep in mind that non-protein-coding genes may have a very different length distribution: microRNAs are quite short, and lincRNAs can be long. Transcribed pseudogenes are generally shorter than the functional version of the gene.
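If you still want a statistical check, one option is to look at lengths on a log scale, because transcript lengths span several orders of magnitude. The sketch below applies Tukey's 1.5 × IQR fences to log10 lengths. That rule is my assumption for illustration, not something goseq does, and the function name `log_scale_outliers` and the example lengths are hypothetical.

```python
import math
import statistics


def log_scale_outliers(lengths, k=1.5):
    """Return the lengths lying outside Tukey fences computed on log10(length).

    Hypothetical helper: the fence rule and k=1.5 are illustrative choices.
    """
    logs = [math.log10(x) for x in lengths]
    q1, _, q3 = statistics.quantiles(logs, n=4)  # quartiles of the log lengths
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x, lx in zip(lengths, logs) if lx < low or lx > high]


# Made-up lengths in bp, not real annotation data.
example_lengths = [300, 800, 1500, 2200, 3000, 4500, 6000, 9000, 15000, 80000]
print(log_scale_outliers(example_lengths))
```

A value flagged this way is only unusual relative to the rest of the set; as the titin example shows, it can still be a perfectly valid transcript.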
