Exon-Start Same As Exon-End From UCSC knownGene
13.0 years ago
brentp 24k

As an example, take a single row from UCSC knownGene (hg19), selected like this:

SELECT cdsStart,cdsEnd,K.name,exonStarts,exonEnds FROM knownGene as K, kgXref as X WHERE
X.kgId=K.name and K.name='uc002imy.2'

The output (with newlines added so that exonStarts and exonEnds line up):

cdsStart        cdsEnd  name    exonStarts      exonEnds
46103793        46115139        uc002imy.2      
46103534,46105837,46106490,46109521,46110051,46110576,46111228,46114216,46115032,46115092,46115124,   
46103841,46105876,46106542,46109599,46110107,46110668,46111310,46114291,46115092,46115122,46115152,

Note that the 2nd-from-last exonStart is the same as the 3rd-from-last exonEnd (46115092). What does this mean? A single row in knownGene is a single transcript, so what does it mean to have a zero-length intron? There are enough of these that I want to understand what is going on.
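
For anyone wanting to check their own rows, here is a minimal Python sketch of the comparison described above. It assumes the exonStarts and exonEnds fields arrive as the comma-separated strings shown in the output (trailing comma included). The function names `parse_coords` and `zero_length_introns` are made up for illustration and are not part of any UCSC tool.

```python
def parse_coords(field):
    """Split a UCSC comma-separated coordinate field (trailing comma allowed) into ints."""
    return [int(x) for x in field.strip().split(",") if x]


def zero_length_introns(exon_starts, exon_ends):
    """Return (index, position) pairs where exon i ends exactly where exon i+1 starts."""
    starts = parse_coords(exon_starts)
    ends = parse_coords(exon_ends)
    # The "intron" after exon i spans ends[i] .. starts[i + 1]; it is empty when they are equal.
    return [(i, ends[i]) for i in range(len(starts) - 1) if starts[i + 1] == ends[i]]


# Values copied from the uc002imy.2 row above.
exon_starts = ("46103534,46105837,46106490,46109521,46110051,46110576,"
               "46111228,46114216,46115032,46115092,46115124,")
exon_ends = ("46103841,46105876,46106542,46109599,46110107,46110668,"
             "46111310,46114291,46115092,46115122,46115152,")

print(zero_length_introns(exon_starts, exon_ends))  # [(8, 46115092)]
```

The index is 0-based, so 8 is the 9th of 11 exons, i.e. the 3rd-from-last exonEnd abutting the 2nd-from-last exonStart.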

I have asked this question on the UCSC mailing list but have not had an answer yet.

ucsc exon splicing bed transcript • 2.8k views

A response on the mailing list explains that it's due to gaps on the query relative to the transcript sequence. I hadn't thought about these issues before now.
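
If the abutting blocks really are one exon split by an alignment gap, one might want to merge them before computing exon or intron counts. The following is only a sketch under that assumption: the helper `merge_abutting` and its `max_gap` parameter are hypothetical, and whether merging is appropriate depends on what the blocks mean for a given analysis.

```python
def merge_abutting(starts, ends, max_gap=0):
    """Merge consecutive blocks whose separation on the genome is <= max_gap bases."""
    merged = []
    for s, e in zip(starts, ends):
        if merged and s - merged[-1][1] <= max_gap:
            # Extend the previous block instead of starting a new one.
            merged[-1][1] = max(merged[-1][1], e)
        else:
            merged.append([s, e])
    return [tuple(block) for block in merged]


# Last three blocks of uc002imy.2 from the question.
starts = [46115032, 46115092, 46115124]
ends = [46115092, 46115122, 46115152]

print(merge_abutting(starts, ends))
# [(46115032, 46115122), (46115124, 46115152)]
```

With the default `max_gap=0` only the zero-length intron is closed; the 2 bp gap before the last block is left alone.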

13.0 years ago
Scott Cain ▴ 770

I wonder if there is a CDS boundary there, like a stop codon. Sometimes I've seen data goofs where one exon is split into two when part of it is coding and the other isn't.


Could be... though this occurs even in transcripts that are (annotated as) non-coding.

13.0 years ago
Pi ▴ 520

Could it be intron retention?
