I'm looking at p53 in IGV using hg38. My view window is chr17:7,648,987-7,708,368:
I understand that the thin lines are introns and the darker rectangles are exons. The short parts of the rectangles are non-coding regions and the tall parts are coding. The coding region begins with an ATG start codon (read from right to left):
What I can't figure out is what defines the intron/exon boundaries. According to Genomes 2, the boundaries look like:
- 5' splice site 5'-AG↓GTAAGT-3'
- 3' splice site 5'-PyPyPyPyPyPyNCAG↓-3'
(I changed U -> T, since this is DNA; N = any base, Py = T or C)
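To make the comparison less error-prone than reading bases by eye, here is a minimal Python sketch that checks a pasted sequence against the two consensus patterns above. The function names are my own (hypothetical), and the patterns are just the textbook consensus transcribed literally. Real splice sites often deviate from it, so a failed match does not by itself mean the boundary is wrong. The `reverse_complement` helper is included only in case the sequence needs to be checked on the opposite strand.

```python
import re

# Consensus patterns transcribed from the list above (DNA alphabet, U -> T).
DONOR_CONSENSUS = "AGGTAAGT"                        # 5' splice site: AG | GTAAGT
ACCEPTOR_PATTERN = re.compile(r"[CT]{6}[ACGT]CAG")  # 3' splice site: Py x6, N, CAG

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def matches_acceptor(seq: str) -> bool:
    """True if the last 10 bases of seq fit PyPyPyPyPyPyNCAG exactly."""
    return ACCEPTOR_PATTERN.fullmatch(seq.upper()[-10:]) is not None


def donor_mismatches(seq: str) -> int:
    """Count positions where the first 8 bases of seq differ from AGGTAAGT."""
    window = seq.upper()[: len(DONOR_CONSENSUS)]
    if len(window) < len(DONOR_CONSENSUS):
        raise ValueError("need at least 8 bases spanning the exon/intron junction")
    return sum(a != b for a, b in zip(window, DONOR_CONSENSUS))


if __name__ == "__main__":
    # The 10 bases quoted in this post for the end of one intron.
    print(matches_acceptor("CCTCTTGCAG"))
    # A perfect donor site has zero mismatches against the consensus.
    print(donor_mismatches("AGGTAAGT"))
```

Counting mismatches for the donor site, rather than demanding an exact match, seemed like the more useful choice, since the 5' consensus is reportedly followed only loosely outside the GT.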
Reading the left end of the intron on the right above (from right to left), I see CCTCTTGCAG, i.e. PyPyPyPyPyPyNCAG! So far so good.
But looking at the right end of the left intron, I don't see anything resembling AGGTAAGT or TGAATGGA.
The PyPyPyPyPyPyNCAG pattern seems to be loosely followed on the left edges: the GA is very consistent. The next base is often a C but sometimes a T. Pyrimidines (C/T) do seem to be more common in the next six slots, although As and Gs occur in these positions in 6/10 introns.
There seems to be no structure to the right edges beyond the consistent TG.
So what defines the intron boundaries? Am I reading these incorrectly?
Also, I forgot to mention that your question was well researched, and you clearly made a lot of effort to make sense of the data. That is pretty cool, and we like that a lot here!