Get UTR5' and UTR3' Data for Genes from the UCSC Genome Browser
11.8 years ago

I'm trying to create an SVG image of the various transcripts of genes, using data on human genes from the UCSC Genome Browser, and I'm running into trouble. I have direct MySQL access to the database. I have the exon start and end base-pair positions and the transcription start and end base-pair positions. I'm looking for either the base-pair positions of the UTR5' and UTR3' regions of genes, or simply the coding-region positions, which I could then subtract from the range of the whole gene to leave the remaining area as UTR5' and UTR3'. Any ideas on where in the hg19 database in the UCSC Genome Browser I could find this data?

ucsc orf utr • 5.7k views
11.8 years ago
Vikas Bansal ★ 2.4k

I think you can find your solution here, in Pierre's answer:

mysql -h genome-mysql.cse.ucsc.edu -u genome -D hg19 -N -A -e 'select distinct chrom,strand, txStart,cdsStart from knownGene where txStart< cdsStart union select distinct chrom,strand,cdsEnd,txEnd from knownGene where cdsEnd< txEnd ' > utrs.txt
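
The query above returns the two flanks between the transcript ends and the CDS, but it does not say which flank is the 5' UTR and which is the 3' UTR, and the flanks may span introns. As a rough sketch of how one might turn a knownGene row into per-exon UTR and CDS blocks for drawing, the Python below clips the exons against the CDS boundaries and swaps the labels on the minus strand. The function names (`clip`, `parse_blob`, `utr_exons`) and the sample coordinates are made up for illustration. It assumes the usual UCSC conventions (0-based, half-open coordinates; `exonStarts`/`exonEnds` stored as comma-separated lists) and treats rows where `cdsStart` is not less than `cdsEnd` as having no coding region.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end)


def parse_blob(blob) -> List[int]:
    """Parse a comma-separated exonStarts/exonEnds value such as '100,500,800,'."""
    if isinstance(blob, bytes):
        blob = blob.decode()
    return [int(x) for x in blob.split(",") if x]


def clip(exons: List[Interval], lo: int, hi: int) -> List[Interval]:
    """Return the parts of each exon that fall inside [lo, hi)."""
    pieces = []
    for start, end in exons:
        s, e = max(start, lo), min(end, hi)
        if s < e:
            pieces.append((s, e))
    return pieces


def utr_exons(strand: str, tx_start: int, tx_end: int,
              cds_start: int, cds_end: int,
              exon_starts: List[int], exon_ends: List[int]) -> Dict[str, List[Interval]]:
    """Split a transcript's exons into 5' UTR, CDS and 3' UTR blocks."""
    exons = list(zip(exon_starts, exon_ends))
    if cds_start >= cds_end:
        # Assumption: an empty CDS means a noncoding transcript, so no UTRs.
        return {"utr5": [], "cds": [], "utr3": []}
    left = clip(exons, tx_start, cds_start)   # lower genomic coordinates
    right = clip(exons, cds_end, tx_end)      # higher genomic coordinates
    cds = clip(exons, cds_start, cds_end)
    if strand == "+":
        return {"utr5": left, "cds": cds, "utr3": right}
    # On the minus strand the transcript reads right to left.
    return {"utr5": right, "cds": cds, "utr3": left}


# Made-up minus-strand transcript with three exons.
blocks = utr_exons("-", 100, 1000, 200, 900,
                   parse_blob("100,500,800,"), parse_blob("300,600,1000,"))
print(blocks)
```

Each returned block is a genomic interval that can be drawn directly as a rectangle, for example with thinner boxes for the UTR blocks than for the CDS blocks.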

EDIT: After OP's comment.

image

added 'distinct'

Looking through this knownGene table, I see that cdsStart and cdsEnd often have the same value, which doesn't make any sense. How can I trust this data?

So, to clarify: the region between the CDS start and end positions contains no part of the UTR5' or UTR3' regions, correct?

CDS is the coding sequence, which gets translated to protein. UTRs are untranslated regions. Please have a look at this picture.
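
To make the relationship concrete, here is a small Python sketch that labels a genomic position relative to one transcript. The function `region_at` and the sample coordinates are hypothetical, introns are ignored for simplicity, and it assumes 0-based, half-open coordinates with `txStart <= cdsStart <= cdsEnd <= txEnd`. A row where `cdsStart` equals `cdsEnd` is treated as having no coding region, which is an assumption about how noncoding transcripts are stored in the table.

```python
def region_at(pos: int, strand: str, tx_start: int, tx_end: int,
              cds_start: int, cds_end: int) -> str:
    """Label a position as 5' UTR, CDS, 3' UTR, noncoding, or outside."""
    if not (tx_start <= cds_start <= cds_end <= tx_end):
        raise ValueError("expected txStart <= cdsStart <= cdsEnd <= txEnd")
    if not (tx_start <= pos < tx_end):
        return "outside"
    if cds_start == cds_end:
        # Assumption: an empty CDS marks a noncoding transcript.
        return "noncoding"
    if cds_start <= pos < cds_end:
        return "CDS"
    before_cds = pos < cds_start  # lower genomic coordinate than the CDS
    if strand == "+":
        return "5' UTR" if before_cds else "3' UTR"
    # Minus strand: the transcript runs from txEnd down to txStart.
    return "3' UTR" if before_cds else "5' UTR"


# Made-up transcript: transcribed 100-1000, coding 200-900.
for p in (150, 500, 950):
    print(p, region_at(p, "+", 100, 1000, 200, 900),
          region_at(p, "-", 100, 1000, 200, 900))
```

Under these assumptions, a CDS end that lies before the transcription end is the normal case: the stretch between them is untranslated sequence (the 3' UTR on the plus strand, the 5' UTR on the minus strand).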

So if the data I have says that the CDS ends before transcription ends, then something is wrong, correct?
