Hi,
I'm trying to retrieve the 5' UTR sequence from the exons of the canonical transcript of each gene in a list of gene symbols.
First I tried Ensembl BioMart (martview) for this (http://www.ensembl.org/biomart/martview/).
However, the output contains more 5'UTR sequences (150) than there were input genes (80).
I assume this is because one gene can have multiple transcripts, so I get one 5'UTR per transcript rather than one per gene. Note that my input was gene IDs, not transcript IDs.
So I thought that retrieving the 5'UTR exon sequence only from the canonical transcript of each gene would avoid this problem.
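To make the one-to-many mapping concrete, here is a minimal Python sketch (not BioMart or biomaRt code) that counts transcript records per gene in a results table and keeps only the record of the canonical transcript. The row layout, the example gene and transcript IDs, and the source of the canonical-ID set are all hypothetical; the set is assumed to use the same ID namespace as the results table.

```python
from collections import Counter


def transcripts_per_gene(rows):
    """Count how many transcript records each gene contributes.

    rows: iterable of (gene, transcript_id, utr5_sequence) tuples
          (hypothetical layout of an exported results table).
    """
    return Counter(gene for gene, _tx, _seq in rows)


def canonical_utr5_by_gene(rows, canonical_ids):
    """Keep one record per gene: the one whose transcript is canonical.

    canonical_ids: set of transcript IDs assumed to be canonical.
    Genes with no canonical record in `rows` are simply absent.
    """
    result = {}
    for gene, tx, seq in rows:
        if tx in canonical_ids:
            result[gene] = (tx, seq)
    return result


# Hypothetical example: GENEA has two transcripts, GENEB has one.
rows = [
    ("GENEA", "TX1", "ACGT"),
    ("GENEA", "TX2", "ACG"),
    ("GENEB", "TX3", "TTGA"),
]
print(transcripts_per_gene(rows))
print(canonical_utr5_by_gene(rows, {"TX1", "TX3"}))
```

Three records for two genes is the same pattern as 150 sequences for 80 genes; filtering on a canonical set brings it back to one record per gene.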
After some searching, I tried the UCSC Genome Browser Table Browser and selected the knownCanonical table, which gives only one canonical transcript per gene, but it only provides the UTR coordinates, not the sequences themselves.
Any advice, or a pointer to a reference I could look up, would be very helpful.
I can do some basic R programming but have never used biomaRt. I'm willing to try it, though, if biomaRt's getSequence is the way to go.
Thank you very much!!!
Thank you, Devon. I do have a BED file with the 5'UTR coordinates of the canonical transcripts, so I will give it a try.
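Going from BED coordinates to sequences is mostly slicing. Below is a minimal Python sketch, assuming a BED6 file of 5'UTR pieces and a genome already loaded into a dict of chromosome name to plus-strand sequence (for example read from a FASTA; loading is not shown). BED starts are 0-based and ends are exclusive, so they map directly onto Python slices. Minus-strand pieces are reverse-complemented and ordered from high to low coordinates, so a 5'UTR split across several exons is joined 5' to 3'. The `key` argument is a hypothetical hook for mapping the BED name column to a transcript ID, since how UTR pieces are named depends on how the file was exported.

```python
from collections import defaultdict

# Complement table for DNA, preserving case; N stays N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def utr5_sequences(bed_lines, genome, key=lambda name: name):
    """Join 5'UTR pieces from BED6 lines into one sequence per transcript.

    bed_lines: BED lines (chrom, 0-based start, exclusive end, name, score, strand).
    genome: dict of chromosome name -> plus-strand sequence (assumed preloaded).
    key: maps the BED name column to a transcript ID (hypothetical hook).
    """
    pieces = defaultdict(list)
    for line in bed_lines:
        # Skip blank lines and UCSC header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end, name, _score, strand = line.rstrip("\r\n").split("\t")[:6]
        pieces[key(name)].append((int(start), int(end), chrom, strand))

    result = {}
    for tx, parts in pieces.items():
        parts.sort()  # ascending genomic coordinates
        if parts[0][3] == "-":
            parts.reverse()  # on the minus strand the 5' end has the highest coordinate
        seq = []
        for start, end, chrom, strand in parts:
            piece = genome[chrom][start:end]  # BED half-open interval == Python slice
            seq.append(reverse_complement(piece) if strand == "-" else piece)
        result[tx] = "".join(seq).upper()
    return result
```

For example, if the pieces of one transcript were named like `tx1_utr5_0` and `tx1_utr5_1` (hypothetical naming), passing `key=lambda name: name.split("_")[0]` would group them under `tx1`. Checking a few results against the genome browser is a good way to confirm the strand handling.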