Hello. First of all, please forgive me: I have little knowledge of the subject, as I come from a purely computer science background.
I'm building a web service around the Ensembl REST API. All was going well until I needed to get an exon-only sequence.
What I'm trying to do, given coordinates such as human:10:101654703-101659823:-1, is:
1- Get the sequence from the Ensembl REST API (easy enough).
2- Get the protein-coding exons that overlap that region (example from the API).
3- Use the start and end of those exons to get the whole overlapping exon sequence.
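For reference, steps 1 and 2 might be sketched as below. This is a minimal sketch, not a definitive client: it assumes the public server at https://rest.ensembl.org and its `sequence/region` and `overlap/region` endpoints, with the `feature=exon` and `content-type` query parameters. The helper names (`sequence_url`, `exon_overlap_url`, `fetch_json`) are made up for illustration, and filtering for protein-coding exons is not shown.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

SERVER = "https://rest.ensembl.org"  # assumed: the public Ensembl REST server


def sequence_url(species, region):
    """Build the URL for the genomic sequence of a region (step 1)."""
    query = urlencode({"content-type": "application/json"})
    # Keep ':' unescaped so the region stays readable in the path.
    return f"{SERVER}/sequence/region/{quote(species)}/{quote(region, safe=':')}?{query}"


def exon_overlap_url(species, region):
    """Build the URL for exon features overlapping a region (step 2)."""
    query = urlencode({"feature": "exon", "content-type": "application/json"})
    return f"{SERVER}/overlap/region/{quote(species)}/{quote(region, safe=':')}?{query}"


def fetch_json(url):
    """GET a URL and decode its JSON body. This performs a network request."""
    with urlopen(url) as response:
        return json.load(response)


if __name__ == "__main__":
    region = "10:101654703-101659823:-1"
    print(sequence_url("human", region))
    print(exon_overlap_url("human", region))
    # Uncomment to query the live service:
    # exons = fetch_json(exon_overlap_url("human", region))
```

Building the URLs separately from fetching them keeps the request logic easy to check without touching the network.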
Here are the problems I'm facing:
1- I believe there are different sources for exons (ensembl, ensembl_havana, havana). Which should I use, and how? At the moment I prioritise ensembl_havana and use only that, but I believe this is incorrect: ensembl_havana means exons agreed upon by both teams, so should I also add the remaining exons that only one of the teams reported?
2- What is an exon rank? I couldn't find any information about it.
3- What should I do when the requested region is on the negative strand but the overlapping exons are on the positive strand, or vice versa?
4- What is the exon version?
I apologise again for the number of questions, but I've been struggling with this for a week: I get some valid results, but more invalid ones.
Thank you.
Thank you for the explanation. Just to make sure I understand the rank of an exon: it doesn't matter whether the rank 1 exon has the lowest start coordinate compared to the rest; when assembling the exon sequence, I should always start with the sequence of the exon with rank 1?
If the gene runs backwards (on the reverse strand), exon 1 should have the highest genomic position. But regardless of the direction of the gene, exon #1 is the first exon of its transcript (it might not be first in another transcript of the same gene). If you are pulling it out by name or exon ID, it should be in the "right" orientation no matter what.
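To make the rank and orientation point concrete, here is a minimal sketch of assembling an exon-only sequence yourself from a forward-strand region sequence. It assumes each exon is a dict with 1-based inclusive `start` and `end`, a `strand` of 1 or -1, and a `rank`; these field names are an assumption about what the exon records look like. It also assumes all exons passed in belong to one transcript, since rank is only meaningful within a transcript. The function names are made up for illustration.

```python
# Translation table for complementing DNA bases (N stays N).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def exon_sequence(forward_seq, region_start, exons):
    """Concatenate the exons of ONE transcript in rank order.

    forward_seq  : forward-strand sequence of the region
    region_start : 1-based genomic coordinate of forward_seq[0]
    exons        : dicts with start, end, strand and rank keys
    """
    pieces = []
    # Order by rank, not by genomic start: on the reverse strand the
    # rank 1 exon has the highest coordinates.
    for exon in sorted(exons, key=lambda e: e["rank"]):
        lo = exon["start"] - region_start
        hi = exon["end"] - region_start + 1
        piece = forward_seq[lo:hi]
        if exon["strand"] == -1:
            # Reverse-strand exons read in the opposite direction.
            piece = reverse_complement(piece)
        pieces.append(piece)
    return "".join(pieces)


if __name__ == "__main__":
    forward_seq = "AACCGGTTAGCA"  # toy sequence covering positions 100..111
    reverse_exons = [
        {"start": 100, "end": 102, "strand": -1, "rank": 2},
        {"start": 108, "end": 111, "strand": -1, "rank": 1},
    ]
    print(exon_sequence(forward_seq, 100, reverse_exons))  # TGCTGTT
```

In the toy example the rank 1 exon sits at the higher coordinates (108 to 111) and still comes first in the result, which matches the description above.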