Hi all!
I am having some problems with Ensembl. I have some exons and their coordinates annotated with Ensembl 61 (mouse). As I couldn't find the permanent link to Ensembl 61, I used release 66 to get the sequences, and then used EMBOSS sixpack and InterProScan 5 to look for the corresponding protein domains, but I couldn't find any match. I also tried submitting the aa sequence given by BioMart in Ensembl 66 and got no results. When I tried the same approach with Ensembl 67, I obtained different nucleotide sequences, and the protein domains found with InterProScan 5 were correct (or at least made sense for the gene studied). I also tried translating my coordinates to the latest release, Ensembl 79, and again both the nucleotide and the aa sequences, when submitted to InterProScan 5, gave results identical to those obtained with Ensembl 67. Why are there these differences between Ensembl 66 and Ensembl 67, given that they both refer to mm9? Can I be confident that I am retrieving the right sequence, since the annotation was made with Ensembl 61? Where can I find Ensembl 61, given that what should be its permanent link (http://feb2012.archive.ensembl.org) leads to Ensembl 66?
Thanks
Hi,
Just FYI - it's EnsEMBL, not Ensemble - the last 'e' is not part of the name :)
I corrected the instances in your post.
Thanks! and sorry for the mistake :)
That's OK, we are known for our quirky anagrams!
For what it's worth, you can find Ensembl release 61 via FTP here (just click on anything you want under release 66 and then change 66 to 61 in the URL). Releases 61, 66 and 67 all refer to mm9, so the only difference should be the presence or absence of some patches and the genes on them. To give you more specific help, we would need an example of the sequence you got, what you got with release 67, and how you obtained each of them.
Hi!
Thank you very much for your reply and for posting release 61. I have to apologize: I was using BioMart with my regions of interest, and I didn't realize that when I select exon sequences it returns the sequences of all the exons of the genes that contain these regions, not only the sequences of the regions themselves. When using Region Report, the sequences are the same in Ensembl 66 and 67. I still have one problem: I can't find any match for my regions, even though they are from known genes. Some of these regions are really small, less than 20 aa, but I can't find matches even for bigger regions of more than 400 aa.
For example I have the following region on the sense strand:
The sequence with Region Report Ensembl 66 and 67 is
I tried different ORFs from the EMBOSS sixpack output, on both the forward and the reverse strand, but I couldn't find any match in InterProScan 5. Am I doing something wrong?
I guess that one approach would be to find the exons in which these regions are contained (for example, by looking at the exon sequences in BioMart and then identifying my specific region sequence found with Region Report). I can do it manually as I don't have many regions now, but is there a faster approach?
Thanks again and sorry for the previous post, I am new to this field.
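On the question of a faster approach than checking by hand: a minimal sketch, assuming the exon coordinates have been exported (for example from BioMart) into a list of records, is to test each region for containment within an exon on the same chromosome. The `Exon` record, the `exons_containing` helper and the exon IDs and coordinates below are all hypothetical, for illustration only; coordinates are taken to be 1-based and inclusive, as in Ensembl.

```python
from typing import NamedTuple


class Exon(NamedTuple):
    """One exon record; field names are illustrative, not a BioMart schema."""
    exon_id: str
    chrom: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: int  # 1 or -1


def exons_containing(region, exons):
    """Return the exons that fully contain region = (chrom, start, end)."""
    chrom, start, end = region
    return [
        e for e in exons
        if e.chrom == chrom and e.start <= start and end <= e.end
    ]


# Made-up exons and region, purely to show the call.
exons = [
    Exon("EXON_A", "7", 1000, 1200, 1),
    Exon("EXON_B", "7", 1500, 1900, 1),
    Exon("EXON_C", "8", 1000, 1200, -1),
]
hits = exons_containing(("7", 1550, 1600), exons)
print([e.exon_id for e in hits])
```

A region that spans an exon boundary returns no hit here, which is itself a useful signal that the region is not a single contiguous coding stretch.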
What sort of translations are you getting? The proper one that I get with a simple online tool is:
Note that this will only work with unspliced genes.
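For reference, a six-frame translation of the kind sixpack performs can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: it uses the standard genetic code, the frame labels (`+1` to `-3`) are my own and need not match sixpack's numbering, and the input sequence in the usage line is made up.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Reverse-complement a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frames(seq):
    """Translate all three frames on both strands."""
    frames = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(s[offset:])
    return frames


def frames_without_stops(seq):
    """Keep only frames with no stop codon, the usual candidates for an internal coding exon."""
    return {name: prot for name, prot in six_frames(seq).items() if "*" not in prot}


for name, protein in six_frames("ATGGCCTAA").items():
    print(name, protein)
```

For a fragment from the middle of a coding exon, the correct frame is normally one that runs through without a stop codon, which is what `frames_without_stops` filters for.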
This is the output from EMBOSS sixpack:
I also tried your aa sequence with InterProScan 5 and I didn't find any match. This sequence is an exon, or part of an exon, which is differentially spliced and is coding. Why can't I find any protein domain encoded by this sequence?
The first sequence (i.e., what I posted as well) is the correct one. InterProScan scans for known signatures, so it won't always find anything. Why not read some reviews on this family of proteins? Perhaps this is an intrinsically disordered region.