Suggestions To Build Up A Pipeline Meant To Compare Genomic Ranges Among Species
11.8 years ago
Anima Mundi ★ 2.9k

Hello,

I have a list of mouse genes, and I would like to find their orthologs in other species, with a special focus on mammals. I would also like to look at their genomic context (say, 5 Mb upstream and downstream in every species in which an ortholog was found) and annotate the genes found in these regions as thoroughly as possible. So there are two main problems: how do I find the orthologs, and how do I annotate all the genes found in the ±5 Mb regions in every single species?

For the first problem, I thought I could use Homologene, but this risks discarding useful information a priori, since several sequenced genomes are not included in it. For the second, I would BLAT the orthologs obtained against their own genomes, somehow extract every gene found in the ±5 Mb regions as FASTA, and then BLAST those sequences against the mouse genome.
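To make the second step concrete, here is a minimal sketch of the window logic only, assuming the ortholog coordinates and a per-species gene annotation have already been obtained by some other means. The `Gene` record, the function names, and the 1-based inclusive coordinates are illustrative assumptions, not part of any existing tool.

```python
from typing import List, NamedTuple, Tuple

FLANK = 5_000_000  # 5 Mb on each side of the ortholog


class Gene(NamedTuple):
    """Hypothetical annotation record; coordinates are 1-based and inclusive."""
    name: str
    chrom: str
    start: int
    end: int


def flank_window(gene: Gene, chrom_length: int, flank: int = FLANK) -> Tuple[int, int]:
    """Extend the gene by `flank` bases on both sides, clamped to the chromosome."""
    return max(1, gene.start - flank), min(chrom_length, gene.end + flank)


def genes_in_window(ortholog: Gene, annotation: List[Gene],
                    chrom_length: int, flank: int = FLANK) -> List[Gene]:
    """Return annotated genes overlapping the window around `ortholog`,
    excluding the ortholog itself."""
    start, end = flank_window(ortholog, chrom_length, flank)
    return [
        g for g in annotation
        if g.chrom == ortholog.chrom
        and g.start <= end and g.end >= start   # interval overlap test
        and g.name != ortholog.name
    ]
```

The clamping matters for orthologs that sit closer than 5 Mb to a chromosome end, where the window would otherwise run off the sequence. The neighbours returned per species could then be the set whose sequences are extracted and compared back to mouse.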

Hope this is clear enough.

comparison annotation • 2.6k views

I'd look at Ensembl and/or the UCSC database tables for this, unless there's a reason not to (e.g. lack of species)?
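One scripted route into Ensembl is its REST service. The sketch below only builds request URLs and does not fetch anything. It assumes the `homology/symbol` and `overlap/region` endpoints and the per-request region cap as I understand them from the Ensembl REST documentation, so check the paths, parameters, and limit against the current docs before relying on them. The function names and the default species are illustrative.

```python
from typing import List
from urllib.parse import urlencode

SERVER = "https://rest.ensembl.org"
MAX_REGION = 5_000_000  # assumed per-request size cap of the overlap endpoint


def ortholog_url(symbol: str, species: str = "mus_musculus") -> str:
    """URL asking for the orthologues of a gene symbol in the given species."""
    query = urlencode({"type": "orthologues",
                       "content-type": "application/json"}, safe="/")
    return f"{SERVER}/homology/symbol/{species}/{symbol}?{query}"


def overlap_urls(species: str, chrom: str, start: int, end: int,
                 chunk: int = MAX_REGION) -> List[str]:
    """URLs listing genes in chrom:start-end, split into chunks of at most
    `chunk` bases so each request stays under the assumed size cap."""
    query = urlencode({"feature": "gene",
                       "content-type": "application/json"}, safe="/")
    urls = []
    for lo in range(start, end + 1, chunk):
        hi = min(lo + chunk - 1, end)
        urls.append(f"{SERVER}/overlap/region/{species}/{chrom}:{lo}-{hi}?{query}")
    return urls
```

A gene that spans a chunk boundary would be returned by both requests, so results should be de-duplicated by gene identifier after merging.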


If I understood correctly, you mean the tables displayed in the genome browsers. I did a preliminary screen with the Ensembl one (see also: http://www.biostars.org/post/show/49097/visualize-gene-descriptions-on-the-maps-of-ensembl-genome-browser/ ). The problems are the lack of species, as you said, and the lack of annotation that can be read directly from the map. I was able to get a general idea, but now I need a deeper analysis.

11.8 years ago

Two tools you might consider in the pipeline, in addition to the UCSC tables and Ensembl already mentioned:

The VISTA browser allows comparison across genomes of multiple organisms, and you can download the alignments here. You can also import VISTA tracks into UCSC to accomplish much of your annotation.

Another tool that may be useful is LAMHDI. This lets you search multiple model organism data sets simultaneously (mouse, rat, others) for a genomic region of interest. It looks like their data are currently available only through the web browser, so to script something more automated (i.e. multiple gene lookups) you'd probably have to contact the developers directly.


Thank you, Alex. So far, it appears that to use the VISTA browser I would first need to define a list of orthologs for every gene (e.g. via Homologene); I would then obtain their MSAs. However, I would like to compare genomic contexts, not just the orthologs themselves. Maybe this could be achieved by adding a large range upstream and downstream to the FASTA sequences obtained from Homologene, but would the resulting MSAs be good enough? How should I interpret such results? I also downloaded the alignments and BLASTed the mouse entries against the human genome to define the locations of interest (as they are listed by human chromosome number). I now have a broad alignment, but I still think there should be a better way to start.

I had a look at the VISTA tracks: they point to a mirror of the UCSC browser. I could visualize the maps, but so far I have not been able to find out how to download the tracks and load them into the Table Browser.

LAMHDI lists just three mammal species, mouse included, and it also appears to be focused on disease.
