6.6 years ago
tim.ivanov.92
I have 2 coordinates on the reference genome, and I want to find the corresponding nucleotide sequences of this region in a list of closely related organisms. What are the best approaches/tools for this job?
but liftOver has this warning
Yes it does. I think UCSC has done the due diligence while creating the liftover/net files. You can use liftOver and then double check the data by alignment yourself, if you are worried about the warning above.
well, ok thank you :)
is it hard on requirements, though? What is the minimal set of inputs?
Since you have only 2 coordinates, use the web form. You need the From and To selection of genomes (only the ones available will be shown) and your coordinate data in BED format. The minimal requirement is in the link on the liftOver page (https://genome.ucsc.edu/FAQ/FAQformat.html#format1; only 3 fields are required).
well, I sort of simplified the bigger task, to get the idea
I will be doing it on multiple organisms (all related, it's the Drosophila family), and on many coordinates
Then you would need to use the command line version of the program, which you can download (linux only, scroll down on the page). Luckily Drosophila genomes are represented, and as long as the ones you need are there you should be set. Note: You can only use the tool for genomes that have a chain/net file combination available (which is what provides the mapping). You can make those files yourself if you need to use it on genomes not present there, but that would not be trivial.
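For many regions and several target genomes, the command line run can be scripted. Below is a minimal Python sketch, assuming the `liftOver` binary is on your PATH and that you have already downloaded one chain file per target genome. The helper names (`write_bed`, `liftover_command`, `lift_to_targets`), the example coordinates, and the chain file names are made up for illustration; only the argument order `liftOver oldFile map.chain newFile unMapped` comes from the tool's own usage message.

```python
import subprocess
from pathlib import Path


def write_bed(regions, path):
    """Write (chrom, start, end) tuples as a 3-field BED file.

    BED starts are 0-based and ends are exclusive.
    """
    with open(path, "w") as fh:
        for chrom, start, end in regions:
            fh.write(f"{chrom}\t{start}\t{end}\n")


def liftover_command(bed_in, chain, bed_out, unmapped, exe="liftOver"):
    """Build the argument list: liftOver oldFile map.chain newFile unMapped."""
    return [exe, str(bed_in), str(chain), str(bed_out), str(unmapped)]


def lift_to_targets(bed_in, chains, out_dir, run=False):
    """Build (and optionally run) one liftOver call per target genome.

    chains maps a target name to the path of its chain file.
    """
    out_dir = Path(out_dir)
    commands = []
    for target, chain in chains.items():
        cmd = liftover_command(
            bed_in,
            chain,
            out_dir / f"{target}.bed",
            out_dir / f"{target}.unmapped.bed",
        )
        commands.append(cmd)
        if run:
            # check=True raises CalledProcessError if liftOver exits non-zero
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical region and chain file names, for illustration only.
    write_bed([("chr2L", 100000, 101000)], "regions.bed")
    chains = {
        "droSim1": "dm6ToDroSim1.over.chain.gz",
        "droYak2": "dm6ToDroYak2.over.chain.gz",
    }
    for cmd in lift_to_targets("regions.bed", chains, ".", run=False):
        print(" ".join(cmd))
```

With `run=False` the script only prints the commands, so you can inspect them before anything is executed. Regions that fail to map end up in the per-target `*.unmapped.bed` file, which is worth checking given the warning mentioned above.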
thank you again, for all your answers
Chain files are exactly what's been troubling me. Are you saying there are cross-pair maps between Drosophila species, made by the UCSC project? Where can I download them, or look them up?
See this page. Find the Drosophila genomes. LiftOver (chain) files have their own link under each genome. There are multiple genome builds for each.
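If you need many chain files, the download links can be built in a script. This is a rough sketch, assuming the usual layout of the UCSC download server (`goldenPath/<source>/liftOver/<source>To<Target>.over.chain.gz`, with the first letter of the target capitalized). The function name `chain_url` is made up for illustration, and whether a given source/target pair actually exists has to be checked on the download page.

```python
def chain_url(source, target, base="https://hgdownload.soe.ucsc.edu/goldenPath"):
    """Build the expected URL of a UCSC liftOver chain file.

    Assumes the <source>To<Target>.over.chain.gz naming pattern; the file
    itself may not exist for every pair of assemblies.
    """
    name = f"{source}To{target[0].upper()}{target[1:]}.over.chain.gz"
    return f"{base}/{source}/liftOver/{name}"


if __name__ == "__main__":
    # Hypothetical targets; confirm each one is listed for your source build.
    for target in ["droSim1", "droYak2"]:
        print(chain_url("dm6", target))
```

Chain files are directional, so a file that maps from your reference build to a target cannot be used to map coordinates the other way.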