Download gene sequences from a large number of whole genomes in NCBI
6.2 years ago
marcoooo • 0

Hi,

I have a large list of NCBI accession codes for complete genome sequences of a species, and I need to download a few genes from each of these sequences (the same genes for each sequence, with slightly different positions in different sequences). Because the list is large, I cannot manually check the gene positions in the annotation of each sequence and download the regions of interest, so I was wondering whether there is a way to use the NCBI tools to do this in a more automated fashion. I tried playing with efetch and eUtils, but with no success so far.

Does anybody have any idea how to do this?

I know that the "download all the sequences and align them to find the genes" approach should work, but a few of these sequences have stretches of Ns that make the alignment problematic.

Thanks in advance for the help.

Best

gene genome sequence NCBI • 1.0k views
6.2 years ago

Download the RefSeq BLAST database, then use blastdbcmd, the BLAST database client, to query it and extract sequences. You can extract by name, by coordinate range, by strand, etc.

This is probably the fastest and most efficient way to query the whole of RefSeq.
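As a rough illustration of the blastdbcmd route, here is a minimal Python sketch that writes a batch file of regions and hands it to blastdbcmd through its -entry_batch option. It assumes you already have the coordinates of each gene per accession (for example, parsed from each genome's annotation), that BLAST+ is installed, and that a local database is available. The database name, file names, accession and coordinates are placeholders, not values from this thread.

```python
import subprocess
from typing import Iterable, List, Tuple

# One region to extract: (accession, start, stop, strand).
# Coordinates are 1-based and inclusive; strand is "plus" or "minus".
Region = Tuple[str, int, int, str]


def batch_lines(regions: Iterable[Region]) -> List[str]:
    """Format regions as lines for blastdbcmd -entry_batch:
    sequence id, then space-delimited range and strand specifiers."""
    lines = []
    for accession, start, stop, strand in regions:
        if strand not in ("plus", "minus"):
            raise ValueError(f"strand must be 'plus' or 'minus', got {strand!r}")
        if start < 1 or stop < start:
            raise ValueError(f"bad range {start}-{stop} for {accession}")
        lines.append(f"{accession} {start}-{stop} {strand}")
    return lines


def blastdbcmd_args(db: str, batch_file: str, out_fasta: str) -> List[str]:
    """Build the blastdbcmd command line (FASTA output via -outfmt %f)."""
    return [
        "blastdbcmd",
        "-db", db,
        "-entry_batch", batch_file,
        "-outfmt", "%f",
        "-out", out_fasta,
    ]


def extract(regions: Iterable[Region], db: str,
            batch_file: str = "regions.txt",
            out_fasta: str = "genes.fasta") -> None:
    """Write the batch file, then run blastdbcmd (must be on PATH)."""
    with open(batch_file, "w") as handle:
        handle.write("\n".join(batch_lines(regions)) + "\n")
    subprocess.run(blastdbcmd_args(db, batch_file, out_fasta), check=True)
```

A call might look like `extract([("NC_000000.1", 100, 250, "plus")], db="refseq_genomic")`, where both the accession and the database name are hypothetical. Keeping all regions in one batch file means a single blastdbcmd run covers the whole accession list instead of one call per genome.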
