NCBI Blast locally: filter by accession number and NOT by GI number
7.7 years ago
tlorin ▴ 360

I have downloaded the NCBI nt database using the update_blastdb.pl Perl script, but I want to BLAST a query file against specific species rather than against the whole nt database. I know that when running BLAST locally it is possible to subset the nt/nr database using a list of GI identifiers, as explained here.

However, NCBI is phasing out GIs and we should use accession.version identifiers instead. I have downloaded those for my species; part of the file mygi.txt is shown below.

When I run

blastdb_aliastool -gilist mygi.txt -db nt -out sthg.out -title sometitle

I obviously get

BLAST Database error: Specified file is not a valid GI/TI list.

since I am not providing a GI list.

I cannot find any command-line option in the manual to specify that I want to filter the nt database by accession number; any idea of how I can achieve that? I bet this option will have to be added by the BLAST team at some point :)


mygi.txt below

AF324813.1
AF324814.1
AF324815.1
AF324816.1
AF324817.1
AF324818.1
AF324819.1
AF324820.1
AF324821.1
AF324822.1
AF324823.1
AF324824.1
AF370451.1
AY198341.1
AY198342.1
ncbi • 4.5k views

An alternative (and dirtier ;) ) possibility would be to use this, then run makeblastdb and BLAST against the newly created database.

7.7 years ago
GenoMax 141k

This solution adds a step, but until NCBI updates blastdb_aliastool to accept accession numbers this may be the only way.

You can use blastdbcmd from the BLAST+ package to retrieve the sequences from the nt database as a FASTA file, then run makeblastdb to build the BLAST indexes for that subset of sequences. my_acc.txt is the file of accession numbers (one per line).

blastdbcmd -db /path_to/nt -entry_batch my_acc.txt -out my_seq.fa
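
The two steps described above (extract with blastdbcmd, then index with makeblastdb) could be wrapped up roughly as follows. This is only a sketch: the function names clean_acc_list and build_subset_db, the prefix my_subset, and the files query.fa and hits.tsv are placeholders that do not come from the thread, and the clean-up of the accession list (stripping Windows line endings, blank lines and duplicates) is an extra precaution rather than something blastdbcmd is known to require.

```shell
# Normalise an accession list: strip carriage returns and surrounding
# whitespace, drop blank lines, and keep only the first copy of each ID.
# (Hypothetical helper, not part of BLAST+.)
clean_acc_list() {
    tr -d '\r' < "$1" | awk 'NF { $1 = $1; if (!seen[$0]++) print }'
}

# Extract the listed accessions from an existing nucleotide database and
# build a new, smaller BLAST database from them.
# Arguments: <path to nt db> <accession list> <output prefix>
build_subset_db() {
    local nt_db="$1" acc_list="$2" prefix="$3"

    clean_acc_list "$acc_list" > "${prefix}.acc"

    # Step 1: pull the sequences out of nt as FASTA.
    blastdbcmd -db "$nt_db" -entry_batch "${prefix}.acc" -out "${prefix}.fa"

    # Step 2: index the FASTA file as a nucleotide BLAST database.
    makeblastdb -in "${prefix}.fa" -dbtype nucl -out "$prefix" -title "$prefix"
}

# Example usage (paths and file names are placeholders):
#   build_subset_db /path_to/nt my_acc.txt my_subset
#   blastn -query query.fa -db my_subset -outfmt 6 -out hits.tsv
```

Note that E-values from a search against the subset are computed from the size of the subset database, so they will not be directly comparable with E-values from a search against the full nt.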