Species identification via gilist
7.1 years ago
Penny Liu ▴ 30

I want to narrow down a blastn search against the nt database using a gilist.

I already got all taxids of bacteria (taxid 2) and extracted the GIs with csvtk (Please refer to this).

The next step was to proceed with bacterial species identification.

When I run

blastn -query query.fasta -db /path/to/nt -gilist bacteria.taxid.gi.txt -evalue 1e-6 -outfmt 6 -out sequences.txt

An error occurred:

BLAST Database error: Specified file is not a valid GI/TI list.

Please refer to the attached file.

bacteria.taxid.gi.txt (Number of taxids: 309,264,110)

What am I doing wrong? Thanks for the help in advance.

gilist blast taxid • 2.9k views

Hello! I see a couple of possible problems:

1) Your GI list file is very large (3 GB). BLAST has some limits, as far as I remember.

2) BLAST may not be able to find the file, since you put it here: http://bioinfo.cs.ccu.edu.tw/CCU_bioinf/bacteria.taxid.gi.txt If you run BLAST in the same directory as the file, it's OK.

3) Your list of GIs has a header, gi, which is not a GI number, right?
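The three points above can be checked from the shell before rerunning blastn. This is only a minimal sketch, assuming a plain-text list with one GI per line; the file name gi_sample.txt and its contents are made up for illustration and stand in for the real list.

```shell
# Hypothetical sample list with a stray header line, for illustration only.
printf 'gi\n12345\n67890\n' > gi_sample.txt

# 1) Report the size of the list in bytes.
wc -c < gi_sample.txt

# 2) Confirm the file is readable from the current directory.
[ -r gi_sample.txt ] && echo "list found"

# 3) Show, with line numbers, the first lines that are not purely numeric.
grep -vnE '^[0-9]+$' gi_sample.txt | head -n 5
```

Any line reported by the last command (a header, a blank line, or a line with trailing characters such as a carriage return) is a candidate cause of the "not a valid GI/TI list" error.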


You're right. The word gi was redundant. I removed it from the text file, and that solved the problem. :)
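For anyone hitting the same error, one way to drop such a header is to keep only the lines made up entirely of digits. This is a sketch under the assumption that the list has one GI per line; the file names gi_with_header.txt and gi_clean.txt are hypothetical.

```shell
# Hypothetical input: a GI list whose first line is the column header "gi".
printf 'gi\n12345\n67890\n' > gi_with_header.txt

# Keep only lines consisting entirely of digits; this drops the header
# and any blank lines.
grep -E '^[0-9]+$' gi_with_header.txt > gi_clean.txt

# Sanity check: count the non-numeric lines left in the cleaned list.
# grep exits non-zero when the count is 0, hence the "|| true".
grep -vcE '^[0-9]+$' gi_clean.txt || true
```

The cleaned file would then be passed to -gilist in place of the original list.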


Hi Yi-Ting, can I ask how you got this bacteria GI list? I am trying to download it directly from NCBI (via 'Save to file' -> GI List, etc.), but it fails with a timeout error. Do you have an easy way to do that? Thanks in advance.


My extraction method was the same as yours. The process can take several hours to complete. I added multiple keywords (term=whole+genome+bacteria) to narrow down the search scope.
