Local blast limit query search by GI list?
8.2 years ago
milt0n • 0

Hi All,

I am trying to blast a file that contains about 42k fasta sequences against a local blast database (nt), and I would like to restrict the search space. I read that a common way to do that is to restrict the search using "gi" (see command line below).

My question is: how would you obtain a list of GIs strictly for bacteriophage-related nucleotide sequences? What I have done before is go to the NCBI nucleotide database, search for "bacteriophage", and export the list of results to a GI file. But I am not sure this is the way to do it, as I also get other results (other microbes).

Or am I going about this wrong?

Thanks for your help,

C

$ blastn -db nt -gilist list.gi -query seq.fasta -out blast_results.txt
bacteriophage blast • 3.4k views
8.2 years ago
GenoMax 141k

That is the right way to do this. Getting all viral genomes and parsing out the bacteriophage GIs may be the preferred option. I see 1700+ entries for phages.

$ grep "phage" viral.1.1.genomic.fna | awk -F "|" '{print $2}' > phage_gi_list

should do it.
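A slightly more defensive variant of the same idea, in case it helps. This is only a sketch: the function name `extract_phage_gis` is made up for illustration, and it assumes the FASTA headers follow the older NCBI layout `>gi|<number>|ref|<accession>| description`. It looks only at header lines, matches "phage" case-insensitively, and removes duplicate GIs.

```shell
# Hypothetical helper (not part of BLAST+): print one GI per line for every
# FASTA header that mentions "phage", ignoring case.
# Assumes legacy NCBI headers such as ">gi|12345|ref|ACCESSION| description".
extract_phage_gis() {
    # /^>/        : only test header lines, never sequence lines
    # tolower(..) : match "phage", "Phage", "PHAGE", ...
    # $1 == ">gi" : skip headers that do not start with a gi field
    awk -F '|' '/^>/ && tolower($0) ~ /phage/ && $1 == ">gi" { print $2 }' "$1" | sort -u
}
```

It could then be used as `extract_phage_gis viral.1.1.genomic.fna > phage_gi_list`, and the resulting file passed to `-gilist` as in the question.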

You could try the taxonomy ID route to get a more restricted set of GIs: http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=38018 I am not sure whether that option gives you all bacteriophages, though.


Thanks, the first link is indeed too restrictive, but I see what you mean. I'll explore a bit further.

C


Go with the viral genomes option. I will move it up in the post above.
