Output blastx top hits as fasta of the original sequences
7.6 years ago

I have performed a blastx of 1,000,000,000 short sequences against a small database. This took a while:

blastall -p blastx -i infile.fa -d prot_blastdb -o outfile.txt -m 8 -S 3 -b 1 -e 0.001

I then filtered the original fasta file to pull out the sequences with top hits. I can share my awk script for this if asked, but I'm wondering if there is a way to get BLAST to output the fasta sequences directly to a file as it finds them. It seems silly to make two passes over the same file.

blast blastx • 1.8k views

Use the blastdbcmd utility (part of BLAST+) to retrieve the sequences you need: put the accession numbers you are interested in into a file, one per line, and pass that file with the -entry_batch option. While this is still a two-step process, it is reasonably fast.
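A rough sketch of what that could look like, in case it helps. It assumes the original reads have first been indexed as a nucleotide BLAST database with makeblastdb -parse_seqids, since blastdbcmd looks sequences up in a database rather than in a plain fasta file. The function names and the reads_db, hit_ids.txt and hits.fa file names are made up for illustration, and I have not tried this at the scale of a billion reads.

```shell
# Column 1 of tabular (-m 8) BLAST output is the query ID.
# Keep each ID once, since a query can appear on several lines.
write_hit_ids() {
    cut -f1 "$1" | sort -u > "$2"
}

# Hypothetical wrapper: needs BLAST+ (makeblastdb, blastdbcmd) on PATH.
# $1 = original fasta, $2 = ID list, $3 = output fasta
fetch_hits() {
    # Index the reads so they can be looked up by ID.
    makeblastdb -in "$1" -dbtype nucl -parse_seqids -out reads_db
    # Pull out every sequence whose ID is listed in the batch file.
    blastdbcmd -db reads_db -entry_batch "$2" -out "$3"
}

# Example usage:
#   write_hit_ids outfile.txt hit_ids.txt
#   fetch_hits infile.fa hit_ids.txt hits.fa
```

Building the database is itself a pass over the reads, so this mainly pays off if you expect to query the same read set more than once.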


...just had a thought. There might be faster options for filtering (e.g. hashing?) if BLAST output a numerical index for each read. I could add this artificially by prefixing the read name, but this would be a last resort, I think.
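For what it's worth, a hash lookup may already be possible on the read names themselves, without a numerical index. Below is a minimal sketch using an awk associative array; it is not necessarily the same as my existing script, and the function name is made up. It assumes the read ID is the first whitespace-delimited token of each fasta header, and that the set of hit IDs fits in memory.

```shell
# $1 = tabular (-m 8) BLAST output, $2 = original fasta
# Prints the fasta records whose ID appears in column 1 of the BLAST table.
filter_fasta_by_hits() {
    awk '
        # First file: remember every query ID that got a hit.
        NR == FNR { hits[$1] = 1; next }
        # Fasta header: strip ">" and check the ID against the hash.
        /^>/ { id = substr($1, 2); keep = (id in hits) }
        # Print header and sequence lines of records we are keeping.
        keep
    ' "$1" "$2"
}

# Example usage:
#   filter_fasta_by_hits outfile.txt infile.fa > hits.fa
```

This still reads the fasta once after BLAST has finished, but each lookup is a constant-time hash probe rather than a search through the hit list. Note that the NR == FNR trick misbehaves if the BLAST table is empty.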
