7.6 years ago
bonsalldavid
I have run a blastx of 1,000,000,000 short sequences against a small database. This took a while:
blastall -p blastx -i infile.fa -d prot_blastdb -o outfile.txt -m 8 -S 3 -b 1 -e 0.001
I then filtered the original FASTA file to pull out the sequences with top hits. I can share my awk script for this if asked, but I'm wondering whether there is a way to get BLAST to output the FASTA sequences directly to a file as it finds them. Making two passes over the same file seems silly.
Use the blastdbcmd utility (part of BLAST+) along with a list of the accession numbers you are interested in (one per line in a file, passed with the -entry_batch option) to retrieve the sequences you need. While this is still a two-step process, it is reasonably fast.
Just had a thought: there might be faster options for filtering (e.g. hashing?) if BLAST output a numerical index for each read. I could add this artificially by prefixing the read name, but this would be a last resort, I think.
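For what it's worth, the hashing idea can be tried without a numerical index, since a hash set of read names already gives constant-time lookups. Below is a minimal Python sketch of that approach, not the awk script mentioned above. It assumes the tabular (-m 8) output has the query ID in its first tab-separated column and that this ID matches the first word of each FASTA header. The function names hit_ids and filter_fasta are made up for illustration.

```python
import io


def hit_ids(blast_tab):
    """Collect query IDs (column 1) from tabular (-m 8) BLAST output."""
    ids = set()
    for line in blast_tab:
        # Skip blank lines and any comment lines.
        if line.strip() and not line.startswith("#"):
            ids.add(line.split("\t", 1)[0])
    return ids


def filter_fasta(fasta, ids, out):
    """Stream a FASTA file, writing only records whose ID is in ids.

    Returns the number of records written.
    """
    keep = False
    written = 0
    for line in fasta:
        if line.startswith(">"):
            # Assumption: the record ID is the header text up to the
            # first whitespace, matching what BLAST reports as the query ID.
            fields = line[1:].split()
            keep = bool(fields) and fields[0] in ids
            written += keep
        if keep:
            out.write(line)
    return written


if __name__ == "__main__":
    # Tiny in-memory demo; real use would open the files on disk instead.
    blast = io.StringIO("read2\tprotA\t98.0\n")
    fasta = io.StringIO(">read1\nACGT\n>read2\nGGCC\n")
    out = io.StringIO()
    filter_fasta(fasta, hit_ids(blast), out)
    print(out.getvalue(), end="")
```

With real data, the same two calls would take open file handles for outfile.txt and infile.fa, plus an output handle of your choosing. This is still a second pass over the FASTA file, but it is a single streaming pass, and only the set of hit IDs is held in memory. Whether that set fits comfortably depends on how many of the reads actually have hits.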