Output blastx top hits as fasta of the original sequences

7.6 years ago

bonsalldavid • 0

I ran a blastx search of 1,000,000,000 short sequences against a small database, which took a while:

blastall -p blastx -i infile.fa -d prot_blastdb -o outfile.txt -m 8 -S 3 -b 1 -e 0.001

I then filtered the original FASTA file to pull out the sequences that had a top hit. I can share my awk script for this if asked, but I'm wondering whether there is a way to get BLAST to write the matching FASTA sequences directly to a file as it finds them. It seems silly to make two passes over the same file.
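
For readers who want to see what such a second pass might look like, here is a minimal sketch. It is not the poster's script; the function name `filter_hits` is hypothetical, and it assumes the `-m 8` tabular output has the query ID in column 1 and that this ID equals the first whitespace-delimited word of the FASTA header.

```shell
# filter_hits HITS_TABLE QUERY_FASTA
# Hypothetical helper: prints every FASTA record whose ID appears in
# column 1 of a tabular (-m 8) BLAST result file.
filter_hits() {
    awk '
        # First file: remember each query ID in an awk associative array.
        FILENAME == ARGV[1] { hits[$1] = 1; next }
        # Second file, header line: strip ">" and check whether the ID had a hit.
        /^>/ { id = substr($1, 2); keep = (id in hits) }
        # Print header and sequence lines while the current record is kept.
        keep
    ' "$1" "$2"
}
```

It could be called as `filter_hits outfile.txt infile.fa > hits.fa`. All hit IDs are held in memory, so this only works if the number of reads with hits is manageable.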

blast blastx • 1.8k views

ADD COMMENT • link 7.6 years ago by bonsalldavid • 0

Use the blastdbcmd utility (part of BLAST+) with a file listing the accession numbers you are interested in, one per line, passed via the -entry_batch option, to retrieve the sequences you need. This is still a two-step process, but it is reasonably fast.
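
A rough sketch of how this could look in practice follows. It assumes the query reads themselves are first indexed as a nucleotide BLAST database with `-parse_seqids`, since blastdbcmd retrieves sequences from a database rather than from a plain FASTA file. The function names and the file names `query_db`, `hit_ids.txt` and `hits.fa` are placeholders, not anything from the original post.

```shell
# hit_ids HITS_TABLE
# Hypothetical helper: unique query IDs from column 1 of tabular BLAST output.
hit_ids() {
    cut -f1 "$1" | sort -u
}

# fetch_hits HITS_TABLE QUERY_FASTA
# Hypothetical wrapper; requires BLAST+ (makeblastdb, blastdbcmd) on PATH.
fetch_hits() {
    # Index the reads so they can be looked up by ID (assumed one-off step).
    makeblastdb -in "$2" -dbtype nucl -parse_seqids -out query_db
    # One ID per line, as expected by -entry_batch.
    hit_ids "$1" > hit_ids.txt
    # Retrieve the matching reads as FASTA.
    blastdbcmd -db query_db -entry_batch hit_ids.txt -out hits.fa
}
```

Usage would be `fetch_hits outfile.txt infile.fa`. Building a database from a very large read set has its own cost, so whether this beats a plain text filter would need to be measured.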

ADD REPLY • link 7.6 years ago by GenoMax 141k

...just had a thought. There might be faster options for filtering (e.g. hashing?) if BLAST output a numerical index for each read. I could add one artificially by prefixing each read name, but I think that would be a last resort.
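
To illustrate the prefixing idea, a minimal sketch might look like the following. The function name `number_reads` and the `|` separator are arbitrary choices made here for illustration.

```shell
# number_reads QUERY_FASTA
# Hypothetical helper: prefixes each FASTA header with a running index,
# e.g. ">read_A" becomes ">1|read_A". Sequence lines pass through unchanged.
number_reads() {
    awk '
        /^>/ { printf(">%d|%s\n", ++n, substr($0, 2)); next }
        { print }
    ' "$1"
}
```

Running `number_reads infile.fa > numbered.fa` before the search would make the index appear as part of the query ID in the tabular output.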

ADD REPLY • link 7.6 years ago by bonsalldavid • 0
