Multiple protein accession number query Command Line Blastp
8.9 years ago
john ▴ 50

So I've scoured the internet and couldn't find any documentation on how to format a txt file with multiple protein accession numbers. As of right now, I have it formatted like this:

AAF45826
AAF48069

The issue is that it only blasts the first protein accession number. Does anyone know the proper format so that the program blasts every protein accession number in the file?

Thanks in advance

alignment blast • 2.2k views
4.9 years ago
vkkodali_ncbi ★ 3.7k

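You can use Entrez Direct to download the sequences in FASTA format on the fly and pass them to blastp through process substitution, like this (replace your_db with the name of your BLAST database):

blastp -query <(epost -db protein -input acc_list.txt | efetch -format fasta) -db your_db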

The acc_list.txt file should contain valid protein accessions, one per line.
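
Since the list has to be exactly one accession per line, it may help to normalize the file before handing it to epost. The following is only a sketch: clean_acc_list is a hypothetical helper name (not part of Entrez Direct or BLAST+), acc_list_raw.txt is an assumed name for the unprocessed list, and the sample content just reuses the two accessions from the question. It strips Windows line endings and stray whitespace, drops blank lines, and removes duplicate accessions.

```shell
# Hypothetical helper: normalize an accession list to one accession per line.
# Strips Windows carriage returns and surrounding whitespace, drops blank
# lines, and removes duplicates while keeping the original order.
clean_acc_list() {
    tr -d '\r' < "$1" \
        | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
        | awk 'NF && !seen[$0]++'
}

# Demo on a small messy file (assumed name) built from the question's accessions.
printf 'AAF45826\r\n\n  AAF48069 \nAAF45826\n' > acc_list_raw.txt
clean_acc_list acc_list_raw.txt > acc_list.txt
cat acc_list.txt
```

The resulting acc_list.txt can then be used with the epost command above. If some queries still seem to be missing, one option is to save the efetch output to a file first and count the FASTA header lines (lines starting with ">") to see whether every accession was actually retrieved.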
