HHpred batch submission
6.8 years ago
dhalaa1 ▴ 20

Hi, does anyone know of a way to do a batch submission with HHpred? Currently I paste in each sequence individually, which is pretty time-consuming. If anyone has any tips or ideas on how to submit the sequences as a batch, I would appreciate any input. Thank you

Aziza

HHpred batch fasta sequence • 5.1k views
6.8 years ago
Joe 21k

I run hundreds of sequences using HHpred on the command line. I have a few scripts for tabulating the output data if you're interested.

I don't know if command-line work is an option for you.

Ultimately a local install is always the best option.

EDIT:

To install, follow the installation instructions in the HH-suite documentation. The tricky bit is making sure you set the environment variables correctly, but they provide all the instructions you should need.
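
As a rough illustration of the environment-variable step: the sketch below assumes HH-suite was unpacked to a hypothetical $HOME/hhsuite and follows the HHLIB/PATH convention from the HH-suite README. The directory layout differs between releases, so treat the paths as placeholders and check the instructions for your version.

```shell
# Assumption: HH-suite lives in $HOME/hhsuite (hypothetical install location).
export HHLIB="$HOME/hhsuite"

# Make the binaries and helper scripts visible to the shell.
export PATH="$PATH:$HHLIB/bin:$HHLIB/scripts"
```

Adding these two lines to ~/.bashrc (or your shell's equivalent) makes them persist across sessions.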

Once you've done that, download the latest databases (note, they're very big and will take up a lot of space and take a long time to download).

wget -e robots=off -r --no-parent -nH -nd -R "*.html,*.txt" -A .tgz http://wwwuser.gwdg.de/~compbiol/data/hhsuite/databases/hhsuite_dbs/

This will download all of their databases. If you just want, say, PDB, try:

wget http://wwwuser.gwdg.de/~compbiol/data/hhsuite/databases/hhsuite_dbs/pdb70_14Sep16.tgz

Extract the database somewhere memorable

tar xvzf pdb70_14Sep16.tgz

You're then ready to actually start searching, so you can do something like what's below. These options are specific to how I use the software though, so read their manual and determine what parameters for output you want.

for file in ./*.faa ; do
        hhsearch -dbstrlen 50 -B 1 -b 1 -p 60 -Z 1 -E 1E-03 -nocons -nopred -nodssp -cpu 10 -i "$file" -d /path/to/database/pdb70_hhm.ffdata
done

You can replace this loop with a parallelised call using something like GNU parallel. Be aware, though, that as well as running several sequences at once, each hhsearch job can itself use multiple cores (the -cpu option), so make sure the two together don't oversubscribe your machine.
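
A minimal sketch of that trade-off, using xargs -P as a stand-in for GNU parallel because it is usually already installed. The names HHSEARCH, DB, JOBS and CPUS_PER_JOB and the run_batch function are illustrative, not HH-suite settings; only -cpu, -i and -d are taken from the command above. The idea is to keep JOBS multiplied by CPUS_PER_JOB at or below the number of cores you have.

```shell
# Illustrative settings (hypothetical names); override them in the environment.
HHSEARCH="${HHSEARCH:-hhsearch}"
DB="${DB:-/path/to/database/pdb70_hhm.ffdata}"
JOBS="${JOBS:-2}"                   # how many sequences to search at once
CPUS_PER_JOB="${CPUS_PER_JOB:-5}"   # cores given to each hhsearch job

run_batch() {
    # $1: directory containing the .faa query files
    # xargs appends one file name after "-i" and keeps up to $JOBS jobs running.
    find "$1" -maxdepth 1 -name '*.faa' -print0 |
        xargs -0 -r -n 1 -P "$JOBS" "$HHSEARCH" -cpu "$CPUS_PER_JOB" -d "$DB" -i
}

# Example: run_batch .
```

With the defaults shown, two searches run side by side with five cores each, matching the ten cores used by the single-job loop.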

This will generate a result file for each sequence that looks something like:

Query         PAK_01787 PAK_01787 T4-like virus tail tube protein gp19 1997281:1997730 forward MW:16801
Match_columns 149
No_of_seqs    1 out of 1
Neff          1.0
Searched_HMMs 25094
Date          Thu Dec  3 14:08:12 2015
Command       hhsearch -cpu 10 -i /home/wms_joe/PVCs/PVC_operons/all/PAK_01787.fsa -d /home/wms_joe/Applications/HHSuite/databases/pdb70/pdb70_hhm.ffdata -B 10 -Z 10 -E 1E-03

 No Hit                             Prob E-value P-value  Score    SS Cols Query HMM  Template HMM
  1 1tvs_A Transactivator protein;  33.0      11 0.00046   25.9   0.0   14   27-42     38-51  (75)
  2 3jqo_B TRAO protein; helical o  31.3      13 0.00051   25.6   0.0   32   44-77     41-72  (135)
  3 2n01_B VIRB9 protein; T4SS, li  30.4      14 0.00054   24.1   0.0   25   42-66     32-60  (106)
  4 1p65_A Nucleocapsid protein; v  20.6      27  0.0011   23.9   0.0   18   17-34     13-30  (73)
  5 2wj5_A Heat shock protein beta  20.4      28  0.0011   20.3   0.0   31   80-113    13-43  (101)
  6 2ltk_A Mono-cysteine glutaredo  18.4      32  0.0013   19.4   0.0   56   35-90     42-100 (110)
  7 1lrw_B Methanol dehydrogenase   16.1      40  0.0016   23.5   0.0   33  109-141    12-46  (83)
  8 1vyb_A ORF2 contains A reverse  15.6      42  0.0017   20.0   0.0   12  106-117     9-20  (238)
  9 1kaf_A Transcription regulator  14.0      49   0.002   23.3   0.0   27   85-111    14-40  (108)
 10 3but_A Uncharacterized protein  13.0      55  0.0022   21.4   0.0   39   88-126     2-40  (136)

No 1
>1tvs_A Transactivator protein; transcription regulation; NMR {Equine infectious anemia virus} SCOP: j.40.1.1 PDB: 1tvt_A
Probab=32.99  E-value=11  Score=25.86  Aligned_cols=14  Identities=43%  Similarity=0.833  Sum_probs=12.3

Q PAK_01787        27 QMCFQSVSGLDISYDT   42 (149)
Q Consensus        27 qmcfqsvsgldisydt   42 (149)
                      |.||..  ||-|||..
T Consensus        38 qlCFlk--GLGIsYg~   51 (75)
T 1tvs_A           38 QLCFLR--SLGIDYLD   51 (75)
T ss_pred             HHHhcc--CCcccccC
Confidence            789999  99999973
...

At which point you can use/modify my script below, which will turn the output file into a tab-delimited file (I then concatenate all my results together so they can be viewed in spreadsheets etc.). The script as it is will only return the best hit (No 1), so if you want more than that you'll have to get creative!
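
A minimal sketch of that kind of tabulation (not the original script), assuming the result files have the layout shown above. Rather than splitting the fixed-width summary table, it reads the "No 1" alignment block, where the hit ID follows ">" and the statistics appear as key=value pairs. The function name hhr_top_hit and the choice of output columns are my own.

```shell
hhr_top_hit() {
    # $1: path to one hhsearch result file
    # Prints: query <TAB> top hit ID <TAB> Probab <TAB> E-value <TAB> Score
    awk '
        /^Query / && query == ""   { query = $2 }           # first word of the query name
        /^No 1[[:space:]]*$/       { in_top = 1; next }      # start of the best-hit block
        /^No [0-9]+[[:space:]]*$/  { in_top = 0 }            # any later hit ends it
        in_top && /^>/             { hit = substr($1, 2) }   # e.g. 1tvs_A
        in_top && /^Probab=/ {
            # Split each key=value pair on the line into a lookup table.
            for (i = 1; i <= NF; i++) {
                split($i, kv, "=")
                val[kv[1]] = kv[2]
            }
            printf "%s\t%s\t%s\t%s\t%s\n", query, hit, val["Probab"], val["E-value"], val["Score"]
            exit
        }
    ' "$1"
}

# Example, assuming the results were saved with a .hhr extension:
#   for f in ./*.hhr; do hhr_top_hit "$f"; done > top_hits.tsv
```

For the example result above this gives the fields PAK_01787, 1tvs_A, 32.99, 11 and 25.86 on one tab-separated line. Returning more than the best hit would mean dropping the early exit and printing once per "No N" block.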


Yes jrj.healey, please! I am very new to this, but I knew there had to be some way; I just could not figure out how. What is the command line and how do I use it? Thank you


I've edited my answer with as much information as I can remember (it's been a while since I installed it!)
