Downloading full BlastKOALA results?
5.6 years ago
EduardoFox ▴ 10

I have just started using BlastKOALA (KEGG), which has been useful for annotating amino acid sequences. This is their website: https://www.kegg.jp/blastkoala/

When you get results, there are links for downloading. However, these links will not download the detailed results for each query, only the general notes already shown on the screen. To get the detailed results I need to manually click on each query result on the page, which becomes impractical with >500 entries. So what I think I need is a tool to download all linked content from a webpage. I have been trying wget, but it doesn't work: it says 'Requested Job Not Found' whatever I do.

Has anyone ever tried to achieve this? Thanks in advance.

blast blastkoala wget annotation KO • 2.1k views
5.6 years ago
lelle ▴ 830

I had a quick look at this on my BlastKOALA result.

When I click on one of my queries I get a detailed list of matches. The list has a URL like this:

https://www.kegg.jp/kegg-bin/blastkoala_result_gene_list?id=39732d974cf46cbc344f96d5d7e81bb69c18dcea&passwd=x3XXyz&type=blastkoala&code=user&target=g1%2Et1

If I run

wget "https://www.kegg.jp/kegg-bin/blastkoala_result_gene_list?id=39732d974cf46cbc344f96d5d7e81bb69c18dcea&passwd=x3XXyz&type=blastkoala&code=user&target=g1%2Et1" -O g1.t1_hits.html

I get a file called g1.t1_hits.html (because of the -O option).

If I change the last parameter of the URL (target=g1%2Et1) to a different protein name, I get the result for the corresponding protein.
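Note that in the example URL the dot in the protein ID g1.t1 is percent-encoded as %2E. If your IDs contain dots, a minimal sketch for building the encoded target value (assuming a POSIX shell with sed):

```shell
# Percent-encode the dot in a protein ID so it matches the
# URL format shown above (g1.t1 -> g1%2Et1)
PROT="g1.t1"   # example ID, substitute your own
TARGET=$(printf '%s' "$PROT" | sed 's/\./%2E/g')
echo "$TARGET"
```

The encoded value can then be substituted into the target= parameter of the wget URL.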

Maybe you are missing the quotation marks in your wget command?


Thanks for testing the download! However, you will see that the downloaded page is just what is already shown on the screen, which I could easily get by selecting all and pasting it into a text editor. I'd like to download the detailed results for each queried protein, which you can only see by clicking on it directly. In other words, I'd like to download all HTML pages linked from the page you just downloaded. Would you know how to set this up in wget? I cannot get all the links. Thanks!


The way I would do this is by writing a bash script that calls wget with each protein ID. Something like this:

while read PROT; do
  echo "$PROT"
  wget "https://www.kegg.jp/kegg-bin/blastkoala_result_gene_list?id=39732d974cf46cbc344f96d5d7e81bb69c18dcea&passwd=XXxxXX&type=blastkoala&code=user&target=${PROT}" -O "${PROT}_koala.html"
done < prot.txt

where prot.txt is a file with one protein ID per line.
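If the queries were originally submitted as a FASTA file, prot.txt can be generated from its headers. A sketch, assuming headers of the form ">g1.t1 optional description" and a hypothetical input filename queries.fasta:

```shell
# Extract the ID (first whitespace-delimited token after '>')
# from each FASTA header line; 'queries.fasta' is a placeholder name
grep '^>' queries.fasta | cut -d ' ' -f 1 | sed 's/^>//' > prot.txt
```

This keeps only the sequence IDs, one per line, in the format the loop above expects.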
