Blastn - Program Runs Indefinitely When Generating XML-Formatted Output
12.7 years ago
User 6228 ▴ 110

I am running blastn on some nucleotide data, and it seems to run indefinitely when I generate XML output. The jobs take ~15 minutes when generating either the default or the tab-delimited format, but when I choose XML each job hits the 3-hour cap I have set on it. I find it hard to believe that XML generation alone would make a job at least twelve times longer, so I figure there is a problem somewhere. Has anyone run into this?

I am using BLAST 2.2.24+.

Thanks

EDIT: Here are some example commands:

Working (archive format):
/uaopt/ncbi/2.2.24+/bin/blastn -outfmt 11 -db beij \
-query $home'datasets/main/KCmeta.fna' \
-out '/scr3/bmf/results/reference alignment/blast/beij_archive.asn'

Not working (xml format):
/uaopt/ncbi/2.2.24+/bin/blastn -outfmt 5 -db beij \
-query $home'datasets/main/KCmeta.fna' \
-out '/scr3/bmf/results/reference alignment/blast/beij.xml'

The default format, tab-delimited, tab-delimited with comments, and CSV also work.

I realized I have access to 2.2.24+, but the results are the same. I'd prefer not to need 2.2.25, since this is a high-performance computing lab where I have to request that it be installed.

nucleotide blast blast • 3.0k views

Please post one exact command that works and one that does not.

Just a thought: you're not running out of disk space?
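
One quick way to rule this out is to check the free space on the filesystem that receives the output. This is only a sketch: `free_kb` and the `OUTDIR` variable are made-up names for illustration, and it assumes a POSIX `df`.

```shell
# Print the free space, in kilobytes, on the filesystem holding a directory.
# "free_kb" is a hypothetical helper name, not part of BLAST.
free_kb() {
    # -P forces one line per filesystem; column 4 is "Available".
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Hypothetical usage: set OUTDIR to the directory given to -out.
free_kb "${OUTDIR:-.}"
```

On a shared cluster a per-user quota can also be the limit, and `df` will not necessarily show that.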

Using the XML format usually generates a lot of data. How many sequences are you running, and against which database? As Michael asked, please show us the parameters you've used.
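
If it helps to answer that, counting FASTA header lines gives the number of query sequences. A minimal sketch, assuming a plain FASTA file; `count_seqs` and the `QUERY` variable are illustrative names only:

```shell
# Count FASTA records by counting header lines (those starting with '>').
# "count_seqs" is a hypothetical helper name.
count_seqs() {
    grep -c '^>' "$1"
}

# Hypothetical usage: point QUERY at the file passed to -query.
if [ -f "${QUERY:-}" ]; then
    count_seqs "$QUERY"
fi
```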

Please try again with the latest version 2.2.25.

12.2 years ago
Hamish ★ 3.2k

Well, I can't replicate the problem, so this is going to be a bit of a stab in the dark...

When I've had problems generating NCBI BLAST XML output in the past, the cause has been an issue with the database. Some things for you to check:

  1. The sequence identifiers are unique in the database.
  2. The BLAST database was created with the identifiers indexed, i.e. for fasta sequence format input use formatdb with '-o T' or makeblastdb with '-parse_seqids'. Note: BLAST uses case-insensitive indexing for the identifiers, so in step 1 be careful to catch identifiers that differ only in case.
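
The first check can be sketched in the shell as below. This assumes the database was built from a FASTA file and that the identifier is the text between '>' and the first whitespace on each header line; `dup_ids` and `db.fna` are made-up names for illustration.

```shell
# Print identifiers that occur more than once in a FASTA file when compared
# case-insensitively. "dup_ids" is a hypothetical helper name.
dup_ids() {
    # Take the first word of each header, drop the leading '>', lowercase it,
    # then keep only values that appear more than once.
    awk '/^>/ { print tolower(substr($1, 2)) }' "$1" | sort | uniq -d
}

# Hypothetical usage, with db.fna standing in for the FASTA source of the database:
#   dup_ids db.fna
# If nothing is printed, the database could then be rebuilt with indexed identifiers:
#   makeblastdb -in db.fna -dbtype nucl -parse_seqids -out beij
```

Any identifier it prints appears at least twice once case is ignored, so those are the entries to rename before rebuilding.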

Beyond that, it sounds like you might have found a bug, so contacting the NCBI BLAST help desk may be the only way forward (see BLAST help). If you do find an explanation for this behavior, be sure to post an answer so we all know what to look for in future.
