Error running local blastn in R using system2
29 days ago
Harrison • 0

Hi there,

I've been trying to run blastn locally on metabarcoding sequences of eukaryotes and am running into errors both on the command line and when using system2 in R.

I have downloaded and unzipped the entire nt_euk library from NCBI, resulting in a folder called "nt_euk" containing several file types (.nhr, .nin, .nnd, .nni, .nog, .nsq) for each chunk (e.g., nt_euk.01, nt_euk.02, etc.) of the reference database, as well as files for the whole database (nt_euk.nal, nt_euk.ndb, nt_euk.nos, nt_euk.not, nt_euk.ntf, nt_euk.nto). The folder also contains taxonomy files (taxdb.btd, taxdb.bti, taxonomy4blast.sqlite3).

I am trying to implement this in my bioinformatics pipeline in R using system2() to run command line functions from blast+ on unassigned ASVs from my samples. The code looks like this:

blast.f6 <- c('qseqid', 'sseqid', 'sscinames', 'scomnames', 'sskingdoms', 'pident', 'qcovs')
blastn <- "C:/Program Files/NCBI/blast-2.15.0+/bin/blastn.exe"
ntdb <- "data/nt_euk"
input <- "data/results/18S-Comeau_ASV_sequences.fasta"
blast.out <-
      system2(command = blastn,
          args = c('-db', ntdb,
                   '-num_threads', '10', 
                   '-outfmt', sprintf('"6 %s"', paste(collapse = ' ', blast.f6)),
                   '-perc_identity', '99',  # blastn expects a percentage (0-100), not a fraction
                   '-max_target_seqs', '1',
                   '-query', input,
                   '-out', 'data/results/18S_blast.txt'),
          wait = TRUE,
          stdout = TRUE
  )

This results in an error:

Warning message:
In system2(command = blastn, args = c("-db", ntdb, "-num_threads",  :
  running command '"C:/Program Files/NCBI/blast-2.15.0+/bin/blastn.exe" ...' had status 2
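
The warning above only reports the exit status. The message blastn itself prints goes to stderr, which `stdout = TRUE` alone does not capture. A minimal sketch of capturing both streams is below; `run_and_capture` is a hypothetical helper name, and the demo calls R's own `Rscript` instead of blastn so that it runs on any machine.

```r
# Hypothetical helper: run a command and keep its output (stdout + stderr)
# together with its exit status.
run_and_capture <- function(command, args = character()) {
  out <- suppressWarnings(
    system2(command, args = args, stdout = TRUE, stderr = TRUE)
  )
  status <- attr(out, "status")  # only set when the exit status is non-zero
  list(
    output = as.character(out),
    status = if (is.null(status)) 0L else as.integer(status)
  )
}

# Demo with Rscript (ships with R): write to stderr, then exit with status 2.
rscript <- file.path(R.home("bin"), "Rscript")
res <- run_and_capture(
  rscript,
  c("-e", shQuote("message('demo error'); quit(status = 2)"))
)
print(res)
```

Swapping in the blastn path and the argument vector from the question should surface the underlying BLAST message in `res$output`. Because that call also passes `-out`, the hits themselves go to the output file rather than to the captured vector.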

I tried to run blastn in the Terminal directly and I also get an error:

blastn -query data/results/18S-Comeau_ASV_sequences.fasta -db data/nt_euk
BLAST Database error: No alias or index file found for nucleotide database

I thought the alias/index file is nt_euk.nal, which is in my directory. So I'm not sure what exactly the issue is here, and all my Google searching has led me to dead ends. Any insights or solutions would be much appreciated!

blastn NCBI R • 433 views

What do you see if you cat nt_euk.nal? Does the number of pieces mentioned in that file match what you have locally?


I am seeing 'cat' is not recognized as an internal or external command, operable program or batch file.
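
`cat` is a Unix command; in the Windows Command Prompt, `type nt_euk.nal` prints the file, and the same check can be done from R. The sketch below rests on two assumptions: that the alias file lists its volumes on a line starting with `DBLIST` (possibly quoted), and that each volume has a `.nsq` file next to the alias file. `check_alias_volumes` is a hypothetical helper, not part of BLAST+, and the demo uses made-up file names in a temporary folder.

```r
# Hypothetical helper: compare the volumes named in a .nal alias file
# with the .nsq files actually present in the same folder.
check_alias_volumes <- function(nal_path) {
  lines <- readLines(nal_path, warn = FALSE)
  dblist <- grep("^[[:space:]]*DBLIST[[:space:]]", lines, value = TRUE)
  if (length(dblist) == 0) stop("no DBLIST line found in ", nal_path)
  # Drop the keyword, strip quotes, split on whitespace.
  vols <- sub("^[[:space:]]*DBLIST[[:space:]]+", "", dblist)
  vols <- gsub('"', "", vols, fixed = TRUE)
  vols <- unlist(strsplit(trimws(vols), "[[:space:]]+"))
  nsq <- file.path(dirname(nal_path), paste0(vols, ".nsq"))
  data.frame(volume = vols, found = file.exists(nsq), stringsAsFactors = FALSE)
}

# Demo: an alias file naming two volumes, only one of which exists on disk.
demo_dir <- file.path(tempdir(), "nal_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("TITLE demo", 'DBLIST "demo.00" "demo.01"'),
           file.path(demo_dir, "demo.nal"))
file.create(file.path(demo_dir, "demo.00.nsq"))
report <- check_alias_volumes(file.path(demo_dir, "demo.nal"))
print(report)
```

Any row with `found` equal to `FALSE` would point to a volume that is named in the alias file but missing locally, for example after an incomplete download or extraction.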


Also, nt_euk.nal is just one file. There are other nt_euk files in my directory with different extensions (.ndb, .nos, .not, .ntf, .nto).


The -db data/nt_euk switch means that blastn looks in a subdirectory called data of your current working directory and expects the nt_euk files (nt_euk.nal and so on) to sit directly in that directory. If that's not the case, it will throw this error.

It may be a good idea to give a whole path, e.g. -db C:/data/nt_euk or whatever the actual path is. As always, it is not recommended to have space characters in directory names if it can be avoided.
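
BLAST+ treats the `-db` value as a path prefix (directory plus the database base name, without extension), not as a folder. Since the question describes a folder named "nt_euk" that holds nt_euk.nal, one thing worth checking is whether the prefix should end in nt_euk/nt_euk. The sketch below tests both possibilities before blastn is called; `resolve_blast_db` is a hypothetical helper that only looks for a `.nal` or `.nin` file, and the demo uses a made-up database name in a temporary folder.

```r
# Hypothetical helper: return a -db prefix that has an alias (.nal) or
# index (.nin) file, trying "<db>" first and then "<db>/<basename(db)>".
resolve_blast_db <- function(db) {
  has_index <- function(prefix) {
    any(file.exists(paste0(prefix, c(".nal", ".nin"))))
  }
  if (has_index(db)) return(db)
  nested <- file.path(db, basename(db))
  if (dir.exists(db) && has_index(nested)) return(nested)
  stop("no .nal or .nin file found for prefix: ", db)
}

# Demo: a folder "nt_demo" that contains "nt_demo.nal".
db_dir <- file.path(tempdir(), "nt_demo")
dir.create(db_dir, showWarnings = FALSE)
file.create(file.path(db_dir, "nt_demo.nal"))
resolved <- resolve_blast_db(db_dir)
print(resolved)
```

Here the folder path alone does not match any alias or index file, so the helper falls back to the nested prefix. Passing the resolved value through `normalizePath(dirname(resolved))` before building the `-db` argument would also rule out problems with R's working directory.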


Yes, in my actual script I use the whole path to nt_euk. I was just showing data/nt_euk as shorthand for the full path. There are no space characters in my directory name.
