kraken: unable to download the databases from ncbi
6.2 years ago
karthic ▴ 130

Hi All,

After installing Kraken, I am trying to build the database as specified in the manual, but I get the following messages. Any suggestions?

/Tools/kraken-master/KRAKEN$ ./kraken-build --standard --threads 40 --db /home/karthic/Databases/KRAKEN
Found jellyfish v1.1.11
Step 1/3: performing rsync dry run...
Rsync dry run complete, removing any non-existent files from manifest.
Step 2/3: Performing rsync file transfer of requested files
rsync: failed to connect to ftp.ncbi.nlm.nih.gov (165.112.9.229): Connection timed out (110)
rsync: failed to connect to ftp.ncbi.nlm.nih.gov (2607:f220:41e:250::7): Network is unreachable (101)
rsync error: error in socket IO (code 10) at clientserver.c(128) [Receiver=3.1.1]
rsync_from_ncbi.pl: rsync error, exited with code 10

Thanks in Advance, KK

RNA-Seq genome next-gen software error Assembly • 7.5k views

You are probably behind a firewall/proxy, and Kraken is not able to reach NCBI via rsync. If that is the case, you may want to talk with your local sysadmins. There are solutions, but they will depend on your local setup.
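
One way to tell a filtered rsync port from a general network problem is to probe the relevant TCP ports directly. The sketch below is only an illustration and not part of Kraken: `can_connect` and `diagnose` are hypothetical helper names, and it assumes the default rsync daemon port (873) and the FTP control port (21).

```python
import socket

RSYNC_PORT = 873  # default rsync daemon port (used by rsync:// URLs)
FTP_PORT = 21     # FTP control port (used by ftp:// URLs, e.g. via wget)


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and unreachable networks.
        return False


def diagnose(host, probe=can_connect):
    """Report which of the two ports are reachable on host."""
    return {"rsync": probe(host, RSYNC_PORT), "ftp": probe(host, FTP_PORT)}

# Example (needs network access):
#   diagnose("ftp.ncbi.nlm.nih.gov")
```

If `rsync` comes back `False` while `ftp` is `True`, outbound traffic to port 873 is probably being filtered, which is worth mentioning to the sysadmins.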

Are you able to download anything from the NCBI ftp server using wget?

Yes, I can download with wget.

Hi,

I was hitting the same rsync error. The way I got around it was to change the rsync_from_ncbi.pl script to use wget instead. I changed line 70 from:

if (system("rsync --no-motd --files-from=manifest.txt rsync://ftp.ncbi.nlm.nih.gov/genomes/ .") != 0) {

to

if (system("wget -nc -nH -x --cut-dirs=1 -i manifest.txt -B ftp://ftp.ncbi.nlm.nih.gov/genomes/ .") != 0) {

It worked okay once I managed to get wget to behave in the same way as the rsync command. I don't know how it will affect database updates, since I was creating a new database when I ran into this error. Good luck!
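
To make the effect of those wget options easier to follow, here is a rough Python sketch of the same mapping. It is not part of Kraken; `plan_downloads` is a hypothetical helper, and it assumes that manifest.txt lists one path per line relative to `genomes/`.

```python
import os
from urllib.parse import urljoin

BASE_URL = "ftp://ftp.ncbi.nlm.nih.gov/genomes/"  # same base as the -B option


def plan_downloads(manifest_lines, base_url=BASE_URL, dest_dir="."):
    """Map each manifest entry to (url, local_path), skipping existing files.

    Mirrors the effect of `-i manifest.txt -B <base>` (resolve relative
    entries against the base URL), `-nH -x --cut-dirs=1` (keep the
    directory layout below genomes/) and `-nc` (leave existing files alone).
    """
    plan = []
    for line in manifest_lines:
        rel = line.strip().lstrip("/")
        if not rel:
            continue  # ignore blank lines
        local_path = os.path.join(dest_dir, *rel.split("/"))
        if os.path.exists(local_path):
            continue  # like -nc: never re-fetch a file that is already there
        plan.append((urljoin(base_url, rel), local_path))
    return plan
```

Each returned pair gives the remote URL and the local path that mirrors the layout the rsync command would have produced.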

Worked for me, thanks!

Thanks, it works for me too. However, if the download is interrupted, rerunning the command downloads the existing files again in full; it cannot resume from the point where it stopped. Did the "-nc" flag not work?
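
For reference, the wget manual describes `-nc` (`--no-clobber`) as skipping files that already exist locally, and `-c` (`--continue`) as the option that resumes a partially downloaded file. The sketch below only illustrates that distinction. `choose_action` is a hypothetical helper, and it assumes the expected remote size is known in advance.

```python
import os


def choose_action(local_path, remote_size):
    """Decide what to do with one file, given the expected remote size in bytes.

    Returns "download" (no local copy), "resume" (partial local copy),
    or "skip" (local copy is already at least as large as the remote file).
    """
    if not os.path.exists(local_path):
        return "download"
    local_size = os.path.getsize(local_path)
    if local_size < remote_size:
        return "resume"  # the -c case: only the missing bytes are needed
    return "skip"  # same size or larger locally: nothing left to fetch
```

An existence-only check, as with `-nc`, cannot tell the "resume" case from the "skip" case, so a truncated file would be left as it is.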

6.2 years ago
Joseph Hughes ★ 3.0k

Since NCBI updated their FTP site and decided to phase out GenBank Identifiers (GIs), the default Kraken database update scripts no longer work.

My colleague @Sej Modha has written a Python script that helps with updating the Kraken databases: http://bioinformatics.cvr.ac.uk/blog/update-kraken-databases/
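
As background, the Kraken manual describes a way to assign taxonomy without GIs: adding `kraken:taxid|<taxid>` to the sequence ID in the FASTA header. The sketch below shows that header rewrite in isolation. It is not the linked script; `add_taxid_to_header` is a hypothetical helper written for illustration.

```python
def add_taxid_to_header(header, taxid):
    """Insert '|kraken:taxid|<taxid>' after the sequence ID of a FASTA header."""
    if not header.startswith(">"):
        raise ValueError("not a FASTA header line: %r" % header)
    # Split the ID (first whitespace-delimited token) from the description.
    parts = header[1:].rstrip("\n").split(None, 1)
    if not parts:
        raise ValueError("FASTA header has no sequence ID")
    seq_id = parts[0]
    description = parts[1] if len(parts) > 1 else ""
    new_header = ">%s|kraken:taxid|%d" % (seq_id, taxid)
    return new_header + (" " + description if description else "")
```

For example, `add_taxid_to_header(">sequence16 Adapter sequence", 32630)` returns `>sequence16|kraken:taxid|32630 Adapter sequence`.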

Good to know. Has this been raised as an issue with kraken developers?

I believe Derrick Wood, kraken developer, has moved on to pastures new.

Hi Joseph,

I tried the script, but it is not working. I get the following error:

/Tools/kraken-master$ python Update_kraken_db.py
  File "Update_kraken_db.py", line 18
    if len(sys.argv) > 1:
    ^

Hi Karthic,

There is something wrong with the code formatting on the WordPress page: the code-formatting plugin has altered the code on line 18.

Please download the script from GitHub and try again, and let me know if there are any problems.

Hey Sej,

Thank you for the solution. The script is working.

Regards, KK

Hello Sej Modha, I have been using your script, but at some point the following error appears:

sys:1: DtypeWarning: Columns (20) have mixed types. Specify dtype option on import or set low_memory=False.
Traceback (most recent call last):
  File "./UpdateKrakenDatabases.py", line 118, in <module>
    get_fasta_in_kraken_format('human_genome.fa')
  File "./UpdateKrakenDatabases.py", line 98, in get_fasta_in_kraken_format
    for seq_record in records:
  File "/aplic/GOOLF/1.6.10/Python/3.3.2/lib/python3.3/site-packages/Bio/SeqIO/__init__.py", line 600, in parse
    for r in i:
  File "/aplic/GOOLF/1.6.10/Python/3.3.2/lib/python3.3/site-packages/Bio/GenBank/Scanner.py", line 478, in parse_records
    record = self.parse(handle, do_features)
  File "/aplic/GOOLF/1.6.10/Python/3.3.2/lib/python3.3/site-packages/Bio/GenBank/Scanner.py", line 462, in parse
    if self.feed(handle, consumer, do_features):
  File "/aplic/GOOLF/1.6.10/Python/3.3.2/lib/python3.3/site-packages/Bio/GenBank/Scanner.py", line 430, in feed
    self._feed_header_lines(consumer, self.parse_header())
  File "/aplic/GOOLF/1.6.10/Python/3.3.2/lib/python3.3/site-packages/Bio/GenBank/Scanner.py", line 1436, in _feed_header_lines
    structured_comment_key = re.search(r"([^#]+){0}$".format(STRUCTURED_COMMENT_START), data).group(1)
AttributeError: 'NoneType' object has no attribute 'group'

Any help?
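
The last frame of the traceback shows the immediate cause: `re.search` returned `None` for a header line, and calling `.group(1)` on `None` raises the `AttributeError`. The sketch below reproduces that pattern in isolation, with a guarded variant. It is illustrative only: the value of `STRUCTURED_COMMENT_START` is an assumption (the traceback shows the name but not its value), `structured_comment_key` is a hypothetical helper, and the sample lines in the usage note are made up.

```python
import re

# Assumed value; the traceback shows the name but not the constant itself.
STRUCTURED_COMMENT_START = "-START##"
PATTERN = r"([^#]+){0}$".format(STRUCTURED_COMMENT_START)


def structured_comment_key(data):
    """Return the key before the start marker, or None if the line does not match."""
    match = re.search(PATTERN, data)
    # Checking for None avoids "'NoneType' object has no attribute 'group'".
    return match.group(1) if match else None
```

With this guard, a line such as `##Genome-Assembly-Data-START##` yields `Genome-Assembly-Data`, while a line that contains the marker but does not end with it yields `None` instead of raising.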

Hi there, I have updated the script to specify the dtype explicitly. The updated version is available to download from GitHub.

Thank you for the help!
