fastq dump error
7.4 years ago
Satyajeet Khare ★ 1.6k

Hi Biostars

I am trying to convert SRA files from the PRJNA282735 dataset to FASTQ, and I am getting the following error:

fastq-dump.2.1.7 fatal: SIGNAL - Segmentation fault

My fastq-dump command is:

fastq-dump --split-3 SRR2016445.sra -O SRR2016445

I have not been able to find a similar error reported elsewhere. The ENA page for some samples of this dataset lists three files per SRX experiment (e.g. SRR2016445.fastq, SRR2016445_1.fastq and SRR2016445_2.fastq).

This is unusual to me: I usually get one or two files per run (depending on whether the data are single-end or paired-end), but never three. I am wondering whether this is the reason for the error.

Has anybody had a similar experience?

sra fastq-dump sratoolkit • 10k views

Are you using the latest sratoolkit? NCBI has moved to HTTPS-only connections. With v. 2.8 I get two files when dumping with fastq-dump --split-3 SRR2016445

7.4 years ago
Michael 54k

I think 2.1 is fairly old. With a slightly newer version, 2.4.2, I get:

fastq-dump.2.4.2 err: error unexpected while resolving tree within virtual file system module - failed to resolve accession 'SRR2016445' - Obsolete software. See https://github.com/ncbi/sra-tools/wiki ( 406 )

The latest release on GitHub is 2.8. You should download and install it as outlined here: https://github.com/ncbi/sra-tools/wiki
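To check whether an installed copy is older than the release suggested here, something along these lines may help. It is only a sketch: the `version_ge` helper and the `min_version` variable are made up for illustration, the comparison relies on `sort -V` from GNU coreutils, and it assumes that `fastq-dump --version` prints a line containing an X.Y.Z version number.

```shell
# Return success if version $1 is greater than or equal to version $2.
# Relies on `sort -V` (version sort) from GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

min_version="2.8.0"   # assumption: the release recommended in this thread

if command -v fastq-dump >/dev/null 2>&1; then
    # Assumption: the version output contains a string of the form X.Y.Z.
    installed=$(fastq-dump --version 2>/dev/null | grep -Eo '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1)
    if [ -n "$installed" ] && version_ge "$installed" "$min_version"; then
        echo "fastq-dump $installed is recent enough"
    else
        echo "fastq-dump ${installed:-unknown} is older than $min_version; consider upgrading"
    fi
else
    echo "fastq-dump not found on PATH"
fi
```

A plain string comparison would rank 2.10 below 2.8, which is why the sketch uses a version-aware sort.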


@Mike, that was it! With fastq-dump 2.8 I get three fastq files for SRR2016445, as expected.

@genomax2, the download was not the issue. I used wget over FTP to download the sra file; the conversion was the problem. Your suggestion was right though.

7.4 years ago
piet ★ 1.8k

Please note that most SRA files are not self-contained: they depend on a reference sequence, which is a separate download. Thus it is not enough to download the SRA file with wget. 'fastq-dump' will try to download the reference sequence behind the scenes before it extracts any reads. The reference sequence for SRR2016445 is https://www.ncbi.nlm.nih.gov/nuccore/149361431.


Thanks a lot! I tried the following command:

prefetch SRR2016445

A reference file was downloaded to the ~/public/refseq/ folder and the SRR file to the ~/public/sra/ folder. I could then split the SRR file into three fastq files using fastq-dump 2.8.0. I guess the small fastq file without the '_1' or '_2' suffix consists of unpaired reads.
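One way to sanity-check that guess is to count the reads in each file. With --split-3, the '_1' and '_2' files should hold the same number of reads, and the unsuffixed file holds whatever was left without a mate. A rough sketch follows; `count_reads` is a made-up helper, the output directory name is taken from the original command, and it assumes plain four-line FASTQ records.

```shell
# Print the number of reads in a FASTQ file, assuming four lines per record.
count_reads() {
    awk 'END { printf "%d\n", NR / 4 }' "$1"
}

acc="SRR2016445"
outdir="SRR2016445"   # assumption: the -O directory from the original command

# Report the read count for each output file that exists.
for f in "$outdir/${acc}_1.fastq" "$outdir/${acc}_2.fastq" "$outdir/${acc}.fastq"; do
    if [ -f "$f" ]; then
        printf '%s\t%s reads\n' "$f" "$(count_reads "$f")"
    fi
done
```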

For some reason, I am not able to convert the reference file 'NC_000072' from binary to fasta using fastq-dump.

P.S. fastq-dump does not work very well for downloading. It downloads both the SRR file and the reference file, just like the prefetch command, but the files retain a .cache extension, which I believe indicates an incomplete download.


Ok, so vdb-dump was of help there. Here is the command.

vdb-dump.2.8.0 -f fasta1 --output-file NC_000072.5.fa NC_000072.5

I noticed some time ago that fastq-dump always places the .cache files in your home directory, even when you are writing the output to a possibly much larger partition. This can easily fill up your home directory, and the files are not removed afterwards. I would therefore try running fastq-dump like this: HOME=./ fastq-dump --split-3 SRR2016445
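To see how much space those leftover files take up, a small helper like the one below could be used. This is only a sketch: `list_cache_files` and the `CACHE_ROOT` variable are invented for illustration, and `$HOME/ncbi` is an assumed default location that may differ depending on your vdb-config settings.

```shell
# List leftover *.cache files under a directory, with sizes in kilobytes.
list_cache_files() {
    find "$1" -type f -name '*.cache' -exec du -k {} +
}

# Assumption: the toolkit's working area is "$HOME/ncbi" unless CACHE_ROOT is set.
cache_root="${CACHE_ROOT:-$HOME/ncbi}"

if [ -d "$cache_root" ]; then
    list_cache_files "$cache_root"
else
    echo "No directory at $cache_root"
fi
```

The sketch only lists files; check that no download is still running before deleting anything it reports.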


No luck. I still get .cache files. I will try to figure out what's going wrong.


Running vdb-config -i allows one to choose directories that will be used by SRAtoolkit. This needs to be done once and will require X-windows (if run with -i). If you want a pure text version run vdb-config -i --interactive-mode textual.


@genomax2,

The configuration looks fine. The default path is ncbi/public, there is no proxy, and the rest of the settings are at their defaults. So why fastq-dump didn't download the SRR file properly (without the .cache extension), and why the reference file was not converted to fasta, is a mystery to me. For now, I can survive with this three-step process (prefetch/fastq-dump/vdb-dump).

P.S. fastq-dump works fine for other datasets which don't have refseq file.
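For anyone who wants to repeat the three-step process, it can be wrapped up roughly as follows. This is a sketch rather than a tested pipeline: the `run` and `sra_to_fastq` functions and the `DRY_RUN` switch are invented here, the commands and flags are the ones quoted in this thread, and it assumes the tools are on PATH under their unversioned names.

```shell
# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Usage: sra_to_fastq <run accession> <reference accession>
sra_to_fastq() {
    acc="$1"
    ref="$2"
    # Step 1: fetch the run (prefetch also pulled the reference in this thread).
    run prefetch "$acc" || return 1
    # Step 2: write paired reads to _1/_2 and unpaired reads to the unsuffixed file.
    run fastq-dump --split-3 "$acc" -O "$acc" || return 1
    # Step 3: convert the reference from the binary format to FASTA.
    run vdb-dump -f fasta1 --output-file "$ref.fa" "$ref" || return 1
}

# Print the planned commands without executing anything.
DRY_RUN=1 sra_to_fastq SRR2016445 NC_000072.5
```

Dropping DRY_RUN=1 from the last line runs the three commands for real, stopping at the first one that fails.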


Brave people will just edit '~/.ncbi/user-settings.mkfg' with their favorite text editor. Needing 'vdb-config' to modify a simple config file is over-engineered.
