Method to Check Fastq Completeness after Fastq-dump
7.3 years ago
Shicheng Guo ★ 9.4k

Hi All,

How do you check that a FASTQ file is complete after downloading it from the SRA database with fastq-dump? I regularly find incomplete FASTQ files after running fastq-dump.

Thanks.

fastq-dump Completeness • 9.2k views

You should always check EBI-ENA to see whether FASTQ files are available, including for the SRR accession you posted below.
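One practical benefit of going through ENA is that it publishes an MD5 checksum for each FASTQ file, so completeness can be checked directly against it. Below is a minimal sketch, not a definitive implementation. It assumes the ENA Portal API `filereport` endpoint with the `fastq_ftp` and `fastq_md5` fields, that the report has `run_accession` as its first column, and that the files were saved under their original names. `verify_ena_md5` and `report.tsv` are hypothetical names.

```shell
# verify_ena_md5 REPORT.tsv [DIR]   (hypothetical helper)
# REPORT.tsv: ENA file report with a header row and the tab-separated columns
# run_accession, fastq_ftp, fastq_md5 (several files per run are ';'-separated).
# Assumed to come from something like:
#   curl -s "https://www.ebi.ac.uk/ena/portal/api/filereport?accession=SRR949203&result=read_run&fields=fastq_ftp,fastq_md5" > report.tsv
verify_ena_md5() {
    report=$1
    dir=${2:-.}
    rc=0
    # Flatten the report into "filename expected_md5" pairs, one per line.
    pairs=$(awk -F '\t' 'NR > 1 {
        n = split($2, urls, ";"); split($3, sums, ";")
        for (i = 1; i <= n; i++) {
            k = split(urls[i], parts, "/")
            print parts[k], sums[i]
        }
    }' "$report")
    while read -r name expected; do
        [ -n "$name" ] || continue
        if [ ! -f "$dir/$name" ]; then
            echo "MISSING  $name"; rc=1; continue
        fi
        # md5sum is the GNU coreutils tool; on macOS use `md5 -q` instead.
        actual=$(md5sum "$dir/$name" | cut -d ' ' -f 1)
        if [ "$actual" = "$expected" ]; then
            echo "OK       $name"
        else
            echo "MISMATCH $name"; rc=1
        fi
    done <<EOF
$pairs
EOF
    return "$rc"
}
```

Calling `verify_ena_md5 report.tsv /path/to/downloads` prints one line per file and returns non-zero if any file is missing or its checksum differs, which makes it easy to use in a retry loop.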


By the way, how can an interrupted fastq-dump download be resumed?


Seventeen months on and there is still no answer to this question. I have the same issue when dumping big files (~30G) and don't want to restart the download. How can a broken download be resumed with fastq-dump? Best


Thanks. The method you mention works in some cases. However, in most situations it doesn't work, for example:

fastq-dump --split-files --gzip SRR949203

If you only download the SRA files, I think it is okay to use

vdb-validate SRR949203
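
Since vdb-validate checks the SRA archive rather than the FASTQ output, a rough sanity check for the gzipped FASTQ itself could look like the sketch below. It assumes standard four-line FASTQ records and a gzip-compressed file. `check_fastq_gz` is a hypothetical helper name, and the file names in the usage comment are only what `--split-files --gzip` would be expected to produce for a paired-end run.

```shell
# check_fastq_gz FILE.fastq.gz   (hypothetical helper)
# Prints the number of records and returns 0 if the file passes both checks.
check_fastq_gz() {
    fq=$1
    # 1. A gzip stream that was cut off fails gzip's own integrity test.
    if ! gzip -t "$fq" 2>/dev/null; then
        echo "$fq: gzip stream is truncated or corrupt" >&2
        return 1
    fi
    # 2. Each record has 4 lines, so the line count must be a multiple of 4.
    lines=$(gzip -dc "$fq" | wc -l)
    if [ $((lines % 4)) -ne 0 ]; then
        echo "$fq: $lines lines is not a multiple of 4" >&2
        return 1
    fi
    echo $((lines / 4))
}

# Possible usage for a paired-end run (both mates should have equal counts):
#   n1=$(check_fastq_gz SRR949203_1.fastq.gz) &&
#   n2=$(check_fastq_gz SRR949203_2.fastq.gz) &&
#   [ "$n1" -eq "$n2" ] && echo "both files hold $n1 records"
```

This only catches files that are structurally broken. A file that ends cleanly but contains fewer reads than the run should have will still pass, so the record count needs to be compared against an independent number.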
5.4 years ago
ATpoint 82k

Just to update this: using fastq-dump for downloads is not recommended. It is slow and prone to connection losses. It is better to use prefetch together with Aspera (see here) to get the SRA files, and then use fastq-dump to convert them to FASTQ. Still, you can get most data directly from the European Nucleotide Archive in FASTQ format. Downloading from there is simple and fast; see my tutorial: Fast download of FASTQ files and metadata from the European Nucleotide Archive (ENA). If you have to download from NCBI, e.g. because the data are restricted, go with prefetch followed by parallel-fastq-dump, which is a wrapper for parallelizing fastq-dump. After successfully converting an SRA file to FASTQ, both tools (fastq-dump/parallel-fastq-dump) print a summary message that only shows up if no errors occurred, so I have never felt the need to verify the FASTQ file after conversion, provided that message was printed.
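
For anyone who wants to automate the "was the summary printed" check, here is a minimal sketch. It assumes the tool output was saved to a log file and that the summary contains lines of the form `Written N spots for <accession>`, which is what fastq-dump prints in my experience. The counts are summed because parallel-fastq-dump is assumed to print one such line per chunk. `written_spots`, `summary_matches_fastq` and the log file name are hypothetical.

```shell
# Capture the tool output first, e.g. (assumed invocation):
#   fastq-dump --split-files --gzip SRR949203 > SRR949203.log 2>&1

# written_spots LOGFILE: print the total from all "Written N spots ..." lines,
# or return 1 if the log has no such line (conversion probably did not finish).
written_spots() {
    awk '$1 == "Written" && $3 == "spots" { n += $2; seen = 1 }
         END { if (!seen) exit 1; print n }' "$1"
}

# summary_matches_fastq LOGFILE FILE.fastq.gz: succeed only if the reported
# spot count equals the number of 4-line records in the FASTQ file.
summary_matches_fastq() {
    log=$1
    fq=$2
    spots=$(written_spots "$log") || {
        echo "no summary line found in $log" >&2
        return 1
    }
    records=$(( $(gzip -dc "$fq" | wc -l) / 4 ))
    [ "$spots" -eq "$records" ]
}
```

The comparison is only meaningful when every spot ends up as exactly one record in the file being checked. With read filters or options that route some reads to a separate file, the two numbers can legitimately differ.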


Hi ATpoint, how do I use Aspera on a Linux server?

4.0 years ago

You can use fqlint to identify a broad range of issues in Illumina-based FASTQ files. Note that if your download happens to be interrupted at the exact boundary between reads, it will not report an error: it only reports malformed FASTQ files.

To install it, you can do the following after installing Rust.

cargo install --git https://github.com/stjude/fqlib.git