Paired-end SRA experiment, two samples come out as single-end
6.2 years ago
xarielle • 0

Hi, I am downloading raw RNA-seq data from SRA using fastq-dump from the SRA toolkit. I am using the --split-3 option, so that I get a single fastq file per sample when the run is single-end, and two fastq files when it is paired-end. It seems to be working fine, except that for a few runs from paired-end experiments, I am getting a single fastq file instead of two. An example is the GSE75440 dataset, samples GFP rep2 and GFP rep3 (SRR2969254 and SRR2969255). What could explain this behaviour? Thank you.
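
To find which runs in a batch behave this way, the files left by --split-3 can be checked per accession. This is a minimal sketch: `classify_split3` is a hypothetical helper, not part of the SRA toolkit, and it assumes uncompressed output named `<accession>.fastq`, `<accession>_1.fastq` and `<accession>_2.fastq`.

```shell
# Hypothetical helper (not part of the SRA toolkit).
# Usage: classify_split3 <directory> <accession>
# Prints "paired", "paired+orphans", "single" or "missing" depending on
# which FASTQ files exist for that accession in the directory.
classify_split3() (
    dir=$1
    acc=$2
    if [ -f "$dir/${acc}_1.fastq" ] && [ -f "$dir/${acc}_2.fastq" ]; then
        # With --split-3, reads without a mate go to <accession>.fastq
        if [ -f "$dir/${acc}.fastq" ]; then
            echo "paired+orphans"
        else
            echo "paired"
        fi
    elif [ -f "$dir/${acc}.fastq" ]; then
        echo "single"
    else
        echo "missing"
    fi
)
```

For example, `classify_split3 . SRR2969254` would print `single` for the runs described here, which makes it easy to list every run whose output does not match the layout stated for the experiment.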

EDIT: One thing I should add is that when running fastq-dump, I get the following error:

Rejected 52180955 READS because of filtering out non-biological READS
Read 52180955 spots for SRR2969254.sra
Written 52180955 spots for SRR2969254.sra

Is the single fastq file produced usable?
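
The three numbers in that message can be compared directly. The sketch below is only an illustration: `summarize_dump_log` is a hypothetical helper that reads the fastq-dump messages on standard input and assumes they have exactly the wording shown above.

```shell
# Hypothetical helper: summarize the fastq-dump messages read on stdin.
# Assumes the "Rejected ... READS", "Read ... spots" and "Written ... spots"
# wording shown in the log above.
summarize_dump_log() {
    awk '
        /^Rejected [0-9]+ READS/ { rejected = $2 }
        /^Read [0-9]+ spots/     { spots = $2 }
        /^Written [0-9]+ spots/  { written = $2 }
        END {
            printf "spots=%d written=%d rejected_reads=%d\n", spots, written, rejected
            # Rejected reads per spot: 1.00 means one read per spot was filtered out
            if (spots > 0) printf "rejected_per_spot=%.2f\n", rejected / spots
        }
    '
}
```

It could be used as `fastq-dump --split-3 SRR2969254.sra 2>&1 | summarize_dump_log`. For the log above it reports one rejected read per spot, which would be consistent with every spot carrying one read that fastq-dump treats as non-biological and one read that is written out. That is an interpretation of the numbers, not something the message states outright.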

RNA-Seq SRA fastq-dump • 4.9k views

When possible, get fastq files directly from EBI-ENA.

Even though SRR2969254 and SRR2969255 are marked as PE there appears to be only one read in ENA as well. So there could be something wrong with these two submissions.
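
The ENA check can be scripted. The following is a sketch under assumptions: the URL and the field names (`run_accession`, `library_layout`, `fastq_ftp`) are taken to follow the ENA Portal API file report and should be verified against the ENA documentation, and both function names are hypothetical.

```shell
# Hypothetical helpers; endpoint and field names are assumptions about the
# ENA Portal API and should be checked against the ENA documentation.

# Build the file report URL for one run accession.
ena_filereport_url() {
    printf 'https://www.ebi.ac.uk/ena/portal/api/filereport?accession=%s&result=read_run&fields=run_accession,library_layout,fastq_ftp&format=tsv\n' "$1"
}

# Read the TSV report on stdin; print accession, stated layout and the
# number of FASTQ files listed (the fastq_ftp field is ';'-separated).
count_ena_fastq() {
    awk -F '\t' '
        NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
        {
            ftp = $(col["fastq_ftp"])
            n = (ftp == "") ? 0 : split(ftp, files, ";")
            printf "%s\t%s\t%d\n", $(col["run_accession"]), $(col["library_layout"]), n
        }
    '
}
```

A possible invocation is `curl -s "$(ena_filereport_url SRR2969254)" | count_ena_fastq`. A run whose layout reads PAIRED but which lists a single FASTQ file is the kind of mismatch described above.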

I am getting the same error. Did you find the solution?

6.2 years ago

I would use the --split-files option instead, see the fastq-dump help page

fastq-dump -h

among the many options

...
--split-files       Dump each read into separate file.Files 
                    will receive suffix corresponding to read 
                    number 
...

If a run comes out as single-end, it means it is mislabeled. The two options may give the same result here, though I don't like the --split-3 option: its name seems like a mislabeling of sorts, since most data are not in three files.

I have seen that page, and I am aware of the --split-files option. --split-3 is what I actually want to use, since it automatically splits the reads into two fastq files, denoted with "_1" and "_2", if paired-end, and outputs a single fastq file if single-end. It usually works perfectly for me. The exception is the specific samples that I have mentioned. I want to know what the problem is with these samples. From what you are saying, these specific samples would be mislabeled and in fact be paired-end, inside an experiment where all the other samples are paired-end?

Indeed it seems I have slightly misread your original post.

I think that if a run comes out as single-end when it is supposed to be paired-end, it might be an issue of incorrect data entry. I would also check the SRA browser for these datasets.

Yes, I suppose it could be incorrect data entry, although I am not sure. Here is the link to one of these samples on the SRA browser: https://www.ncbi.nlm.nih.gov/sra/?term=SRR2969254. I am not sure how to tell what the issue is from this page.

Also, please see the edit to my original post: could this error explain why only one file is produced?

The SRA browser for SRR2969254 shows single-end reads (in the read navigation), but the run is, as you stated, labeled as PAIRED.

It might be a data entry error.
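
The dumped file itself can also be inspected. As a rough sketch, the hypothetical helper `fastq_length_summary` below tabulates read lengths, assuming an uncompressed FASTQ file with exactly four lines per record.

```shell
# Hypothetical helper: print "<read length> <number of reads>" per line,
# sorted by length. Assumes an uncompressed FASTQ file with 4-line records.
fastq_length_summary() {
    awk '
        NR % 4 == 2 { count[length($0)]++ }   # sequence line of each record
        END { for (len in count) print len, count[len] }
    ' "$1" | sort -n
}
```

Running `fastq_length_summary SRR2969254.fastq` and comparing the lengths with the read length reported for the other runs of the experiment may help judge whether the file holds one ordinary read per spot. It cannot show whether a second read was ever submitted.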

I think another difference between --split-files and --split-3 is the naming of single-end output: with --split-files the file is named accession_1.fastq, whereas with --split-3 it is named accession.fastq, which I find more appropriate, although that is not very relevant here.
