Downloading paired end fastq from SRA
13 months ago
t.t • 10

Hi everyone,

I would really like to download the raw data of a specific public single-cell RNA-Seq experiment (ENA, GEO). As the BCL files do not seem to be available, the most "raw" format would probably be paired-end fastq files. So far I have been unable to obtain the reads as separate (split) files, and I would really appreciate your help.

For simplicity just focus on one sample: Donor1_scRNA-seq_rep1 (GSM3052917, Experiment: SRX3815586, Run: SRR6860519)

I already tried fastq-dump and fasterq-dump with the various split options (--split-files etc.), but regardless of the option I get only one fastq file.

fastq-dump --split-files SRR6860519
fasterq-dump -S SRR6860519

The library type is definitely paired and at ENA one can see two submitted MD5-sums per sample.

Does anyone know how to split these samples correctly? And does it make a difference if I provide the experiment accession or the run accession to fastq-dump/fasterq-dump?

Thanks in advance!


Although the sample is described as paired-end, I am sure the run contains only one read per spot. The SRA run browser shows the note "This run has 1 read per spot"; see https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR6860519


Yes it does. Not the first time there is something missing on NCBI. Contacting the authors is probably your best choice.


I think the authors only uploaded the R2 fastq files, and not the R1 file containing the UMI sequence. Here you can read in Extraction protocol and Data processing that R1 is 26 nt and R2 is 100 nt long. If you look in the fastq file, you see only 100 (101) nt long reads. If you want the UMI as well, I am afraid you'll have to ask the authors (as ATpoint is suggesting).
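
If you want to check the read lengths yourself, here is a minimal Python sketch that tallies the sequence lengths in a fastq file. It assumes plain 4-line fastq records (no wrapped sequence lines), and the function name `read_length_counts` is just made up for illustration.

```python
import gzip
import sys
from collections import Counter


def read_length_counts(path):
    """Tally sequence lengths in a FASTQ file (plain text or .gz).

    Assumes strict 4-line records, i.e. the sequence is always
    the second line of each record.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    counts = Counter()
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # sequence line of the current record
                counts[len(line.rstrip("\r\n"))] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Print "length <tab> number of reads", shortest length first.
    for length, n in sorted(read_length_counts(sys.argv[1]).items()):
        print(f"{length}\t{n}")
```

If the 26 nt R1 reads were in the file, they should show up as a separate line in the output next to the 100 (101) nt reads.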


Thanks for pointing that out.

What I am still curious about are the two MD5 checksums that are available per sample (at ENA). Wouldn't that mean that the authors indeed uploaded two files per sample?

Edit: I found the answer to the two checksums myself. At ENA, two files were submitted per sample: a BAM file and its related index (.BAI).
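
For anyone who wants to compare a downloaded file against one of the MD5 sums listed at ENA, a minimal sketch in Python could look like this. The function name `md5_of_file` is hypothetical, and the file is read in chunks so that large BAM files do not need to fit into memory.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned string can then be compared directly with the checksum shown for the corresponding submitted file.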
