Question

Process Truncated fastq file



Asked 4 months ago by waqaskhokhar999

Dear all, I have 150 bp paired-end mRNA data. For one sample, FastQC processed the reverse-read (R2) file up to about 95% and then failed with the following error message:

Failed to process file Sample1-mRNA_R2.fastq.gz
uk.ac.babraham.FastQC.Sequence.SequenceFormatException: Ran out of data in the middle of a fastq entry.  Your file is probably truncated

I have tried to print the tail of the file using the following command:

zcat Sample1-mRNA_R2.fastq.gz | tail -1

and got the following output:

gzip: Sample1-mRNA_R2.fastq.gz: unexpected end of file

AGGCGTATCTCACTGACTTCCTGTGTCAGTTTGCACAGCAGCCCTGCTATGCCATGTTTTCAGACCATCTCAATGAGAATGAAAAGCGAGTGCTGCAGGCCATTGGCAT
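A slightly more systematic check than `tail` (a sketch, not part of the original post): `gzip -t` tests the integrity of the compressed stream, and counting the lines that `zcat` can still decompress shows how many complete 4-line FASTQ records are recoverable. `check_fastq_gz` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: report whether a .fastq.gz is intact and how many complete
# 4-line records can still be decompressed from it.
# check_fastq_gz is a hypothetical helper name, not an existing tool.
check_fastq_gz() {
  local f="$1" n
  # gzip -t exits non-zero if the compressed stream is damaged or truncated.
  if gzip -t "$f" 2>/dev/null; then
    echo "$f: gzip stream is intact"
  else
    echo "$f: gzip stream is damaged or truncated"
  fi
  # wc -l counts newline-terminated lines only, so a cut-off final line
  # is not included in the count.
  n=$(zcat "$f" 2>/dev/null | wc -l)
  echo "$f: $(( n / 4 )) complete records, $(( n % 4 )) leftover lines"
}
```

For example, `check_fastq_gz Sample1-mRNA_R2.fastq.gz` should report a damaged stream for the file in question. A leftover-line count other than 0 means the last record is incomplete.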

The file appears to be truncated, but we have no other copy: the sequencing was done in 2017, and this is the only version of the file we still have.

Is there a way to process the truncated fastq file for differential gene expression analysis?

Tags: fastqc, fastq

Written 4 months ago by waqaskhokhar999; updated 4 months ago by GenoMax.

Answer · 2023-12-04

Is there a way to process the truncated fastq file?

Once data is compromised in some way, you can't be completely sure of the results. That said, you could use repair.sh from BBMap to set aside singleton reads (reads whose mate is missing) and bring the two files back in sync. You will lose some data, but the remainder can be used.

repair.sh -Xmx4g \
  in1=R1.fastq.gz \
  in2=R2.fastq.gz \
  out1=R1.repaired.fastq.gz \
  out2=R2.repaired.fastq.gz \
  outs=singletons.fastq.gz \
  repair
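Whether repair.sh can read a truncated .gz directly is not stated in the answer. One possible preparatory step (an assumption, not part of the answer above) is to decompress only what `zcat` can recover, keep the complete 4-line records, and pass the salvaged file to repair.sh as `in2`. This assumes the file starts on a record boundary and that the recoverable part is not otherwise corrupted. `salvage_fastq` is a hypothetical helper name and the paths are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: keep only the complete 4-line FASTQ records from a truncated .fastq.gz.
# salvage_fastq is a hypothetical helper name; input/output paths are placeholders.
salvage_fastq() {
  local in="$1" out="$2"
  local n keep
  # zcat writes out what it can decompress, then reports
  # "unexpected end of file"; silence that message and count complete lines.
  n=$(zcat "$in" 2>/dev/null | wc -l)
  # A FASTQ record is exactly 4 lines, so drop any trailing partial record.
  keep=$(( n - n % 4 ))
  zcat "$in" 2>/dev/null | head -n "$keep" | gzip > "$out"
  echo "kept $(( keep / 4 )) complete records from $in" >&2
}
```

For example, run `salvage_fastq Sample1-mRNA_R2.fastq.gz Sample1-mRNA_R2.salvaged.fastq.gz`, then run the repair.sh command with `in2=Sample1-mRNA_R2.salvaged.fastq.gz`. Trimming to a multiple of four lines only guarantees whole records; repair.sh is still what moves the R1 reads whose mates were lost into the singletons file.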