Why is FastQC not working after using Trim galore?
7.7 years ago
beausoleilmo ▴ 580

I have a FASTQ file that FastQC analyses without problems, but after I run it through trim_galore, FastQC (or the --fastqc option of trim_galore) no longer works on the output.

$ fastqc ./sub1_val_1.fq.gz

This is the output:

Started analysis of sub1_val_1.fq.gz
Analysis complete for sub1_val_1.fq.gz
Failed to process file sub1_val_1.fq.gz
java.lang.ArrayIndexOutOfBoundsException: -1
    at uk.ac.babraham.FastQC.Modules.SequenceLengthDistribution.calculateDistribution(SequenceLengthDistribution.java:100)
    at uk.ac.babraham.FastQC.Modules.SequenceLengthDistribution.raisesError(SequenceLengthDistribution.java:184)
    at uk.ac.babraham.FastQC.Report.HTMLReportArchive.startDocument(HTMLReportArchive.java:336)
    at uk.ac.babraham.FastQC.Report.HTMLReportArchive.<init>(HTMLReportArchive.java:84)
    at uk.ac.babraham.FastQC.Analysis.OfflineRunner.analysisComplete(OfflineRunner.java:155)
    at uk.ac.babraham.FastQC.Analysis.AnalysisRunner.run(AnalysisRunner.java:110)
    at java.lang.Thread.run(Thread.java:695)

Is the "Failed to process file" error caused by a version mismatch between trim_galore and FastQC?

I found this, but that wasn't that helpful.

I'm using FastQC v0.11.5 and trim_galore v0.4.1.

I subsampled a paired-end library using this:

seqtk sample -s100 ./SRR2937435_1.fastq.gz 10000 | gzip  > sub1.fastq.gz
seqtk sample -s100 ./SRR2937435_2.fastq.gz 10000 | gzip > sub2.fastq.gz

The sub1_val_1.fq.gz file is what trim_galore produced from sub1.fastq.gz. FastQC works on the untrimmed sub1.fastq.gz.

fastq FASTQC Fastqc • 7.5k views

Is trim galore generating an error during run or is it completing without any errors?


$ trim_galore --illumina --paired sub1.fastq.gz sub2.fastq.gz

No quality encoding type selected. Assuming that the data provided uses Sanger encoded Phred scores (default)
Path to Cutadapt set as: 'cutadapt' (default) 1.10
Cutadapt seems to be working fine (tested command 'cutadapt --version')
Writing report to 'sub1.fastq.gz_trimming_report.txt'

SUMMARISING RUN PARAMETERS
==========================
Input filename: sub1.fastq.gz
Trimming mode: paired-end
Trim Galore version: 0.4.1
Cutadapt version: 1.10
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; user defined)
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp
Output file(s) will be GZIP compressed

Writing final adapter and quality trimmed output to sub1_trimmed.fq.gz

  >>> Now performing quality (cutoff 20) and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file sub1.fastq.gz <<< 
This is cutadapt 1.10 with Python 2.7.10
Command line parameters: -f fastq -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC sub1.fastq.gz
Trimming 1 adapter with at most 10.0% errors in single-end mode ...
Finished in 0.18 s (18 us/read; 3.28 M reads/minute).

=== Summary ===
Total reads processed:                  10,000
Reads with adapters:                     8,288 (82.9%)
Reads written (passing filters):        10,000 (100.0%)
Total basepairs processed:       940,000 bp
Quality-trimmed:                   2,658 bp (0.3%)
Total written (filtered):        680,222 bp (72.4%)

=== Adapter 1 ===
Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 8288 times.

No. of allowed errors:
0-9 bp: 0; 10-13 bp: 1
Bases preceding removed adapters:
  A: 76.7%
  C: 18.0%
  G: 1.9%
  T: 3.4%
  none/other: 0.0%

Overview of removed sequences
length  count   expect  max.err error counts
1   819 2500.0  0   819
...blablabla...    
71  2   0.0 1   2

RUN STATISTICS FOR INPUT FILE: sub1.fastq.gz
10000 sequences processed in total
The length threshold of paired-end sequences gets evaluated later on (in the validation step)

Writing report to 'sub2.fastq.gz_trimming_report.txt'

SUMMARISING RUN PARAMETERS
Input filename: sub2.fastq.gz
...     
Writing final adapter and quality trimmed output to sub2_trimmed.fq.gz

  >>> Now performing quality (cutoff 20) and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file sub2.fastq.gz <<< 
This is cutadapt 1.10 with Python 2.7.10
Command line parameters: -f fastq -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC sub2.fastq.gz
Trimming 1 adapter with at most 10.0% errors in single-end mode ...
Finished in 0.17 s (17 us/read; 3.45 M reads/minute).

=== Summary ===
Total reads processed:                  10,000
Reads with adapters:                     8,302 (83.0%)
Reads written (passing filters):        10,000 (100.0%)
Total basepairs processed:       940,000 bp
Quality-trimmed:                   1,001 bp (0.1%)
Total written (filtered):        682,905 bp (72.6%)
=== Adapter 1 ===
Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 8302 times.
No. of allowed errors:
0-9 bp: 0; 10-13 bp: 1

Bases preceding removed adapters:
  A: 4.3%
  C: 3.3%
  G: 26.3%
  T: 66.1%
  none/other: 0.0%

Overview of removed sequences
length  count   expect  max.err error counts
1   796 2500.0  0   796
...blablabla...
69  1   0.0 1   1

RUN STATISTICS FOR INPUT FILE: sub2.fastq.gz
10000 sequences processed in total
The length threshold of paired-end sequences gets evaluated later on (in the validation step)

Validate paired-end files sub1_trimmed.fq.gz and sub2_trimmed.fq.gz
file_1: sub1_trimmed.fq.gz, file_2: sub2_trimmed.fq.gz

>>>>> Now validing the length of the 2 paired-end infiles: sub1_trimmed.fq.gz and sub2_trimmed.fq.gz <<<<<
zcat: can't stat: sub1_trimmed.fq.gz (sub1_trimmed.fq.gz.Z): No such file or directory
zcat: can't stat: sub2_trimmed.fq.gz (sub2_trimmed.fq.gz.Z): No such file or directory
Writing validated paired-end read 1 reads to sub1_val_1.fq.gz
Writing validated paired-end read 2 reads to sub2_val_2.fq.gz
Total number of sequences analysed: 0
Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 0 (N/A%)
Deleting both intermediate output files sub1_trimmed.fq.gz and sub2_trimmed.fq.gz

I admit that I have difficulty seeing if there is an error message. There are places where it seems that it's not finding the working directory...


Looks like it has trouble writing in that directory. It might be a permissions issue. Try running it in a new clean directory.


Do you mean that the file should be executable? Should I chmod it to 777?


Set a different output directory with the -o output_dir option.


That's a good option too.

The reason I suggested creating a new directory is it will definitely exist, it will definitely be empty, and it will probably be readable and writeable.
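A minimal sketch of that suggestion, assuming a POSIX shell. The helper name prepare_outdir and the directory name trimmed_out are made up for illustration; only the -o option and the input file names come from this thread.

```shell
# prepare_outdir is a hypothetical helper: it creates the directory if needed
# and succeeds only when the directory exists and is writable.
prepare_outdir() {
    mkdir -p "$1" && [ -d "$1" ] && [ -w "$1" ]
}

# Usage (input names from the thread; "trimmed_out" is an arbitrary name):
#   prepare_outdir trimmed_out &&
#       trim_galore --illumina --paired -o trimmed_out sub1.fastq.gz sub2.fastq.gz
```

If prepare_outdir fails, the trimming command is never started, which separates a permissions problem from a problem inside trim_galore itself.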


The file is already executable since you were able to execute it. I am worried about the other files and directories involved.


Can you check that the fastq looks reasonable:

zcat sub1_val_1.fq.gz | head

You can compare to the working file if you are not sure what to expect. It should not be too different.
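If eyeballing the output is not enough, the check can be scripted. This is only a sketch: fastq_first_record_ok is a hypothetical helper, and it uses gunzip -c instead of zcat on the assumption that gunzip is installed. It tests the first record only: a header starting with @, a separator starting with +, and sequence and quality strings of equal length.

```shell
# fastq_first_record_ok is a hypothetical helper, not part of FastQC or Trim Galore.
# It succeeds when the first four lines of a gzip-compressed file form one FASTQ record.
fastq_first_record_ok() {
    gunzip -c "$1" 2>/dev/null | head -n 4 | awk '
        NR == 1 { header = $0 }
        NR == 2 { seq = $0 }
        NR == 3 { plus = $0 }
        NR == 4 { qual = $0 }
        END {
            ok = (NR == 4) && (substr(header, 1, 1) == "@") &&
                 (substr(plus, 1, 1) == "+") && (length(seq) == length(qual))
            exit !ok
        }'
}

# Usage (file name taken from the thread):
#   fastq_first_record_ok sub1_val_1.fq.gz && echo "looks like FASTQ" || echo "empty or malformed"
```

An empty or missing file fails the check because awk never sees four lines.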


I get an error. It's saying something about the working directory, but I double checked with ls, and it's really in the same directory.

zcat sub1_val_1.fq.gz | head
zcat: can't stat: sub1_val_1.fq.gz (sub1_val_1.fq.gz.Z): No such file or directory

ls
   sub1.fastq.gz_trimming_report.txt sub2.fastq.gz                     sub2_val_2.fq.gz
   sub1.fastq.gz                     sub1_val_1.fq.gz                  sub2.fastq.gz_trimming_report.txt

Pay close attention to the error message:

zcat: can't stat: sub1_val_1.fq.gz (sub1_val_1.fq.gz.Z): No such file or directory

Look at the .fq.gz.Z file extension. I think your problem is the same as the one described here and here.
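The .Z suffix suggests that this zcat expects compress-style files and appends .Z to whatever name it is given, so a .gz file that exists is reported as missing. A possible workaround, sketched here with a hypothetical wrapper named gz_cat, is to decompress with gunzip -c, which does not rewrite the file name.

```shell
# gz_cat is a hypothetical wrapper: it writes the decompressed contents of a
# .gz file to stdout without going through zcat.
gz_cat() {
    gunzip -c "$1"
}

# Usage (file name taken from the thread):
#   gz_cat sub1_val_1.fq.gz | head
```

gzip -dc is equivalent to gunzip -c, so either form can replace zcat in this kind of check.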


Looks like the file is empty (only 20 bytes, and the report is like 3 KB).
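A size of 20 bytes is consistent with a gzip stream that carries no payload at all. The sketch below confirms this from the decompressed content rather than from the file size. Both helper names are hypothetical, and the read count assumes the usual four lines per FASTQ record.

```shell
# Both helpers are hypothetical; they rely only on gunzip, wc, and tr.

# Succeeds when the file exists and decompresses to zero bytes.
# Returns 2 when the file does not exist.
gz_is_empty() {
    [ -f "$1" ] || return 2
    [ "$(gunzip -c "$1" | wc -c | tr -d '[:space:]')" -eq 0 ]
}

# Prints the number of reads, assuming four lines per FASTQ record.
fastq_read_count() {
    lines=$(gunzip -c "$1" | wc -l | tr -d '[:space:]')
    echo $((lines / 4))
}

# Usage (file names taken from the thread):
#   gz_is_empty sub1_val_1.fq.gz && echo "trimmed file has no reads"
#   fastq_read_count sub1.fastq.gz
```

Comparing fastq_read_count on the input and on the trimmed output shows at a glance whether the trimming step wrote any reads.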


For reference, link to SO post: http://stackoverflow.com/questions/38706402

7.0 years ago
h.mon 35k

Update your Trim Galore to at least 0.4.2:

07-09-16: Version 0.4.2 released

  • Replaced zcat with gunzip -c so that older versions of Mac OSX do not append a .Z to the end of the file and subsequently fail because the file is not present. Dah...
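To check whether an installed version meets that minimum, a dotted-version comparison can be done in plain shell. This is a sketch under assumptions: version_ge is a hypothetical helper, it only handles purely numeric components such as 0.4.1, and the version string has to be supplied by hand (for example, read off the output of trim_galore --version).

```shell
# version_ge A B succeeds when dotted version A is greater than or equal to B.
# Hypothetical helper; it assumes purely numeric components such as 0.4.1.
version_ge() {
    a=$1 b=$2
    while [ -n "$a" ] || [ -n "$b" ]; do
        # Take the leading component of each version; a missing one counts as 0.
        x=${a%%.*}; y=${b%%.*}
        [ "${x:-0}" -gt "${y:-0}" ] && return 0
        [ "${x:-0}" -lt "${y:-0}" ] && return 1
        # Drop the component just compared.
        case $a in *.*) a=${a#*.} ;; *) a= ;; esac
        case $b in *.*) b=${b#*.} ;; *) b= ;; esac
    done
    return 0
}

# Usage (0.4.1 is the version reported in the question):
#   version_ge 0.4.1 0.4.2 || echo "Trim Galore is older than 0.4.2, consider updating"
```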
7.7 years ago
beausoleilmo ▴ 580

I found a workaround: uncompress the files first. trim_galore probably only works with tar.gz and not fastq.gz.

# -c streams to stdout and leaves the .gz in place. With -d -k plus a redirect,
# gzip prompts to overwrite the file the shell has just created.
gzip -d -c sub1.fastq.gz > sub1.fastq
gzip -d -c sub2.fastq.gz > sub2.fastq

trim_galore  --illumina --paired --fastqc sub1.fastq sub2.fastq

It sounds odd that fastq.gz is not accepted since the software page clearly says

Trim Galore! accepts and produces standard or gzip compressed FastQ files

But as long as you were able to make it work :-)


You don't have to uncompress it. I used it many times with compressed files.

I still think it's a permissions-related issue. When you pipe to a file, it should not ask to overwrite. The file should be silently overwritten.
