Question: FastQC with multiple FASTQ files

I have received 384 fastq.gz files. They come from paired-end sequencing, so there are 2 files per patient, i.e. 192 patients. I am new to NGS data analysis and would like to start with FastQC. What would be the best way to proceed?

  • I know FastQC can be run graphically, but with that many samples it would presumably be best to use the command line.
  • I have read in some places that merging all samples into a single file (or 2 files for paired-end data) might be the solution. Is that recommended? Or should I just use simple bash scripting like the loop below (or something similar)?

for i in *.fastq.gz; do bsub < fastqc_script_with_commands.sh; done

I guess I'm just curious whether there is a convention of merging fastq files or keeping them separate (1 or 2 per sample).

Thanks

17 months ago m93 • 150 • updated 17 months ago drkennetz • 370

Use GNU parallel or Snakemake.
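
As a rough illustration of the GNU parallel route, the sketch below runs one FastQC process per file, several at a time. The helper name `fastqc_in_parallel`, the default of 4 simultaneous jobs and the `*.fastq.gz` naming are assumptions made for the example, not something stated in the thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run FastQC on every *.fastq.gz in a directory,
# a few files at a time, using GNU parallel.
fastqc_in_parallel() {
    local in_dir=$1 out_dir=$2 jobs=${3:-4}
    mkdir -p "$out_dir"   # FastQC does not create the -o directory itself
    # {} is replaced by one input file per job; ::: starts the list of inputs
    parallel -j "$jobs" fastqc -o "$out_dir" {} ::: "$in_dir"/*.fastq.gz
}

# Example call (placeholder paths):
#   fastqc_in_parallel /path/to/fastqs fastqc_reports 8
```

If no file matches the pattern, the unexpanded pattern is passed through as a file name, so it is worth checking the directory first.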

17 months ago cpad0112 • 11k

or Nextflow. Examples of using FastQC inside a Nextflow pipeline here, here and here

17 months ago steve ♦ 2.0k

You would want to run the QC on each file individually. When FastQC is run on the command line with the -o option, it writes the result files to the directory given there. A bash loop would work. You can look into MultiQC to aggregate all the results.
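
A minimal sketch of that workflow, assuming the files end in `.fastq.gz`; the helper name `run_fastqc_per_file` and the directory names are placeholders chosen for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: run FastQC once per FASTQ file and collect
# every report in a single output directory.
run_fastqc_per_file() {
    local in_dir=$1 out_dir=$2 fq
    mkdir -p "$out_dir"   # the -o directory must exist before FastQC runs
    for fq in "$in_dir"/*.fastq.gz; do
        [ -e "$fq" ] || continue   # skip the literal pattern if nothing matched
        fastqc -o "$out_dir" "$fq"
    done
}

# Example call (placeholder paths), followed by MultiQC on the report directory:
#   run_fastqc_per_file /path/to/fastqs fastqc_reports
#   multiqc fastqc_reports -o multiqc_report
```

Keeping one report per file means a problem in either read of a pair stays visible, and MultiQC then summarises all of them in a single page.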

17 months ago genomax • 68k

+1 for MultiQC. I don't even bother to look at the individual output metrics anymore.

17 months ago steve ♦ 2.0k

fastqc.sh:

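#!/usr/bin/env bash
# Usage: ./fastqc.sh /path/to/fastqs/
RUN_PATH=$1
# Stop if the directory cannot be entered
cd "$RUN_PATH" || exit 1
# Loop over the FASTQ files directly instead of parsing ls output
for file in *.fastq.gz
do
    SAMPLE=$(basename "$file")
    # -t sets how many files FastQC processes at once; the output directory must already exist
    fastqc -t 5 "${SAMPLE}" -o /path/to/where/you/want/outputs
done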

$ ./fastqc.sh /path/to/fastqs/

If you are running this on a cluster, just add qsub or bsub before the fastqc command (we use IBM LSF, so bsub). That line would become:

bsub -P project -q queuename -n 1 -R "rusage[mem=2000]" fastqc -t 5 ${SAMPLE} -o /path/to/where/you/want/outputs

Change queuename to the actual name of the queue you submit jobs to.
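
Putting the loop and the bsub line together could look roughly like the sketch below. The helper name `submit_fastqc_jobs` is made up for the example, `project` remains a placeholder, and the `*.fastq.gz` pattern is an assumption about the file names. The `-t 5` option is left out here because each job handles a single file, and FastQC's thread count only controls how many files are processed at once.

```shell
#!/usr/bin/env bash
# Hypothetical helper: submit one LSF job per FASTQ file.
submit_fastqc_jobs() {
    local in_dir=$1 out_dir=$2 queue=$3 fq
    mkdir -p "$out_dir"   # FastQC needs the output directory to exist
    for fq in "$in_dir"/*.fastq.gz; do
        [ -e "$fq" ] || continue   # nothing matched the pattern
        # Resource flags copied from the bsub line above; "project" is a placeholder
        bsub -P project -q "$queue" -n 1 -R "rusage[mem=2000]" \
            fastqc -o "$out_dir" "$fq"
    done
}

# Example call (placeholder paths and queue name):
#   submit_fastqc_jobs /path/to/fastqs /path/to/where/you/want/outputs queuename
```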

17 months ago drkennetz • 370