Question

How to run FastQC on 500 raw sequence reads (100 single-end and 400 paired-end) in the Linux terminal and pass the results to MultiQC?

2.1 years ago

arr234 ▴ 40

I am new to Linux. I need to run FastQC on 500 FASTQ files, so I thought I would use Linux to run them in batch and pipe the results into MultiQC. Could you help me with the Linux commands to do this? Thanks in advance.


updated 2.1 years ago by GenoMax 141k • written 2.1 years ago by arr234 ▴ 40

A few points/queries regarding your question:

When you say you have 500 raw sequence reads, do you mean you have 500 FASTQ samples, or one sample with 500 reads?

If it is the latter, you don't need MultiQC. MultiQC is specifically meant for multiple samples: it collates the individual sequencing QC reports into one combined report.

2.1 years ago by prasundutta87 ▴ 660
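MultiQC builds its combined report from whatever FastQC results it finds, so with 500 files it can be worth checking that none were missed before collating. A minimal sketch, assuming the inputs are named *.fastq.gz and that FastQC writes a report called <name>_fastqc.zip for each <name>.fastq.gz (its usual naming); missing_reports is a hypothetical helper, not part of FastQC or MultiQC:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every *.fastq.gz file in fastq_dir that has no
# matching <name>_fastqc.zip in report_dir (assumed FastQC report naming).
missing_reports() {
    local fastq_dir="$1" report_dir="$2" f base
    for f in "$fastq_dir"/*.fastq.gz; do
        [ -e "$f" ] || continue          # skip the literal pattern if nothing matched
        base=$(basename "$f" .fastq.gz)  # sample_R1.fastq.gz -> sample_R1
        if [ ! -e "$report_dir/${base}_fastqc.zip" ]; then
            echo "$f"                    # no report yet for this file
        fi
    done
}
```

For example, with the plain loop from the answer (which writes each report next to its FASTQ file), `missing_reports . .` should print nothing once every file has been processed; only then run `multiqc .`.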

Sorry, I meant I have 500 FASTQ files.

2.1 years ago by arr234 ▴ 40

Answer 1 · 2022-03-29

There are many ways to do this, and the best one depends on your exact situation (a single workstation or a cluster? a one-off job or something you do routinely?). The simplest approach I can think of is to run the jobs in a bash loop, for example:

# Run FastQC on all the FastQ files
# (adjust the pattern if your files end in .fq.gz or .fastq)
for f in *.fastq.gz; do
    fastqc "$f"
done

# Run MultiQC on the results
multiqc .

This is simple, but it will be fairly slow because it runs FastQC on one file at a time. Depending on the size of your compute setup, it may be better to use a dedicated pipeline tool such as Nextflow or Snakemake. I work with Nextflow and the nf-core community, and the template pipeline that we base new pipelines on does basically what you're asking for, so you could even use that: run nf-core create to make a new pipeline.
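Without a workflow manager, FastQC itself can speed up the loop: it accepts many files in one command and its -t/--threads option lets it process several files at once, while -o/--outdir collects all reports in one directory (which must already exist). A minimal sketch, assuming *.fastq.gz files in the current directory; run_qc, fastqc_out/ and multiqc_out/ are hypothetical names and the default of 8 threads is arbitrary:

```shell
#!/usr/bin/env bash
# Hypothetical helper: run FastQC on all *.fastq.gz files in the current
# directory in a single call, then collate the reports with MultiQC.
# Assumed names: fastqc_out/ and multiqc_out/; default of 8 threads is arbitrary.
run_qc() {
    local threads="${1:-8}"
    local outdir="fastqc_out"
    local files

    # nullglob makes the array empty (instead of a literal pattern) if nothing matches
    shopt -s nullglob
    files=( *.fastq.gz )
    shopt -u nullglob

    if (( ${#files[@]} == 0 )); then
        echo "run_qc: no *.fastq.gz files found" >&2
        return 1
    fi

    # FastQC will not create the -o directory itself, so make it first
    mkdir -p "$outdir" || return 1

    # -t lets FastQC process several files simultaneously
    fastqc -t "$threads" -o "$outdir" "${files[@]}" || return 1

    # MultiQC scans fastqc_out/ for FastQC results and writes one combined report
    multiqc "$outdir" -o multiqc_out
}
```

For example, save this as a file, `source` it, and call `run_qc 8` from the directory holding the FASTQ files. Single-end and paired-end files need no special handling here, since FastQC treats each file independently; pick a thread count that fits your machine's cores and memory.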