How to run FastQC on 500 raw FASTQ files (100 single-end and 400 paired-end) in the Linux terminal and pass the results to MultiQC?
2.1 years ago
arr234 ▴ 40

I am new to Linux. I need to run FastQC on 500 FASTQ files, so I would like to use the Linux terminal to run them as a batch and pass the results to MultiQC. Could you help me with the commands to do this? Thanks in advance.

multiqc fastqc • 1.0k views

A few points/queries regarding your question:

When you say you have 500 raw sequence reads, do you mean you have 500 fastq samples, or one sample with 500 reads?

If it is the latter, you don't need MultiQC. MultiQC is meant for cases where you have multiple samples and need to collate their individual sequencing reports into one combined report.


Sorry, I meant that I have 500 FASTQ files.

2.1 years ago
Phil Ewels ★ 1.4k

There are a lot of answers to this question - the best will depend on the details of your exact situation (single workstation or cluster? Doing it once or doing it routinely?). The simplest that I can think of is to just run the jobs in bash. So for example:

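# Run FastQC on all the FastQ files in the current directory
for f in *.fastq.gz; do
    # Quote the variable so filenames with spaces are handled correctly
    fastqc "$f"
done

# Run MultiQC on the results (it scans the current directory for FastQC reports)
multiqc .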

This is simple but it will be fairly slow, as it runs FastQC on one file at a time. Depending on the size of your compute setup, it may be better to run a dedicated pipeline tool such as Nextflow or Snakemake. I work with Nextflow and the nf-core community, and the template pipeline that we base new pipelines on does basically what you're asking for, so you could even use that: nf-core create to make a new pipeline.
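If you stay with plain shell on a single machine, one way to speed up the loop above is to let FastQC process several files at once with its --threads option and to collect the reports in one directory with --outdir. The following is only a rough sketch under some assumptions: the function name run_qc, its three arguments, and the default directory name fastqc_results are made up for illustration, and the right thread count depends on your CPU and memory. FastQC treats every file independently, so the R1 and R2 files of a paired-end sample each get their own report.

```shell
# Sketch: run FastQC on every *.fastq.gz in a directory, then summarise with MultiQC.
# run_qc, its arguments, and the default names below are hypothetical.
run_qc() {
    qc_indir="${1:-.}"                # directory holding the FASTQ files
    qc_outdir="${2:-fastqc_results}"  # directory for the FastQC reports
    qc_threads="${3:-4}"              # number of files FastQC processes at once

    # Put the matching files into the positional parameters.
    set -- "$qc_indir"/*.fastq.gz

    # If nothing matched, the first parameter does not name a real file.
    if [ ! -e "$1" ]; then
        echo "No .fastq.gz files found in $qc_indir" >&2
        return 1
    fi

    # FastQC expects the output directory to exist already.
    mkdir -p "$qc_outdir" || return 1

    # One FastQC call for all files; --threads sets how many run simultaneously.
    fastqc --threads "$qc_threads" --outdir "$qc_outdir" "$@" || return 1

    # MultiQC scans the report directory and writes the combined report there.
    multiqc "$qc_outdir" --outdir "$qc_outdir"
}
```

After pasting the function into your terminal (or sourcing it from a file), something like run_qc . fastqc_results 8 would process the current directory with 8 files at a time. The function stops before MultiQC if FastQC reports an error, so a partial run does not produce a misleading summary.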
