Biostar Beta. Not for public use.
Using trimmomatic on multiple single-end read files
3.7 years ago

I need help writing a for loop to run Trimmomatic for quality trimming of single-end FASTQ files, so that the same command runs on each of my many files. I read the answers to a similar question about paired-end data, but they did not help me much. Any help, please! Thanks!

18 months ago
China

It's much simpler than for paired-end (PE) files:

Shell:

for file in *.fq.gz; do
    # do something with the file
    echo "$file"
done
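To make the "do something" step concrete, here is a minimal sketch that plugs a Trimmomatic single-end call into the same loop. The jar name and the trimming steps are the ones quoted elsewhere in this thread; TRIMMOMATIC_JAR, DRY_RUN, trim_one and the .trimmed.fq.gz suffix are names made up for this example. By default it only prints the commands, so you can check them before running anything.

```shell
#!/usr/bin/env bash
# Assumptions: jar name and trimming steps are taken from this thread;
# adjust both to your own installation and data.
TRIMMOMATIC_JAR=${TRIMMOMATIC_JAR:-trimmomatic-0.35.jar}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = actually run them

# Hypothetical helper: trim one single-end file given as "$1".
trim_one() {
    input=$1
    # Strip the .fq.gz suffix and add a new one for the output file.
    output="$(basename "$input" .fq.gz).trimmed.fq.gz"
    if [ "$DRY_RUN" = 1 ]; then
        echo java -jar "$TRIMMOMATIC_JAR" SE -phred33 "$input" "$output" \
            ILLUMINACLIP:TruSeq3-SE:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
    else
        java -jar "$TRIMMOMATIC_JAR" SE -phred33 "$input" "$output" \
            ILLUMINACLIP:TruSeq3-SE:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
    fi
}

for file in *.fq.gz; do
    # If nothing matches, the pattern stays literal; skip it in that case.
    if [ -e "$file" ]; then
        trim_one "$file"
    fi
done
```

Once the printed commands look right, run it again with DRY_RUN=0.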

GNU Parallel (see "Gnu Parallel - Parallelize Serial Command Line Programs Without Changing Them"):

ls *.fq.gz | parallel 'echo {}'
20 months ago
st.ph.n ♦ 2.5k
Philadelphia, PA

Make a bash script with your trimmomatic command:

#!/usr/bin/bash

java -jar trimmomatic-0.35.jar SE -phred33 "$1" "$(basename "$1" .fastq.gz).trimmomatic_out.fastq.gz" ILLUMINACLIP:TruSeq3-SE:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36

Here $1 is your input file, and basename strips any leading directory and the .fastq.gz suffix, which is then replaced with .trimmomatic_out.fastq.gz. Save this as run_all_trim.sh.
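If the basename part is unclear, this small sketch shows just the renaming step on its own. out_name and sampleA.fastq.gz are made-up names for illustration; nothing is trimmed here.

```shell
# Hypothetical helper: print the output name the script would use for input "$1".
out_name() {
    # basename removes the .fastq.gz suffix; the new suffix is appended after it.
    echo "$(basename "$1" .fastq.gz).trimmomatic_out.fastq.gz"
}

out_name sampleA.fastq.gz   # prints sampleA.trimmomatic_out.fastq.gz
```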

List all of your single-end files in a text file, one per line: SE_files.txt. If they are all in one directory: ls -1 *.fastq.gz > SE_files.txt. Then pass each file to the trimmomatic script:

cat SE_files.txt | xargs -n 1 bash run_all_trim.sh
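To see what xargs -n 1 does here without running Trimmomatic, you can put echo in front of the command, so each call is printed instead of executed. A minimal sketch, where list_files stands in for cat SE_files.txt and the two sample names are made up:

```shell
# Stand-in for "cat SE_files.txt": two hypothetical file names, one per line.
list_files() {
    printf '%s\n' raw/sampleA.fastq.gz raw/sampleB.fastq.gz
}

# -n 1 makes xargs start the command once per input line, with that line
# appended as the last argument (this becomes $1 inside run_all_trim.sh).
# The leading "echo" prints each call instead of running it.
list_files | xargs -n 1 echo bash run_all_trim.sh
```

This prints one "bash run_all_trim.sh <file>" line per entry in the list; drop the echo to run the script for real.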

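If you have a lot of files and want the job to run in the background and survive a hangup (for example, when you log out):

nohup xargs -n 1 bash run_all_trim.sh < SE_files.txt &

Here nohup wraps xargs itself, so the whole loop keeps going, not just the file being processed at that moment. Use htop or top to check periodically that it's still running.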


Hi, your comment is getting old but was very useful. However, could you explain how it works? I don't get it.

I wrote this:

trimmomatic SE -threads 16 -phred33 $1 "/trimmomatic/`basename $1 .fastq.gz`.trimmomatic_out.fastq.gz" \

It uses as input my files that are in ./raw and sends them to ./trimmomatic. That was my intention, but how does it know to use ./raw as input and not just ./ ?


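List your files in a text file. If they are all in a folder called raw, and you run from the directory that contains raw, the filenames in SE_files.txt would be raw/prefix.fastq.gz. The script receives that whole path as $1, so the input is read from raw/, while basename strips the raw/ prefix when building the output name. The point is that you're putting the command in a bash script, and then looping through each line (file) in the text file one at a time.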
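Here is a small sketch of that mapping, assuming a list with a raw/ prefix on every line and an output folder called trimmomatic (outdir, plan_one and the s1/s2 sample names are made up for the example). It only prints which input goes to which output.

```shell
outdir=trimmomatic   # hypothetical output directory

# Print "input -> output" for one line of the list.
plan_one() {
    # "$1" keeps its raw/ prefix as the input path; basename drops the
    # directory and the .fastq.gz suffix when building the output name.
    echo "$1 -> $outdir/$(basename "$1" .fastq.gz).trimmomatic_out.fastq.gz"
}

# Stand-in for SE_files.txt, read one line at a time.
printf '%s\n' raw/s1.fastq.gz raw/s2.fastq.gz |
while IFS= read -r path; do
    plan_one "$path"
done
```

So the input directory comes from whatever path is written in the list, and the output directory comes from whatever you put in front of the basename part.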

Similarly you can write a bash script, as shenwei pointed out above, where you can do:

#!/usr/bin/bash
for file in raw/*.fastq.gz; do
    # print the filename so the log shows which file is being processed
    echo "$file"
    java -jar trimmomatic-0.35.jar SE -phred33 "$file" "$(basename "$file" .fastq.gz).trimmomatic_out.fastq.gz" ILLUMINACLIP:TruSeq3-SE:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
done

You can save this as run_trim.sh, and run in the background with:

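nohup bash run_trim.sh > log.txt 2> err.txt &

Each line in log.txt will then be a filename, which lets you track the progress (Trimmomatic's own messages, which it writes to standard error, go to err.txt instead). As it's running, you can: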

wc -l log.txt

to see where it's at compared to the total number of files (ls -1 raw/*.fastq.gz | wc -l )
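The two counts can be combined into one small helper. This is only a sketch: progress is a made-up function name, and it assumes the log contains exactly one line per file started, as described above.

```shell
# Hypothetical helper: compare lines in the log with the number of input files.
# $1 = log file, $2 = directory holding the *.fastq.gz inputs
progress() {
    log=$1
    dir=$2
    started=0
    if [ -f "$log" ]; then
        # $(( )) strips the padding some wc implementations add
        started=$(( $(wc -l < "$log") ))
    fi
    total=0
    for f in "$dir"/*.fastq.gz; do
        # skip the literal pattern when the directory has no matching files
        if [ -e "$f" ]; then
            total=$((total + 1))
        fi
    done
    echo "$started of $total files started"
}

progress log.txt raw
```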
