Using trimmomatic on multiple single-end read files
3.7 years ago

I need help writing a for loop to run Trimmomatic for quality trimming of single-end FASTQ files, so that the same command runs on each of many files. I read the exchanges on a similar question about paired-end data, but they did not help me much. Any help, please! Thanks!

18 months ago
China

It's much simpler than for paired-end (PE) files:

Shell:

for file in *.fq.gz; do
    # do something with the file
    echo "$file"
done

Or with parallel:

ls *.fq.gz | parallel 'echo {}'

20 months ago
st.ph.n
Philadelphia, PA

Make a bash script with your trimmomatic command:

#!/usr/bin/bash
java -jar trimmomatic-0.35.jar SE -phred33 "$1" "$(basename "$1" .fastq.gz).trimmomatic_out.fastq.gz" ILLUMINACLIP:TruSeq3-SE:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36

where $1 is your input file; basename removes the .fastq.gz extension, and the suffix .trimmomatic_out.fastq.gz is appended in its place. Save as run_all_trim.sh.
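To see what basename does to a file name without running Trimmomatic, here is a minimal sketch. The helper name trimmed_name and the sample file names are made up for illustration; only the .trimmomatic_out.fastq.gz suffix comes from the script in this answer.

```shell
#!/usr/bin/env bash
# Hypothetical helper: builds the output name the same way run_all_trim.sh does.
trimmed_name() {
    # basename drops any leading directory and the given suffix (.fastq.gz)
    printf '%s.trimmomatic_out.fastq.gz\n' "$(basename "$1" .fastq.gz)"
}

# Sample names are illustrative only; no files are read or written.
trimmed_name sampleA.fastq.gz       # prints sampleA.trimmomatic_out.fastq.gz
trimmed_name raw/sampleB.fastq.gz   # prints sampleB.trimmomatic_out.fastq.gz
```

Note that basename also strips the directory part, so the output lands in the current directory unless you prepend a folder yourself.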

List all of your single-end files in a text file, one per line: SE_files.txt. If they are all in one directory: ls -1 *.fastq.gz > SE_files.txt. Then pass each of your single-end files to the trimmomatic command.

cat SE_files.txt | xargs -n 1 bash run_all_trim.sh


If you have a lot of files and want the whole job to run in the background and keep going after you log out:

nohup xargs -n 1 bash run_all_trim.sh < SE_files.txt &


Use htop or top to check periodically that it's still running.


Hi, your comment is getting old but was very useful. However, could you explain how it works? I don't get it.

I wrote this:

trimmomatic SE -threads 16 -phred33 $1 "/trimmomatic/`basename $1 .fastq.gz`.trimmomatic_out.fastq.gz" \


It takes my files in ./raw as input and sends the output to ./trimmomatic. This was my intention, but how does it know to use ./raw as input and not just ./ ?


List your files in a text file. If they are all in a folder called raw, and you run from the directory that contains raw, the filenames in SE_files.txt would be raw/prefix.fastq.gz. The point is that you put the command in a bash script and then loop through the text file one line (file) at a time.
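A dry run may make this clearer. The sketch below assumes a scratch directory with two made-up placeholder files (s1 and s2) and a stand-in script, dry_run.sh, that only prints paths instead of calling Trimmomatic; the trimmomatic/ output folder is taken from the question above. The raw/ prefix is simply part of each line in SE_files.txt, and xargs hands each line to the script as $1.

```shell
#!/usr/bin/env bash
# Dry run in a scratch directory; nothing is trimmed and Trimmomatic is not needed.
workdir=$(mktemp -d)
mkdir "$workdir/raw"
touch "$workdir/raw/s1.fastq.gz" "$workdir/raw/s2.fastq.gz"   # made-up placeholder files

# Stand-in for run_all_trim.sh: print input and output paths instead of trimming.
cat > "$workdir/dry_run.sh" <<'EOF'
out="trimmomatic/$(basename "$1" .fastq.gz).trimmomatic_out.fastq.gz"
echo "$1 -> $out"
EOF

# Build the list from the parent of raw/, so every line keeps the raw/ prefix,
# then let xargs pass one line at a time to the script as $1.
result=$(cd "$workdir" && ls -1 raw/*.fastq.gz > SE_files.txt \
         && xargs -n 1 bash dry_run.sh < SE_files.txt)
printf '%s\n' "$result"
rm -rf "$workdir"
```

This prints one "input -> output" pair per file, for example raw/s1.fastq.gz -> trimmomatic/s1.trimmomatic_out.fastq.gz. The input keeps its raw/ prefix because that is what the list contains, while basename strips it before the output folder is prepended.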

Similarly you can write a bash script, as shenwei pointed out above, where you can do:

#!/usr/bin/bash
for file in raw/*.fastq.gz; do
    # print the filename so the log shows progress
    echo "$file"
    java -jar trimmomatic-0.35.jar SE -phred33 "$file" "$(basename "$file" .fastq.gz).trimmomatic_out.fastq.gz" ILLUMINACLIP:TruSeq3-SE:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
done


You can save this as run_trim.sh, and run in the background with:

nohup bash run_trim.sh > log.txt &


Each file adds a line with its filename to the log, so you can track the progress. While it's running, you can run:

wc -l log.txt


and compare the result with the total number of files (ls -1 raw/*.fastq.gz | wc -l).
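The comparison can be wrapped in a small helper. This is only a sketch: the function name progress is made up, and it counts only lines ending in fastq.gz on the assumption that other messages might also end up in log.txt. The demo uses a temporary log with invented contents, not real Trimmomatic output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "processed/total" from a log holding one filename per file.
progress() {
    local log=$1 total=$2 processed
    # count only lines that end in fastq.gz, ignoring any other messages
    processed=$(grep -c 'fastq\.gz$' "$log")
    echo "$processed/$total"
}

# Demo on a temporary log with invented contents.
demo_log=$(mktemp)
printf '%s\n' raw/s1.fastq.gz 'some other message' raw/s2.fastq.gz > "$demo_log"
progress "$demo_log" 5   # prints 2/5
rm -f "$demo_log"
```

On a real run you could call it as: progress log.txt "$(ls -1 raw/*.fastq.gz | wc -l)"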