get final output from Trimmomatic
0
0
Entering edit mode
6.4 years ago

Hello,

is it possible to assign the final outcome of the trimmomatic command to a variable (or a file)? I am interested in recording the percentage of passed reads, for instance I would like to get the both surviving percentage -- that is 1.11 -- from the final string given by trimmomatic:

...
Input Read Pairs: 11829489 Both Surviving: 131674 (1.11%) Forward Only 
Surviving: 685192 (5.79%) Reverse Only Surviving: 46377 (0.39%) Dropped: 
10966246 (92.70%)

Thank you

sequencing • 3.4k views
ADD COMMENT
1
Entering edit mode

It's not easy to parse plain text output such like this.

So I suggest you to use fastp since fastp provides both HTML and JSON output, and you can get the filtering results from the JSON output with any script language your like.

Furthermore, fastp is much faster and providing much more features.

ADD REPLY
0
Entering edit mode

thank you, I am not acquainted with fastp but i will have a look at it

ADD REPLY
0
Entering edit mode

Assuming the Trimmomatic output is in a file named "trimmomatic_output", one solution would be:

both=$(grep "^Input Read Pairs" trimmomatic_output | sed "s/.\+(\([0-9.]\+\)%).\+/\1/")
echo $both
1.11
ADD REPLY
0
Entering edit mode

thank you, I have added a log file with the option -trimlog (so the command is now java -jar -trimlog trimmomatic_output ...) but this now slows the computation quite quite a lot. Is there a way to calculate the trimmomatic's statistics by counting the reads in the paired files? In this way I could skip the log file, keep the analysis as fast as before and simply use shell or samtools to do the job...

ADD REPLY
0
Entering edit mode

Are you sure that this "Input Reade Pairs..." string is not part of the standard output without -trimlog option? I don't have time to check now. The -trimlog itself you definitely want to avoid. You can easily count the reads from the FASTQ files:

lines=$(wc -l fastq_file)
reads=$((lines/4))

or

awk 'END {print NR/4;}' fastq_file
ADD REPLY
0
Entering edit mode

I could not get the log file because for some reasons, the use of this option increased the execution times so much that actually crashed the computer (I don't have a powerful machine at the moment). I was wondering whether there was a more direct method for the count of reads other than the line/4 approach.

ADD REPLY
0
Entering edit mode

What I meant is: your output that you want to parse should be independent of the -trimlog option. That option will tell you for each read how it was processed. I just tested this on a Trimmomatic (v 0.36) run. You will get your output without -trimlog. You just need to redirect the output of Trimmomatic into a file.

java -jar Trimmomatic-0.36/trimmomatic-0.36.jar PE -threads 2  r1.fq.gz r2.fq.gz p1.fq.gz u1.fq.gz p2.fq.gz u2.fq.gz ILLUMINACLIP:Trimmomatic-0.36/adapters/TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36 &> trimmomatic_output

Then you can run my first grep example.

ADD REPLY
0
Entering edit mode

now I see, thank you!

ADD REPLY

Login before adding your answer.

Traffic: 1954 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6