How To End Trim Sequences Using Prinseq?
3
3
Entering edit mode
12.8 years ago
Panos ★ 1.8k

I ran the following command after reading prinseq's manual

prinseq-lite.pl -fastq seqs.fastq -trim_qual_left 5 -trim_qual_right 5 -out_format 1 -out_good seqs -out_bad null

I would expect from this command to trim low quality ends and give me sequences in which all bases have a quality of at least 5. Nevertheless, when I check the output, I can find lots of low quality bases in the output file and even bases corresponding to these long runs of Bs are not removed!

Am I missing something? Is the command right for what I want to do?

trimming quality • 9.8k views
ADD COMMENT
6
Entering edit mode
12.8 years ago

I think the developer just took the equation from Heng Li's MAQ faq at http://maq.sourceforge.net/fastq.shtml

$Q = 10 * log(1 + 10 ** (ord($sq) - 64) / 10.0)) / log(10);

which you notice is missing a forward parenthesis somewhere, and which I think was originally for Solexa+64->Sanger conversions anyway. See FASTQ wiki

you see on line 1533 of prinseq-lite.pl:

return ((10 * log(1 + 10 ** (ord($qual) - 64) / 10.0)) / log(10));

he just tacked on an extra ( before the first 10, which means an Illumina phred-64 "B" returns a 10 and a "h" returns a whopping 390

I suspect the correct equation might be:

$Q = 10 *  log(1 + 10 ** ((ord($qual) - 64) /  10.0)) / log(10);

the sad thing is that incorrect equation has been copy-pasted all over the web and even in textbooks

at any rate

return (ord($qual)-64);

should suffice

ADD COMMENT
0
Entering edit mode

Thanks Jeremy! I replaced the old equation with the (simple) one you said and it works fine! Other quality-dependent parameters were also affected by this ('min_qual_mean' for example which now also gives me the expected output)! Thanks again for your time!

ADD REPLY
0
Entering edit mode

no problem seeing how this bug propogated was actually pretty interesting

ADD REPLY
0
Entering edit mode

no problem seeing how this bug propagated was actually pretty interesting

ADD REPLY
2
Entering edit mode
12.8 years ago

Is this Illumina 1.3 (phred+64) data? From your mention of B runs I suspect it is:

-si13

Quality data in FASTQ file is in Solexa/Illumina 1.3+ format and should be scaled to Phred quality scores ranging from 0 to 40. (Not required for Solexa/Illumina 1.5+, Sanger, Roche/454, Ion Torrent, PacBio data.)
ADD COMMENT
0
Entering edit mode

I have tried the '-si13' option but sequences with B runs are still there.

ADD REPLY
0
Entering edit mode

I have tried the '-si13' option but sequences with B runs are still there. One explanation would be that since B qualities have this special meaning, they might not be transformed to a number (as happens with all other quality values)...

ADD REPLY
0
Entering edit mode

i think it is a bug please see my other answer

ADD REPLY
1
Entering edit mode
12.8 years ago
Bio_X2Y ★ 4.4k

I haven't used this program, but intuitively, I suspect the [?]trim_qual_left[?] and [?]trim_qual_right[?] work by clipping bases from each side until a base with quality 5 or higher is encountered. So you would end up with reads that might still contain low quality bases, but the two bases remaining at the edges of your reads would have a quality of 5 or higher?

Are you trying to remove all reads where any base has a quality of <5, even if the base is in the middle of the read and is flanked by high quality bases? This seems like an unusual thing to want to do. Normally you would want to limit yourself to trimming the ends since that's where you would expect the low quality bases to be (e.g. in an Illumina experiment, you might see the 5' ends of your reads have high quality, and the quality degrades as you move closer to the 3' end). Read aligners tend to be tolerant of one or two low quality bases in the middle of an otherwise good sequence (configured by parameters).

If you want to remove any read with a base with a quality < 5, you might want to try experimenting with the other trimming parameters. E.g. I might start with setting the [?]trim_qual_window[?] to the length of your reads. From what I understand after a quick look at the manual, this might implement the trimming rule by considering the whole read at once instead of a sliding window of 1 base.

ADD COMMENT
0
Entering edit mode

I just want to trim the end of reads. The problem, though, is that "trim_qual_left" and "trim_qual_right" don't work as I expected; after running the above-mentioned command I still get reads with B runs in their 3' end, for example.

ADD REPLY
0
Entering edit mode

Perhaps you can only run one rule at a time? Maybe try running with "trim_qual_left", and then run it again for "trim_qual_right"?

ADD REPLY
0
Entering edit mode

Perhaps the program can implement only one rule at a time? Maybe try running with "trim_qual_left", then run the output through the program again, the second time with "trim_qual_right"?

ADD REPLY

Login before adding your answer.

Traffic: 2434 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6