Pick highest quality fastq reads
8.1 years ago

I am familiar with many trimmers and quality-control tools, but I have not come across software that lets me pick only the n million highest-quality reads. I am not sure how to define "high quality" here; probably mainly as having the highest Phred scores along the whole length of the read. One solution would be to discard all the low-quality data and then randomly pick as many reads as I need from the rest.

However, it would be more convenient to have a tool that sorts reads by quality and lets me pick the top n. I would use it for a genomic DNA dataset where I currently have much higher coverage than I need and where many reads are very poor according to the FastQC report (failing per base sequence quality, per tile sequence quality and k-mer content).
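To make the "sort by quality and take the top n" idea concrete, here is a minimal Python sketch, not an existing tool. It assumes single-end, 4-line-per-record FASTQ with Phred+33 encoding, and it takes "quality" to mean the mean Phred score of a read, which is only one possible definition. The names `read_fastq`, `mean_phred` and `top_n_reads` are made up for illustration.

```python
import heapq
from itertools import islice

PHRED_OFFSET = 33  # assumption: Sanger / Illumina 1.8+ quality encoding


def read_fastq(handle):
    """Yield (header, sequence, quality) tuples from 4-line FASTQ records."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if len(record) < 4:
            return  # end of input (or a truncated final record)
        header, seq, _plus, qual = record
        yield header, seq, qual


def mean_phred(qual):
    """Mean Phred score of one read; 0.0 for an empty quality string."""
    if not qual:
        return 0.0
    return sum(ord(c) - PHRED_OFFSET for c in qual) / len(qual)


def top_n_reads(records, n):
    """Return the n records with the highest mean Phred score, best first."""
    # heapq.nlargest keeps only n records in memory at a time,
    # so the whole file never has to be sorted.
    return heapq.nlargest(n, records, key=lambda rec: mean_phred(rec[2]))
```

Usage would be along the lines of `top_n_reads(read_fastq(open("reads.fastq")), 1_000_000)`, where `reads.fastq` is a placeholder file name. For paired-end data the two mates would have to be scored and kept together, which this sketch does not handle.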

fastq quality filter • 2.0k views

If there are really poor-quality reads in the set, then it may be best to use the solution you noted above (filter, then randomly sample). reformat.sh from BBMap can sample reads easily.
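For the random-sampling step, a rough Python sketch of reservoir sampling is shown here. This is not how reformat.sh works internally; it is just one standard way to draw a fixed number of reads uniformly from a stream whose length is not known in advance. The function name `reservoir_sample` and its parameters are hypothetical.

```python
import random


def reservoir_sample(records, k, seed=None):
    """Return k records chosen uniformly at random from an iterable.

    If the iterable holds fewer than k records, all of them are returned.
    """
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    sample = []
    for i, rec in enumerate(records):
        if i < k:
            # Fill the reservoir with the first k records.
            sample.append(rec)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = rec
    return sample
```

Only k records are held in memory at once, so the already filtered reads can be streamed through it in a single pass.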


So far I have used fastq_quality_filter from fastx_toolkit with the parameters -q 20 -p 100 (keep only reads in which 100% of bases have a quality score of at least 20), in case anyone is interested.
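For anyone wondering what that filter amounts to, here is a small Python sketch of the same rule: keep a read only if at least a given percentage of its bases reach a minimum quality. It is an illustration, not the fastx_toolkit code, and it assumes Phred+33 quality strings. The function name and parameter names are made up.

```python
PHRED_OFFSET = 33  # assumption: Phred+33 encoded quality strings


def passes_quality_filter(qual, min_quality=20, min_percent=100):
    """True if at least min_percent of bases have quality >= min_quality."""
    if not qual:
        return False  # an empty read carries no usable bases
    good = sum(1 for c in qual if ord(c) - PHRED_OFFSET >= min_quality)
    # Integer comparison avoids floating-point rounding at the threshold.
    return good * 100 >= min_percent * len(qual)
```

With the defaults this mirrors -q 20 -p 100, so a single base below Q20 is enough to reject the read. Lowering `min_percent` makes the filter more tolerant.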
