Pick highest quality fastq reads

8.1 years ago

Biomonika (Noolean) 3.2k

I am familiar with many trimmers and quality-control tools, but I don't think I have come across software that would let me pick only the n million highest-quality reads. I am not sure how to define "high quality" here; probably mainly as having the highest Phred scores along the whole length of the read. One solution would be to discard all the low-quality data and then randomly pick as many reads as I need from the rest.

However, it would be more convenient to have a tool that sorts reads by quality and lets me pick the top n. I would use it for a genomic DNA dataset where I currently have much higher coverage than I actually need and where many reads are very poor according to the FastQC report (failing per base sequence quality, per tile sequence quality and k-mer content).
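To make the "sort by quality and pick the top n" idea concrete, here is a minimal Python sketch. It is only an illustration, not an existing tool: the function names are made up, it assumes four-line FASTQ records with Phred+33 encoding, and it takes "quality" to mean the arithmetic mean Phred score of a read, which is just one possible definition.

```python
import heapq
import io
from itertools import islice

def read_fastq(handle):
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ handle."""
    while True:
        record = list(islice(handle, 4))
        if len(record) < 4:
            return
        header, seq, _plus, qual = (line.rstrip("\n") for line in record)
        yield header, seq, qual

def mean_phred(qual, offset=33):
    """Arithmetic mean of the Phred scores (assumption: Phred+33 encoding)."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)

def top_n_reads(records, n):
    """Return the n records with the highest mean Phred score, best first."""
    return heapq.nlargest(n, records, key=lambda rec: mean_phred(rec[2]))

# Tiny in-memory example: r1 has Q40 bases, r2 has Q0, r3 has Q20.
demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\n!!!!\n@r3\nACGT\n+\n5555\n")
best = top_n_reads(read_fastq(demo), 2)
print([rec[0] for rec in best])
```

heapq.nlargest keeps only n records in memory at a time, so the whole file never has to be sorted. For paired-end data the two mates would have to be scored and selected together, which this sketch does not do.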

fastq quality filter • 2.0k views

If there are really poor-quality reads in the set, then it may be best to use the solution you noted above (filter first, then sample randomly). reformat.sh from BBMap can sample reads easily.
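The filter-then-sample approach can be sketched in plain Python as below. This is not how reformat.sh works internally; it is a generic single-pass reservoir sample (Algorithm R) applied to reads that survive a filter. The function names, the minimum-Phred criterion, the threshold of 20 and the Phred+33 offset are all assumptions chosen for illustration.

```python
import random

def reservoir_sample(items, k, seed=None):
    """Uniformly sample up to k items from an iterable in a single pass."""
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(items):
        if i < k:
            sample.append(item)        # fill the reservoir first
        else:
            j = rng.randint(0, i)      # randint is inclusive on both ends
            if j < k:
                sample[j] = item       # replace with probability k / (i + 1)
    return sample

def min_phred(qual, offset=33):
    """Lowest Phred score in a quality string (assumption: Phred+33 encoding)."""
    return min((ord(c) - offset for c in qual), default=0)

# Toy records: odd-numbered reads are all Q40, even-numbered ones contain a Q2 base.
reads = [("@r%d" % i, "ACGT", "IIII" if i % 2 else "II#I") for i in range(10)]

# Step 1: discard reads containing any base below Q20 (hypothetical cutoff).
good = (rec for rec in reads if min_phred(rec[2]) >= 20)

# Step 2: randomly keep 3 of the survivors; the seed makes the run repeatable.
subset = reservoir_sample(good, 3, seed=1)
print(len(subset))
```

Because the reservoir holds only k reads, the filtered reads never need to be stored in full, which matters for a high-coverage dataset.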

8.1 years ago by GenoMax 141k

So far I have used fastq_quality_filter from fastx_toolkit with the parameters -q 20 -p 100 (keep only reads in which 100% of the bases have a quality score of at least 20), in case anyone is interested.
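For anyone unsure what those two parameters mean, here is a small Python sketch of the same rule: -q is the minimum quality score and -p is the minimum percentage of bases that must reach it. This is a re-implementation for illustration, not fastx_toolkit code; the function name and the Phred+33 offset are assumptions. It is also worth checking which quality encoding your fastx_toolkit build expects, since older builds are commonly reported to need -Q33 for Phred+33 data.

```python
def passes_quality_filter(qual, min_quality=20, min_percent=100, offset=33):
    """Return True if at least min_percent of bases have Phred >= min_quality.

    Assumption: qual is a Phred+33 encoded quality string.
    """
    if not qual:
        return False
    good = sum(1 for c in qual if ord(c) - offset >= min_quality)
    # Integer comparison avoids floating-point rounding at the boundary.
    return good * 100 >= min_percent * len(qual)

# 'I' is Q40, '5' is Q20 and '4' is Q19 under Phred+33.
print(passes_quality_filter("III5"))                  # every base is >= Q20
print(passes_quality_filter("III4"))                  # one base is Q19
print(passes_quality_filter("III4", min_percent=75))  # 3 of 4 bases are >= Q20
```

With -p 100 a single base below the cutoff discards the whole read, which is strict; lowering the percentage trades stringency for yield.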

8.0 years ago by Biomonika (Noolean) 3.2k
