Question

Removal of non quality read in fastq file

6.0 years ago

akbioinfo14 • 0

I am trying to remove reads that have no quality line from a FASTQ file.

For example:

@NB501309:173:HYW77BGX5:1:11101:23920:1057 1:N:0:CATTTTAT+GGGGGGGG

TCTCANGGAGAGTTCGATCCTGGCTCAGGATGAACGCTGGCGGCATGCTTAACACATGCAAGTCGAACGGGAAGT

+

AAAAA#EEEEEEEEAEEEAEEEEEEEEEEEEEEEEEEAEEEA<EEE/AE<EEEEEEEE6EEEEEEEEEEAEEEEE

**@NB501309:173:HYW77BGX5:1:11101:19977:1057 1:N:0:CATTTTAT+GGGGGGGG

CCCGTNGTGGAATTCTCGGGTGCCAAGGAACTCCAGTCACCATTTTATCTCGTATGCCGTCTTCTGCTTGAAAAA

+**

@NB501309:173:HYW77BGX5:1:11101:16270:1057 1:N:0:CATTTTAT+GGGGGGGG

ATTCTNGGGTGCCAAGGAACTCCAGTCACCATTTTATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAGGGGGG

+

AAAAA#EEEEEEEEEEEEEEEEEEEEEEE/EEEEEE///AE//EE/EA/A/A//E//A<//EEEAEE///A/EEE

**@NB501309:173:HYW77BGX5:1:11101:15947:1058 1:N:0:CATTTTAT+GGGGGGGG

CCCGTNGTGGAATTCTCGGGTGCCAAGGAACTCCAGTCACCATTTTATCTCGTATGCCGTCTTCTGCTTGAAAAA

+**

I want to remove @NB501309:173:HYW77BGX5:1:11101:19977:1057 1:N:0:CATTTTAT+GGGGGGGG and @NB501309:173:HYW77BGX5:1:11101:15947:1058 1:N:0:CATTTTAT+GGGGGGGG from the FASTQ file.

Kindly suggest how to do this.

Thanks in advance

fastq quality • 1.9k views

ADD COMMENT • link updated 6.0 years ago by Bastien Hervé 5.3k • written 6.0 years ago by akbioinfo14 • 0

http://hannonlab.cshl.edu/fastx_toolkit/commandline.html#fastq_quality_filter_usage

Almost all FASTQ trimming tools support filtering by quality score.
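To make "filtering by score" concrete, here is a minimal sketch of the idea in Python. It is a simplified illustration and not how fastq_quality_filter itself is implemented; the function names and the mean-score cutoff of 20 are hypothetical, and Phred+33 encoding is assumed.

```python
def mean_phred(quality, offset=33):
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    if not quality:
        return 0.0
    return sum(ord(char) - offset for char in quality) / len(quality)


def passes_quality(quality, min_mean=20.0):
    """True if the mean Phred score reaches the (hypothetical) cutoff."""
    return mean_phred(quality) >= min_mean
```

Note that a filter like this only looks at the quality string, so it assumes every record actually has one.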

ADD REPLY • link 6.0 years ago by cpad0112 21k

Thank you, but I only want to remove the reads that have no quality scores from the FASTQ file.

ADD REPLY • link 6.0 years ago by akbioinfo14 • 0

What did you do to end up with a FASTQ file in which some reads have no quality line?

~~If your file is not that big you can use Biopython :~~

Open your file in python script

Put the content of your file in a SeqIO object

Loop over SeqIO object (records)

~~For each record in records~~
- ~~If you have quality~~
  - ~~Write record in new file~~

ADD REPLY • link 6.0 years ago by Bastien Hervé 5.3k

During pooling, the quality lines of some reads were lost, so now I have to remove those reads. Kindly help.

ADD REPLY • link 6.0 years ago by akbioinfo14 • 0

I think you should redo your pooling. There is no reason to lose reads just because the pooling went wrong. If the quality was already missing in the raw FASTQ file, you can try removing these reads from the raw file.

ADD REPLY • link 6.0 years ago by Bastien Hervé 5.3k

I repooled it and got the same problem again. I think the quality lines are missing in the raw file itself. Kindly help me remove those reads.

ADD REPLY • link 6.0 years ago by akbioinfo14 • 0

You should try this tool to validate your FASTQ first. Investigating your raw FASTQ files is the best way to avoid losing information.
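If you just want a quick look before running a dedicated validator, a strict 4-line check can be sketched as follows. This is a rough sketch, not the tool mentioned above: the function name is hypothetical, and it assumes unwrapped records (one sequence line and one quality line each) with no blank lines in between.

```python
from itertools import islice


def first_malformed_record(path):
    """Return (record_number, reason) for the first bad record, or None.

    Reads the file four lines at a time and checks: '@' header, sequence,
    '+' separator, and a quality string as long as the sequence.
    """
    with open(path) as handle:
        number = 0
        while True:
            chunk = [line.rstrip("\r\n") for line in islice(handle, 4)]
            if not chunk:
                return None  # reached end of file without problems
            number += 1
            if len(chunk) < 4:
                return number, "truncated record"
            header, seq, sep, qual = chunk
            if not header.startswith("@"):
                return number, "header does not start with '@'"
            if not sep.startswith("+"):
                return number, "third line does not start with '+'"
            if len(qual) != len(seq):
                return number, "quality length differs from sequence length"
```

On a file like the one in the question, the first read without a quality line should be reported, because the following header gets read in the quality position and its length does not match the sequence.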

In any case, if you want to remove these reads, my script below should do the trick.

ADD REPLY • link 6.0 years ago by Bastien Hervé 5.3k

score 0 · Answer 1 · 2018-06-01

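record = []

# Open the output in write mode ('w'): append mode ('a') would duplicate
# records if the script is run more than once.
with open("no_qual.fastq") as f, open("new_no_qual.fastq", "w") as new_file:
    for line in f:
        # NOTE: this assumes only header lines start with "@". A quality
        # string may legally start with "@" too, which would split a record.
        if line.startswith("@"):
            # A new header closes the previous record; keep it only if complete.
            if len(record) == 4:
                new_file.write("\n".join(record) + "\n")
            record = [line.rstrip()]
        else:
            record.append(line.rstrip())
    # Write the last record, if it is complete.
    if len(record) == 4:
        new_file.write("\n".join(record) + "\n")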

Tell me if it's too slow and I'll think about a faster approach.
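One caveat with keying on "@": in Phred+33 a quality string can itself start with "@", so a valid read could be split and dropped. A possible variant, sketched here under the assumption that records are unwrapped and that no header happens to be exactly as long as the preceding sequence, accepts a line as quality only when it follows a "+" line and matches the sequence length. The function names are hypothetical.

```python
def filter_complete_records(lines):
    """Yield complete 4-line FASTQ records, dropping reads without quality.

    A line counts as the quality string only if it directly follows the '+'
    line and has the same length as the sequence. Blank lines are skipped.
    """
    record = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue
        if (len(record) == 3 and record[2].startswith("+")
                and len(line) == len(record[1])):
            yield record + [line]
            record = []
        elif line.startswith("@"):
            # New header: whatever incomplete record came before is dropped.
            record = [line]
        elif record:
            record.append(line)


def filter_file(src_path, dst_path):
    """Write the complete records of src_path to dst_path; return the count."""
    kept = 0
    with open(src_path) as src, open(dst_path, "w") as dst:
        for record in filter_complete_records(src):
            dst.write("\n".join(record) + "\n")
            kept += 1
    return kept
```

Calling `filter_file("no_qual.fastq", "new_no_qual.fastq")` would mirror the script above. It streams the file line by line, so memory use should stay flat even for large inputs.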