How to remove empty reads from a FASTQ file
8.7 years ago
cfarmeri ▴ 210

Hello,

I would like to remove empty reads from a FASTQ file after trimming adapter sequences. This FASTQ file is from a 454 GS-FLX.

I tried to remove them using fastx_clipper (from the FASTX-Toolkit) as follows:

fastx_clipper -Q33 -l 1 -i in.fastq -o out.fastq

But I received the following error message:

Segmentation fault (core dumped)

Does anybody have a solution to this problem? Can other software remove these empty reads?

Thanks.

software-error • 7.3k views

In the latest documentation, I can't find a Q flag for the fastx_clipper command. Maybe try removing that flag?

8.7 years ago
arnstrm ★ 1.8k

If you just want to get rid of short sequences, you can use bioawk:

bioawk -cfastx 'length($seq) > 1 {print "@"$name"\n"$seq"\n+\n"$qual}'

Edit: FIXED based on comment below!


Thanks, it works well. The empty reads are removed.

But the leading @ character of the read name line (line 1) of each read was also removed.

So I couldn't run FastQC on the processed FASTQ file...


Oh yeah, I forgot. The command should include printing @ before the name:

bioawk -cfastx 'length($seq) > 1 {print "@"$name"\n"$seq"\n+\n"$qual}'
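A quick way to catch a missing @ before running FastQC is to check the header lines with plain awk. This is only a rough sketch: it assumes strict 4-line FASTQ records (no wrapped sequences), and the temporary file and read names below are made up for illustration; point the awk command at your own output file instead.

```shell
#!/bin/sh
# Hypothetical example file; the second record deliberately lacks its "@".
fq=$(mktemp)
printf '@read1\nACGT\n+\nIIII\nread2\nGG\n+\nII\n' > "$fq"

# Count header lines (line 1 of each 4-line record) that do not start with "@".
bad_headers=$(awk 'NR % 4 == 1 && !/^@/ { n++ } END { print n + 0 }' "$fq")
echo "headers without @: $bad_headers"
```

A count of 0 on your real output means every record header starts with @.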

Thank you so much!!

I now get a correctly processed FASTQ file.


I think we can also add $comment if the headers in your FASTQ files carry comments (e.g. specifying the read number):

bioawk -cfastx 'length($seq) > 1 {print "@"$name" "$comment"\n"$seq"\n+\n"$qual}'

Hello,

I too want to remove the empty reads after adapter trimming. Could you please elaborate on your code:

bioawk -cfastx 'length($seq) > 1 {print "@"$name"\n"$seq"\n+\n"$qual}'

I mean, which part does what, and where does the input file go?


Hey,

bioawk works like a typical awk command but has been modified to understand some of the common NGS file formats (FASTA, FASTQ, GFF etc.), hence the -c flag (here -c fastx, to parse the input as FASTA/FASTQ). Like awk, it takes the form bioawk 'condition {action}' filename.

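The condition here is length($seq) > 1, which means the length of the sequence is greater than 1. Note that this also drops reads that are exactly 1 base long; to drop only empty reads, use length($seq) > 0.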

The action here is {print "@"$name"\n"$seq"\n+\n"$qual}, which prints the read back in FASTQ format (if the condition is satisfied).

You supply the input filename after the closing single quote, and you can redirect the result to a new file, for example by appending in.fastq > out.fastq to the command.
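If bioawk is not installed, the same condition/action idea can be sketched with plain awk. This is not the bioawk command from the answer, just an approximation that assumes strict 4-line FASTQ records (no wrapped sequences). The file names and reads are hypothetical. It uses length > 0, so a 1-base read is kept and only truly empty reads are dropped, and because the header line is printed unchanged, the @ and any comment are preserved.

```shell
#!/bin/sh
# Hypothetical file names; infile stands in for your trimmed reads.
workdir=$(mktemp -d)
infile="$workdir/in.fastq"
outfile="$workdir/out.fastq"

# Small test input: read2 is empty, as can happen after adapter trimming.
printf '@read1 1:N:0\nACGT\n+\nIIII\n@read2 1:N:0\n\n+\n\n@read3 1:N:0\nG\n+\nI\n' > "$infile"

# Buffer each 4-line record; print it only if the sequence line is non-empty.
awk '
    NR % 4 == 1 { header = $0 }
    NR % 4 == 2 { seq = $0 }
    NR % 4 == 3 { plus = $0 }
    NR % 4 == 0 { if (length(seq) > 0) print header "\n" seq "\n" plus "\n" $0 }
' "$infile" > "$outfile"

# Report how many reads survived the filter.
kept=$(awk 'END { print NR / 4 }' "$outfile")
echo "reads kept: $kept"
```

As with bioawk, the input file comes after the closing single quote and the filtered reads are redirected to a new file.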

PS: you should not ask a question in an existing thread. This should simply have been a follow-up comment on the above answer.


Thank you so much Arnstrm.
