Counting occurrences of a string in a fastq file
6.7 years ago

I want to count the number of times 1.1 appears within my fastq file. It should only appear once every 4 lines (on the first line). I have been using:

grep -o '1.1' ./seqtk_1/subsample_1/new_sub_NC_001539_1.fq.gz |wc -l

This tells me it occurs 1046902 times, which is 46902 more than I expected.

It appears to be including these strings in its count: 101, 111, 121, 131, 141, 151, 161, 171, 181, 191, 1/1

How do I search the file specifically for 1.1? Or search only the first line of each set of 4?

I have tried using -v on grep

Thanks

grep • 2.0k views
ADD COMMENT

I have used this code:

awk 'NR%4==1' ./seqtk_1/subsample_1/new_sub_NC_001539_1.fq.gz -exec grep -o "1.1" {} \; | wc -l

This returns the number I wanted. But it also says:

awk: (FILENAME=./seqtk_1/subsample_1/new_sub_NC_001539_1.fq.gz FNR=4000000) fatal: cannot open file `-exec' for reading (No such file or directory)

Is it doing what I want it to do?

ADD REPLY

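No. -exec is an option of find, not of awk, so awk treats it as another input file and fails. Pipe the output of awk into grep instead:

awk 'NR%4==1' ./seqtk_1/subsample_1/new_sub_NC_001539_1.fq.gz | grep -c '1\.1'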
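
A minimal sketch of what that pipeline does, using a made-up two-record FASTQ written to a temporary file (the read names, sequences, and the `header_hits` variable are invented for illustration). awk keeps lines 1, 5, 9, ... and grep -c counts how many of those header lines contain a literal 1.1:

```shell
# Hypothetical two-record FASTQ; names and sequences are invented.
tmp=$(mktemp)
printf '%s\n' \
  '@read1 1.1' 'ACGT' '+' 'IIII' \
  '@read2 1/1' 'ACGT' '+' 'I1.1' > "$tmp"

# Keep every 4th line starting at line 1 (the headers), then count
# the header lines that contain a literal "1.1".
header_hits=$(awk 'NR%4==1' "$tmp" | grep -c '1\.1')
echo "$header_hits"
rm -f "$tmp"
```

The quality line of the second record also contains 1.1, but it never reaches grep because awk has already dropped it. This assumes strict 4-line records, with no wrapped sequence or quality lines.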

ADD REPLY

Try -F (fixed-string matching) instead of -o:

grep -Fc 1.1 ./seqtk_1/subsample_1/new_sub_NC_001539_1.fq.gz
ADD REPLY
6.7 years ago
zgrep -c '1\.1' ./seqtk_1/subsample_1/new_sub_NC_001539_1.fq.gz

In a regular expression, . means "any character", hence the need to escape it.
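
As a small illustration of the difference, here is a sketch on four invented sample lines (the strings and variable names are made up for the example), comparing the unescaped dot, the escaped dot, and grep -F:

```shell
# Invented sample strings, one per line.
sample=$(printf '%s\n' 'id 1.1' 'id 101' 'id 1/1' 'id 121')

# Unescaped dot: "." matches any single character, so all four lines match.
any_char=$(printf '%s\n' "$sample" | grep -c '1.1')

# Escaped dot: only a literal "1.1" matches.
escaped=$(printf '%s\n' "$sample" | grep -c '1\.1')

# -F treats the pattern as a fixed string, so no escaping is needed.
fixed=$(printf '%s\n' "$sample" | grep -Fc '1.1')

echo "$any_char $escaped $fixed"
```

Note that -c counts matching lines, not individual matches, whereas grep -o piped to wc -l counts every match.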

Note, however, that 1.1 is a valid substring of a quality string in any recent fastq file (both . and 1 are legal quality characters), so you shouldn't be surprised if it appears more than once per entry.
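
To check where the matches come from, one option is to tally matching lines by their position within each 4-line record. This is only a sketch on an invented two-record file (read names, sequences, and the `tally` variable are hypothetical), and it assumes strict 4-line records:

```shell
# Hypothetical FASTQ: the second record has "1.1" in its quality string.
tmp=$(mktemp)
printf '%s\n' \
  '@read1 1.1' 'ACGT' '+' 'IIII' \
  '@read2 1.1' 'ACGT' '+' 'I1.1' > "$tmp"

# Tally lines containing a literal "1.1" by position in the record:
# NR % 4 is 1 for header lines and 0 for quality lines.
tally=$(awk '/1\.1/ { n[NR % 4]++ }
             END { printf "header=%d quality=%d\n", n[1], n[0] }' "$tmp")
echo "$tally"
rm -f "$tmp"
```

If the quality count is non-zero on the real file, the excess over the expected number of reads comes from quality strings rather than headers.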

ADD COMMENT

Thank you, that worked

ADD REPLY
