Hi, how can I count the reads in a batch of fastq.gz files?
Obviously, if the files are unzipped it is easy with grep.
But what is the best way to do it without unzipping?
I tried this for a batch of files:
for i in RIF*.fastq.gz; do echo $(( $(zcat "$i" | wc -l) / 4 )); done
It gives me this result:
133408
133408
127328
127328
124862
124862
156815
156815
146333
146333
123459
123459
159771
159771
126039
126039
167112
167112
97320
97320
But when I unzip those same files and run grep with the read identifier, like
for i in `ls *.fastq | sort`; do grep "@M03691" $i |wc -l; done
I get the true result:
132637
132637
126435
126435
123616
123616
156347
156347
145592
145592
123059
123059
158654
158654
125626
125626
166377
166377
97159
97159
This is clearly a mismatch. Can anybody suggest what I am missing in my first attempt? Thanks, Mitra
If you have zcat then you must have zgrep. Use
zgrep -c "^@Sequencer_ID" file.fq.gz
to get your counts.
Great, that works, thanks. I don't know why zgrep didn't come to my mind.
But I still really wonder why dividing the wc -l count by 4 didn't work. All of these fastq files definitely have 4 lines per read.
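One way to look into the remaining discrepancy, as a rough sketch rather than a definitive answer: compare the line-based count with a count of header lines only, and check whether the total line count is actually a multiple of 4. The function names below are made up for illustration, gzip -dc is used as a portable stand-in for zcat, and the "@M03691" prefix is simply the instrument ID from the grep command above.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic helpers; the function names are illustrative and
# do not come from the thread.

# Reads implied by the line count, assuming exactly 4 lines per record.
count_by_lines() {
    echo $(( $(gzip -dc "$1" | wc -l) / 4 ))
}

# Remainder of the line count modulo 4; anything other than 0 means the
# file is not a clean 4-lines-per-record FASTQ.
lines_mod4() {
    echo $(( $(gzip -dc "$1" | wc -l) % 4 ))
}

# Header-position lines (1, 5, 9, ...) that start with the given prefix.
count_by_prefix() {
    gzip -dc "$1" | awk -v p="$2" 'NR % 4 == 1 && index($0, p) == 1 { n++ } END { print n + 0 }'
}

# Per file: name, line-based count, remainder, prefix-based header count.
for f in RIF*.fastq.gz; do
    [ -e "$f" ] || continue   # skip the literal pattern when nothing matches
    printf '%s\t%s\t%s\t%s\n' "$f" "$(count_by_lines "$f")" "$(lines_mod4 "$f")" "$(count_by_prefix "$f" "@M03691")"
done
```

If the remainder is 0 but the two counts differ, some header lines presumably do not start with that prefix; if the remainder is nonzero, the file has extra or missing lines. Nothing in the thread says which of these, if either, applies to these files.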