How to decompress several fastq.gz files into one single file?
7.5 years ago

How can I decompress several fastq.gz files and concatenate them into a single .fastq file? There are two kinds of file in the same folder, some ending in ...R1_001.fastq.gz and others ending in ...R2_001.fastq.gz. The task is to decompress and concatenate the files ending in ...R2_001.fastq.gz into one R2_001.fastq file, and to do the same for the ...R1_001.fastq.gz files. I'd really appreciate any help. Thanks -madza

next-gen • 9.3k views

Hello madzayasodara!

It appears that your post has been cross-posted to another site: http://seqanswers.com/forums/showthread.php?t=71842

This is typically not recommended as it runs the risk of annoying people in both communities.


I didn't know this was a problem. Thanks for letting me know. -madza


What if we want each output file to have the same name as its input file?
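One way to keep the input names is a plain loop that strips the .gz suffix. This is only a sketch: the function name decompress_each is made up for illustration, and it assumes the files end in .fastq.gz and sit directly in the given directory.

```shell
# decompress_each (hypothetical helper): decompress every *.fastq.gz in
# directory $1 to a .fastq file with the same base name.
# The original .gz files are left untouched.
decompress_each() {
    for f in "$1"/*.fastq.gz; do
        [ -e "$f" ] || continue        # the glob matched nothing
        gunzip -c "$f" > "${f%.gz}"    # foo.fastq.gz -> foo.fastq
    done
}

# usage: decompress_each .
```

Nothing is concatenated here, so this answers the naming question only, not the original one.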


Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized. SUBMIT ANSWER is used only for submitting new answers to the original question.

7.5 years ago

Hi Madza

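Another approach could be the following. Use > rather than >> so that re-running the commands does not append a second copy of the reads, and match on R1_001 / R2_001 so that the glob picks up neither the output file nor sample names that happen to contain R1 or R2.

For compressed output files:

cat *R1_001.fastq.gz > R1.fastq.gz
cat *R2_001.fastq.gz > R2.fastq.gz

For uncompressed output files:

zcat *R1_001.fastq.gz > R1.fastq
zcat *R2_001.fastq.gz > R2.fastq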

Thanks, Persistent LABS
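The compressed variant works because a gzip file may hold several members one after another, so the output of cat is still valid gzip. If you want to convince yourself, something like the following could be used. It is a sketch only: same_reads is a made-up name, and it assumes gunzip and cksum are available.

```shell
# same_reads (hypothetical helper): succeed if the concatenated file $1
# decompresses to the same data as the remaining files decompressed
# one after another in the order given.
# usage: same_reads combined.fastq.gz part1.fastq.gz part2.fastq.gz
same_reads() {
    combined=$1
    shift
    # Compare checksums of the two decompressed streams.
    [ "$(gunzip -c "$combined" | cksum)" = "$(gunzip -c "$@" | cksum)" ]
}
```

The check reads every file twice, so on large runs it is worth trying on a small pair of files first.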

7.5 years ago

Hi Madza,

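Can you give this a try? The sort keeps the R1 and R2 files in the same order, which paired reads need, and > overwrites the output so that a second run does not append duplicate reads.

For files ending in ...R1_001.fastq.gz:

find . -name '*R1_001.fastq.gz' | sort | xargs gunzip -c > all_R1_001.fastq

Similarly, for files ending in ...R2_001.fastq.gz:

find . -name '*R2_001.fastq.gz' | sort | xargs gunzip -c > all_R2_001.fastq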

Let me know if it worked for you.
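Whichever command you use, a cheap sanity check afterwards is that both combined files hold the same number of reads. A minimal sketch, assuming the usual four lines per FASTQ record; count_reads is a name invented here.

```shell
# count_reads (hypothetical helper): print the number of records in an
# uncompressed FASTQ file, assuming exactly four lines per record.
count_reads() {
    echo $(( $(wc -l < "$1") / 4 ))
}

# usage:
#   [ "$(count_reads all_R1_001.fastq)" = "$(count_reads all_R2_001.fastq)" ] \
#       || echo "R1 and R2 read counts differ"
```

Equal counts only show that no file was missed or doubled. They do not prove that the mates are in the same order.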

7.5 years ago
chen ★ 2.5k

Actually cat is enough.

cat 1.fq.gz 2.fq.gz 3.fq.gz ... > all.fq.gz

or

cat *.fq.gz > all.fq.gz

OP asked for uncompressed fastq at the end, so you'll have to add in a gunzip -c ;)
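The two steps can be joined with a pipe, so the combined .gz file never has to be written to disk. A sketch, assuming the *.fq.gz naming used in the answer; concat_fq and the output name are placeholders.

```shell
# concat_fq (hypothetical helper): decompress every *.fq.gz in directory $1
# and write the concatenated reads to the file $2.
concat_fq() {
    # The shell expands the glob in sorted order, so the file order is
    # the same on every run.
    cat "$1"/*.fq.gz | gunzip -c > "$2"
}

# usage: concat_fq . all.fq
```

Running gunzip -c directly on the glob gives the same result. The pipe just makes the "cat, then decompress" idea explicit.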


It would be a pain to type all the file names (even with tab completion), especially when there are, say, 100 such files, and to do so for R1 and R2 separately.


That's why I suggested using a wildcard.

7.5 years ago
ole.tange ★ 4.4k

Using GNU Parallel:

parallel -j1 'zcat *{}_001.fastq.gz > {}_001.fastq' ::: R1 R2

If your disk system is fast, you can increase -j1 to -j2 so that R1 and R2 are processed at the same time.


GNU Parallel is amazing. I find this syntax more intuitive and easier to remember:

find . -name '*.fastq.gz' | parallel -j 1 'zcat {} > {.}.combined.fastq'

But your example will not decompress and append foo_R1_001.fastq.gz and bar_R1_001.fastq.gz into R1_001.fastq


Yes, you are right about that. My code would need to be altered to capture either R1 or R2 and write the output accordingly, and would then have to be run twice in slightly modified form. Your example is therefore definitely better.
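For anyone without GNU Parallel, the "run it once for R1 and once for R2" idea can be written as a plain shell loop. This is a sketch under the file naming from the question (...R1_001.fastq.gz and ...R2_001.fastq.gz); combine_mates and the directory arguments are invented for illustration.

```shell
# combine_mates (hypothetical helper): for each mate (R1, R2), decompress all
# matching files in directory $1 into a single <mate>_001.fastq in directory $2.
combine_mates() {
    for mate in R1 R2; do
        # The glob expands in sorted order, so R1 and R2 files are joined in
        # the same sample order. '>' truncates, so a second run does not
        # append duplicate reads.
        gunzip -c "$1"/*"${mate}"_001.fastq.gz > "$2/${mate}_001.fastq"
    done
}

# usage: combine_mates . .
```

The loop runs the two mates one after the other, which corresponds to -j1 in the GNU Parallel version.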

