Question: How to decompress several fastq.gz files into one single file?

How can I decompress several fastq.gz files and concatenate them into one single .fastq file? There are two kinds of file in the same folder: some end in ...R2_001.fastq.gz and others end in ...R1_001.fastq.gz. The goal is to decompress and concatenate the files ending in ...R2_001.fastq.gz into one R2_001.fastq file, and to do the same for the ...R1_001.fastq.gz files. I'd really appreciate any help. Thanks -madza

Asked 3.4 years ago by madzayasodara (10); updated 2.9 years ago by ghanbari.msc (0)

Hello madzayasodara!

It appears that your post has been cross-posted to another site: http://seqanswers.com/forums/showthread.php?t=71842

This is typically not recommended as it runs the risk of annoying people in both communities.

Reply by Pierre Lindenbaum (120k), 3.4 years ago

I didn't know this was a problem. Thanks for letting me know. -madza

Reply by madzayasodara (10), 3.4 years ago

What if we want the output file to have the same name as the input file?

Reply by ghanbari.msc (0), 2.9 years ago
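
For the follow-up question above, a minimal sketch, assuming "same name" means one uncompressed file per input with only the trailing .gz removed (sampleA_R1_001.fastq.gz becomes sampleA_R1_001.fastq). The function name decompress_each and the sampleA files are hypothetical, and the demo runs in a throwaway directory from mktemp. gzip -dc is used because it behaves like zcat with GNU gzip, while on some systems zcat only handles .Z files.

```shell
#!/usr/bin/env bash
# Decompress each *.fastq.gz in a directory to a file of the same name
# minus ".gz". gzip -dc writes to stdout, so the .gz originals are kept.
set -euo pipefail

decompress_each() {
    local dir=$1 f
    for f in "$dir"/*.fastq.gz; do
        [ -e "$f" ] || continue          # no matches: the glob stays literal, skip it
        gzip -dc "$f" > "${f%.gz}"       # strip only the trailing .gz
    done
}

# Toy demo with hypothetical sample names in a temporary directory.
demo=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n' | gzip > "$demo/sampleA_R1_001.fastq.gz"
printf '@r2\nTTTT\n+\nIIII\n' | gzip > "$demo/sampleA_R2_001.fastq.gz"
decompress_each "$demo"
ls "$demo"
```

To name the outputs some other way, the ${f%.gz} expansion is the only part that needs to change.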

Please use ADD COMMENT/ADD REPLY when responding to existing posts, to keep threads logically organized. SUBMIT ANSWERS is only for submitting new answers to the original question.

Reply by genomax (68k), 2.9 years ago

Hi Madza

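Another approach could be:

For compressed output files (gzip files joined with cat still form a valid gzip file, so no decompression is needed):

cat *R1_001.fastq.gz > R1_combined.fastq.gz
cat *R2_001.fastq.gz > R2_combined.fastq.gz

For uncompressed output files:

zcat *R1_001.fastq.gz > R1_combined.fastq
zcat *R2_001.fastq.gz > R2_combined.fastq

Using > rather than >> means a re-run overwrites the outputs instead of appending to them, and the output names are chosen so the input wildcards cannot match them.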

Thanks, Persistent LABS

Answer by Persistent LABS (740), 3.4 years ago
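
A quick sanity check after either route, as a minimal sketch: each FASTQ record is four lines, so the merged R1 and R2 files should hold the same number of reads. The count_reads helper, the s1/s2 toy files and the *_combined output names are hypothetical; gzip -dc stands in for zcat.

```shell
#!/usr/bin/env bash
# Merge toy R1/R2 files with the two routes above, then compare read counts.
set -euo pipefail

count_reads() {                          # one FASTQ record = 4 lines
    local lines
    case $1 in
        *.gz) lines=$(gzip -dc "$1" | wc -l) ;;
        *)    lines=$(wc -l < "$1") ;;
    esac
    echo $(( lines / 4 ))
}

demo=$(mktemp -d) && cd "$demo"
for s in s1 s2; do                       # hypothetical sample names
    printf '@%s\nACGT\n+\nIIII\n' "$s" | gzip > "${s}_R1_001.fastq.gz"
    printf '@%s\nTGCA\n+\nIIII\n' "$s" | gzip > "${s}_R2_001.fastq.gz"
done

cat *R1_001.fastq.gz > R1_combined.fastq.gz      # compressed route
gzip -dc *R2_001.fastq.gz > R2_combined.fastq    # uncompressed route

n1=$(count_reads R1_combined.fastq.gz)
n2=$(count_reads R2_combined.fastq)
if [ "$n1" -eq "$n2" ]; then
    echo "OK: $n1 read pairs"
else
    echo "mismatch: R1=$n1 R2=$n2" >&2
    exit 1
fi
```

Matching counts show that no file was dropped on one side; they do not prove that the pairs are in the same order.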

Hi Madza,

Can you give it a try?

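For files ending in ...R1_001.fastq.gz:

find . -name '*R1_001.fastq.gz' -print0 | sort -z | xargs -0 gunzip -c > all_R1_001.fastq

Similarly, for files ending in ...R2_001.fastq.gz:

find . -name '*R2_001.fastq.gz' -print0 | sort -z | xargs -0 gunzip -c > all_R2_001.fastq

Sorting keeps the R1 and R2 files in the same order (as long as mate names differ only in R1/R2), so the read pairs stay in sync; -print0 and -0 keep file names with spaces intact, and > overwrites any output left from an earlier run.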

Let me know if it worked for you.

Answer by Vijay Lakhujani (4.1k), 3.4 years ago
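
When the R1 and R2 lists are built separately, nothing but the naming forces them into the same order. A minimal sketch of a pair-aware alternative for a flat directory (no recursion, unlike find): it walks the sorted R1 files, derives each R2 name by swapping the suffix, and stops if a mate is missing. The function concat_pairs and the toy s1/s2 files are hypothetical.

```shell
#!/usr/bin/env bash
# Build all_R1_001.fastq and all_R2_001.fastq from one list of R1 files,
# so each R2 input is read in the same position as its R1 mate.
set -euo pipefail

concat_pairs() {
    local dir=$1 r1 r2
    : > "$dir/all_R1_001.fastq"                 # truncate outputs, never append to old runs
    : > "$dir/all_R2_001.fastq"
    for r1 in "$dir"/*R1_001.fastq.gz; do       # bash sorts glob results
        [ -e "$r1" ] || { echo "no R1 files in $dir" >&2; return 1; }
        r2=${r1%R1_001.fastq.gz}R2_001.fastq.gz
        [ -e "$r2" ] || { echo "missing mate for $r1" >&2; return 1; }
        gzip -dc "$r1" >> "$dir/all_R1_001.fastq"
        gzip -dc "$r2" >> "$dir/all_R2_001.fastq"
    done
}

# Toy demo: two hypothetical samples, headers mark sample and mate.
demo=$(mktemp -d)
for s in s1 s2; do
    printf '@%s/1\nACGT\n+\nIIII\n' "$s" | gzip > "$demo/${s}_R1_001.fastq.gz"
    printf '@%s/2\nTGCA\n+\nIIII\n' "$s" | gzip > "$demo/${s}_R2_001.fastq.gz"
done
concat_pairs "$demo"
# Print R1 and R2 headers side by side (line 1 of every 4-line record).
paste <(awk 'NR % 4 == 1' "$demo/all_R1_001.fastq") \
      <(awk 'NR % 4 == 1' "$demo/all_R2_001.fastq")
```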

Actually cat is enough.

cat 1.fq.gz 2.fq.gz 3.fq.gz ... > all.fq.gz

or

cat *.fq.gz > all.fq.gz

Answer by chen ♦ (1.9k), 3.4 years ago
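
Why plain cat is enough for the compressed case, as a small self-contained demo: each .gz file is a complete gzip member, and gzip decompresses consecutive members one after another, so the joined file decompresses to the joined text. The toy 1.fq/2.fq files mirror the names in the answer. One caveat with the wildcard form: all.fq.gz itself matches *.fq.gz, so running it a second time in the same folder would make the output one of its own inputs; writing the output to another directory avoids that.

```shell
#!/usr/bin/env bash
# Show that cat-ing gzip files gives a valid gzip file whose decompressed
# content equals the concatenation of the uncompressed inputs.
set -euo pipefail

demo=$(mktemp -d) && cd "$demo"
printf '@r1\nACGT\n+\nIIII\n' > 1.fq
printf '@r2\nGGCC\n+\nIIII\n' > 2.fq
gzip -c 1.fq > 1.fq.gz                   # -c keeps the uncompressed copies for comparison
gzip -c 2.fq > 2.fq.gz

cat 1.fq.gz 2.fq.gz > all.fq.gz          # byte-level join, no recompression
gzip -t all.fq.gz                        # integrity check: non-zero exit if the stream is broken
if cmp -s <(gzip -dc all.fq.gz) <(cat 1.fq 2.fq); then
    echo "same content"
fi
```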

OP asked for uncompressed fastq at the end, so you'll have to add in a gunzip -c ;)

Reply by WouterDeCoster (39k), 3.4 years ago

It would be a pain to type all the file names (even with tab completion), especially when there are, say, 100 such files, and again separately for R1 and R2.

Reply by Vijay Lakhujani (4.1k), 3.4 years ago

That's why I suggested using a wildcard.

Reply by chen ♦ (1.9k), 3.4 years ago

Using GNU Parallel:

parallel -j1 'zcat *{}_001.fastq.gz > {}_001.fastq' ::: R1 R2

If your disk system is fast, you can increase -j1 (for example, -j2 runs the R1 and R2 jobs at the same time).

Answer by ole.tange ♦ (3.4k), 3.4 years ago
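
For readers without GNU Parallel, a rough plain-bash analogue of the command above with two jobs at once (about what -j2 would do), as a minimal sketch: one background job per read tag, then a wait on each. Whether this is faster depends on the disk, as the answer notes. The function name merge_tags_concurrently and the toy files are hypothetical.

```shell
#!/usr/bin/env bash
# Decompress and merge R1 and R2 at the same time, one background job per tag.
set -euo pipefail

merge_tags_concurrently() {
    local dir=$1 tag pid
    local pids=()
    for tag in R1 R2; do
        # Same pattern as '*{}_001.fastq.gz' in the parallel command, with {} = $tag.
        gzip -dc "$dir"/*"$tag"_001.fastq.gz > "$dir/${tag}_001.fastq" &
        pids+=("$!")
    done
    for pid in "${pids[@]}"; do
        wait "$pid"                      # returns that job's exit status; set -e stops on failure
    done
}

demo=$(mktemp -d)
for s in s1 s2; do                       # hypothetical sample names
    printf '@%s\nACGT\n+\nIIII\n' "$s" | gzip > "$demo/${s}_R1_001.fastq.gz"
    printf '@%s\nTGCA\n+\nIIII\n' "$s" | gzip > "$demo/${s}_R2_001.fastq.gz"
done
merge_tags_concurrently "$demo"
wc -l "$demo/R1_001.fastq" "$demo/R2_001.fastq"
```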

GNU Parallel is amazing. I find this syntax more intuitive and easier to remember:

find . -name '*.fastq.gz' | parallel -j 1 'zcat {} > {.}.combined.fastq'
Reply by WouterDeCoster (39k), 3.4 years ago

But your example will not decompress foo_R1_001.fastq.gz and bar_R1_001.fastq.gz and append them into a single R1_001.fastq.

Reply by ole.tange ♦ (3.4k), 3.4 years ago

Yes, you are right about that. My code would have to be changed to capture either R1 or R2 and write the output accordingly, and then be run twice with slight modifications. So your example is definitely better.

Reply by WouterDeCoster (39k), 3.4 years ago
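
The change described above, capturing R1 or R2 from each file name and writing to the matching output, can also be done in a single pass instead of running the command twice. A minimal sketch in plain bash (not GNU Parallel), assuming the tag always appears as R1_001/R2_001 right before .fastq.gz; the function name merge_by_captured_tag, the toy files and the untagged extra.fastq.gz are hypothetical.

```shell
#!/usr/bin/env bash
# One pass over every *.fastq.gz under a directory: the read tag in the
# file name decides which combined output receives the reads.
set -euo pipefail

merge_by_captured_tag() {
    local dir=$1 f tag
    : > "$dir/R1_001.fastq"                  # start from empty outputs
    : > "$dir/R2_001.fastq"
    while IFS= read -r -d '' f; do
        case $f in
            *R1_001.fastq.gz) tag=R1 ;;
            *R2_001.fastq.gz) tag=R2 ;;
            *) continue ;;                   # no tag: leave the file out
        esac
        gzip -dc "$f" >> "$dir/${tag}_001.fastq"
    done < <(find "$dir" -name '*.fastq.gz' -print0 | sort -z)   # sorted, NUL-safe list
}

demo=$(mktemp -d)
for s in s1 s2; do                           # hypothetical sample names
    printf '@%s\nACGT\n+\nIIII\n' "$s" | gzip > "$demo/${s}_R1_001.fastq.gz"
    printf '@%s\nTGCA\n+\nIIII\n' "$s" | gzip > "$demo/${s}_R2_001.fastq.gz"
done
printf '@x\nNNNN\n+\nIIII\n' | gzip > "$demo/extra.fastq.gz"
merge_by_captured_tag "$demo"
```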