Combining multiple gz files in subfolders
8.7 years ago
rob234king ▴ 610

I have a folder from an Illumina plate run that contains many subfolders, one per sample, each holding four files from the Illumina machine. For each sample (folder), I want to combine the two R1 (left) files into one gz file and the two R2 (right) files into another. I can't see an automated approach other than handling each subfolder in turn, but there is likely a bash way that I'm completely missing?

For example, within the folder "plateK":

plateK/
    Sample1_folder/
        sampAR1_a.gz, SampAR1_b.gz, SampAR2_a.gz, SampAR2_b.gz
    Sample2_folder/
        sampBR1_a.gz, sampBR1_b.gz, sampBR2_a.gz, sampBR2_b.gz

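I think that's not really a bioinformatics question; however, you can use tar with the --include option:

tar -c -f R1.tar --include='*R1_*.gz' .

or similar should work; see man tar. Note that --include is a bsdtar option (the default tar on macOS) and is not available in GNU tar, and that this bundles the matching files into a tar archive rather than concatenating them into a single gz file.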
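If your tar is GNU tar (no --include), a rough equivalent is to let find select the files and pass the list to tar. This is a minimal sketch: the function name tar_matching is hypothetical, and it relies on find's -print0 plus tar's --null and -T options. As with the command above, the result is a tar archive of the .gz files, not one concatenated gz file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: put every file under $top whose name matches $pattern
# into one tar archive, without relying on bsdtar's --include option.
# The result is a tar archive of the .gz files, not a concatenated .gz file.
tar_matching() {
    local top=$1 pattern=$2 out=$3
    # make the output path absolute before changing directory
    case $out in
        /*) ;;
        *) out="$PWD/$out" ;;
    esac
    # find emits NUL-separated names; tar reads that list from stdin via --null -T -
    ( cd "$top" && find . -type f -name "$pattern" -print0 | tar --null -T - -cf "$out" )
}
```

Usage would be something like tar_matching plateK '*R1_*.gz' R1.tar.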

8.7 years ago

If you want to concatenate the R1 files and the R2 files in each immediate (depth 1) subdirectory, you could do something like:

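for dir in */; do
   # R1_/R2_ patterns don't match the R1.gz/R2.gz outputs, so a rerun won't read its own output;
   # globs expand in sorted order, so the _a file precedes the _b file for both R1 and R2
   cat "${dir}"*R1_*.gz > "${dir}R1.gz"
   cat "${dir}"*R2_*.gz > "${dir}R2.gz"
done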
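A slightly more defensive version of the loop above, as a sketch: it assumes file names contain "R1_" / "R2_" as in the question's example, and the function name merge_plate and the output names R1.fastq.gz / R2.fastq.gz are illustrative choices. It skips any folder where the number of R1 and R2 parts differs instead of silently writing unpaired output.

```shell
#!/usr/bin/env bash
# Sketch: for every immediate subdirectory of a plate folder, concatenate the
# R1 parts into R1.fastq.gz and the R2 parts into R2.fastq.gz.
# Assumes names contain "R1_" / "R2_" (e.g. sampAR1_a.gz); adjust otherwise.
merge_plate() {
    local plate=$1 dir
    local -a r1 r2
    shopt -s nullglob                  # unmatched globs expand to nothing
    for dir in "$plate"/*/; do
        r1=("$dir"*R1_*.gz)            # sorted lexically: _a before _b
        r2=("$dir"*R2_*.gz)
        # skip empty folders and folders with unequal numbers of R1/R2 parts
        if (( ${#r1[@]} == 0 || ${#r1[@]} != ${#r2[@]} )); then
            echo "skipping $dir: ${#r1[@]} R1 vs ${#r2[@]} R2 files" >&2
            continue
        fi
        # outputs contain "R1." / "R2.", so the globs above never match them
        cat "${r1[@]}" > "${dir}R1.fastq.gz"
        cat "${r2[@]}" > "${dir}R2.fastq.gz"
    done
    shopt -u nullglob
}
```

Run it as merge_plate plateK. Note that it turns nullglob off again at the end, even if it was on before.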
8.7 years ago
mark.ziemann ★ 1.9k

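You can cat gzip files to concatenate them: the gzip format allows one file to consist of several compressed members, and decompressing the result gives the original contents back to back.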

cat sampAR1_a.gz sampAR1_b.gz > sampAR_1.fq.gz
cat sampAR2_a.gz sampAR2_b.gz > sampAR_2.fq.gz
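A small self-contained demo of why plain cat is enough, using made-up file names in a temporary directory (the function name demo_gzip_concat is hypothetical): the concatenated file passes gzip's integrity test and decompresses to both parts in the order they were given to cat.

```shell
#!/usr/bin/env bash
# Demo: two separately gzipped files, joined with cat, form one valid
# multi-member gzip file that decompresses to both parts in cat order.
demo_gzip_concat() {
    local tmp
    tmp=$(mktemp -d)
    printf 'first part\n'  | gzip > "$tmp/part_a.gz"
    printf 'second part\n' | gzip > "$tmp/part_b.gz"
    cat "$tmp/part_a.gz" "$tmp/part_b.gz" > "$tmp/combined.gz"
    gzip -t "$tmp/combined.gz"      # integrity check; non-zero exit if invalid
    gzip -dc "$tmp/combined.gz"     # prints "first part" then "second part"
    rm -rf "$tmp"
}
```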

Be very careful to keep the same file order for R1 and R2 (here _a before _b in both); otherwise the mates in the two output files will be out of step and you will have a lot of discordant read pairs.
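One way to catch an ordering mistake after concatenating is to compare read names record by record. This is a sketch under assumptions: 4-line FASTQ records, mates sharing the first whitespace-delimited field of the header (optionally with an old-style /1 or /2 suffix), and a hypothetical function name check_pairs. It reports the first mismatch and returns non-zero.

```shell
#!/usr/bin/env bash
# Sketch: verify two gzipped FASTQ files are still paired read by read.
# Assumes 4-line records and that mate headers share their first field,
# optionally ending in /1 and /2 (adjust for your read naming).
check_pairs() {
    local r1=$1 r2=$2
    # pull the first header field of every record from each file, side by side
    paste <(gzip -dc "$r1" | awk 'NR % 4 == 1 {print $1}') \
          <(gzip -dc "$r2" | awk 'NR % 4 == 1 {print $1}') |
    awk -F'\t' '
        { a = $1; b = $2; sub(/\/[12]$/, "", a); sub(/\/[12]$/, "", b) }
        a != b { printf "mismatch at read %d: %s vs %s\n", NR, $1, $2; bad = 1; exit }
        END { exit bad }'
}
```

For example, check_pairs sampAR_1.fq.gz sampAR_2.fq.gz prints nothing and returns 0 when the files are in step. Files with different read counts also fail, because paste leaves one column empty.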


Question, sir: if I had 300 pairs of files to combine, could you help with an appropriate script?
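For many samples sitting in one directory, the two cat commands above can be wrapped in a loop. This is a sketch that assumes the question's naming scheme with consistent case (<sample>R1_a.gz, <sample>R1_b.gz, <sample>R2_a.gz, <sample>R2_b.gz); the function name merge_pairs is hypothetical, and outputs follow the sampAR_1.fq.gz / sampAR_2.fq.gz naming used in the answer. Samples with a missing part are skipped with a message rather than producing unpaired files.

```shell
#!/usr/bin/env bash
# Sketch: merge a/b parts for every sample in one directory, assuming names
# <sample>R1_a.gz, <sample>R1_b.gz, <sample>R2_a.gz, <sample>R2_b.gz.
# Writes <sample>R_1.fq.gz and <sample>R_2.fq.gz next to the inputs.
merge_pairs() {
    local dir=$1 r1a prefix part
    local -a missing
    for r1a in "$dir"/*R1_a.gz; do
        [ -e "$r1a" ] || continue          # no matches: glob stays literal
        prefix=${r1a%R1_a.gz}              # e.g. dir/sampA
        missing=()
        for part in R1_b R2_a R2_b; do
            [ -e "${prefix}${part}.gz" ] || missing+=("${prefix}${part}.gz")
        done
        if (( ${#missing[@]} )); then
            echo "skipping ${prefix}: missing ${missing[*]}" >&2
            continue
        fi
        # same a-then-b order for both reads keeps mates in step
        cat "${prefix}R1_a.gz" "${prefix}R1_b.gz" > "${prefix}R_1.fq.gz"
        cat "${prefix}R2_a.gz" "${prefix}R2_b.gz" > "${prefix}R_2.fq.gz"
    done
}
```

Call it once per folder, e.g. merge_pairs plateK/Sample1_folder, or loop over the sample folders with for d in plateK/*/; do merge_pairs "${d%/}"; done.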
