Concatenate multiple gzipped FASTQ files from a multi-lane run into one combined gzip file
3.4 years ago
PAn • 20
United States

I need to write a Perl script that reads a text file listing the paths of gzipped FASTQ files, concatenates those files, and writes the result to a new gzipped file. (It has to be Perl because it will be part of a pipeline.) I am not sure how to handle the zcat and concatenation step. Since the file sizes are in the GB range, I also need to keep storage use and run time in check.

This is what I have so far:

use strict;
use warnings;
use File::Slurp;
use IO::Compress::Gzip qw($GzipError);

#-------check the input file specified-------------#

my $num_args = scalar @ARGV;
if ($num_args != 1) {
    print "\nUsage: name.pl Filelist.txt \n";
    exit 1;
}

my $file_list = $ARGV[0];

#-------------Read the file list into an array-------------#

my @fastqc_files = read_file($file_list);   # Array that contains paths of gzipped files
chomp(@fastqc_files);

#-------use zcat over the array contents, compressing the output as it is written-------#
my $outputfile = "combined.fastq.gz";
my $combined_file = IO::Compress::Gzip->new($outputfile)
  or die "gzip failed: $GzipError\n";

for my $fastqc_file (@fastqc_files) {
    next unless length $fastqc_file;   # skip blank lines in the list

    # List-form pipe open: the path is passed to zcat without going through a shell
    open(my $in, '-|', 'zcat', $fastqc_file)
      or die("Can't open pipe from command 'zcat $fastqc_file' : $!\n");
    while ( my $line = <$in> ) {
        $combined_file->print($line);
    }
    close($in) or die("zcat failed on '$fastqc_file'\n");
}

$combined_file->close();

I am struggling to get this to run, and I am not sure it is the right approach. Can anyone share a simpler or more correct method to accomplish this? Thanks!


Decompressing with zcat and then recompressing is unnecessary, because gzip files concatenated as they are already form a valid gzip file: https://www.biostars.org/p/81924/#81925


What would be a better way to combine the gzip files, then? I basically need to stitch their contents together, not just combine the gzip files into one big gzip file (and I need to take the GB-scale file sizes into account too).


The point is that "stitching them together" just means concatenating them. There is no difference. You can do this in one line (without perl) as Pierre's comment suggests.
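
For anyone who still needs this inside a Perl pipeline, the concatenation described here can be sketched without zcat at all. This is only a minimal sketch, not code from the thread: the sub name `concat_gzip_files` and the two-argument command line are hypothetical. It copies the compressed bytes of each input straight into the output, which yields a multi-member gzip file that zcat and gunzip read as one stream.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Append each gzip file to $out_path byte for byte, with no decompression.
# (Hypothetical helper name; the inputs are assumed to be valid gzip files.)
sub concat_gzip_files {
    my ($out_path, @gz_paths) = @_;
    open(my $out, '>', $out_path) or die "Cannot write '$out_path': $!";
    binmode($out);
    for my $path (@gz_paths) {
        open(my $in, '<', $path) or die "Cannot read '$path': $!";
        binmode($in);
        my $buffer;
        # Copy in 1 MiB chunks so memory use stays flat for multi-GB inputs.
        while (read($in, $buffer, 1024 * 1024)) {
            print {$out} $buffer or die "Write to '$out_path' failed: $!";
        }
        close($in);
    }
    close($out) or die "Cannot close '$out_path': $!";
    return $out_path;
}

# Assumed usage: perl concat.pl File_list.txt combined.fastq.gz
if (@ARGV == 2) {
    my ($list_path, $out_path) = @ARGV;
    open(my $list, '<', $list_path) or die "Cannot read '$list_path': $!";
    chomp(my @paths = grep { /\S/ } <$list>);   # ignore blank lines
    close($list);
    concat_gzip_files($out_path, @paths);
}
```

Because nothing is decompressed or recompressed, run time is dominated by disk I/O and no uncompressed intermediate file is written.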


Thanks Pierre and Sean. I understand it's better to run this as a one-line command rather than in Perl, but I really need to run it in Perl because it has to fit into a pipeline with other components, config files, an XML caller, etc. I will give it another shot; otherwise I will tell the collaborators to settle for the one-liner (I prefer it as well)!


You can run a shell command from perl. A little googling will tell you how.
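
As a rough illustration, one way to do that is Perl's built-in `system`, checking its return value so a failed command does not go unnoticed. The wrapper name `run_shell` and the file names in the comment are made up for this sketch.

```perl
use strict;
use warnings;

# Run a command line through the shell and die if it does not succeed.
# (Hypothetical wrapper around the built-in system().)
sub run_shell {
    my ($command) = @_;
    my $status = system($command);
    die "Could not start: $command ($!)\n" if $status == -1;
    die "Command failed with exit code " . ($? >> 8) . ": $command\n"
        if $status != 0;
    return 1;
}

# Assumed example, with placeholder file names:
# run_shell("cat lane1.fastq.gz lane2.fastq.gz > combined.fastq.gz");
```

Paths containing spaces or shell metacharacters would need quoting before being placed in the command string.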


Thanks Sean. Yes, I got it running by simply calling zcat through system in the script. Thanks!

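#!/usr/bin/perl
use strict;
use warnings;
use File::Slurp;

# Read the list of gzipped FASTQ paths, one per line
my @data = read_file('./File_list.txt');
my $out = "./test.txt";

foreach my $data_file (@data)
{
    chomp($data_file);
    # Decompress each file and append it to the combined output
    system("zcat $data_file >> $out");
}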

Glad it worked out for you. Remember to "remove" your output file before entering the loop so that if the script has failed, you don't simply append to the "bad" file.
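
A minimal sketch of that clean-up step, assuming it is called once before the loop with the same output path (the sub name `remove_stale_output` is hypothetical):

```perl
use strict;
use warnings;

# Delete a leftover output file so that '>>' starts from an empty file.
# Returns 1 if a file was removed, 0 if there was nothing to remove.
sub remove_stale_output {
    my ($path) = @_;
    return 0 unless -e $path;
    unlink($path) or die "Cannot remove '$path': $!";
    return 1;
}

# Assumed usage, before the foreach loop:
# remove_stale_output($out);
```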


Oh yes, that's right. Thanks for pointing it out. I have another question: can I use ARGV to pass the input file instead of specifying it in the script? I tried modifying the script to

#!/usr/bin/perl
use strict;
use warnings;
use File::Slurp;

my @data = read_file(ARGV[0]);

instead of hard-coding the path to the input file, but it produces an error. Could you point out what is wrong? Sorry, it must be something trivial.

Thanks!


You'll definitely need to do a little reading on arguments in perl. For example:

http://alvinalexander.com/perl/perl-command-line-arguments-read-args
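
In short, command-line arguments arrive in the array `@ARGV`, and a single element is written with the `$` sigil as `$ARGV[0]`; the attempt above is missing that sigil. A small sketch, where the sub name `file_list_from_args` is hypothetical and plain `open` stands in for `read_file`:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the file-list path given as the only command-line argument.
sub file_list_from_args {
    my @args = @_;
    die "Usage: $0 File_list.txt\n" unless @args == 1;
    return $args[0];
}

# Runs only when an argument was supplied, e.g.: perl name.pl File_list.txt
if (@ARGV) {
    my $list_path = file_list_from_args(@ARGV);
    open(my $list, '<', $list_path) or die "Cannot read '$list_path': $!";
    chomp(my @data = <$list>);
    close($list);
    print scalar(@data), " paths read from $list_path\n";
}
```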


Yes, I am working on the Perl basics. Thanks, it works now!
