bgzip all VCFs in a directory
8.7 years ago
stevenlang123 ▴ 210

Hi guys,

I've been trying to bgzip around 100 VCF files in parallel. The jobs are submitted and files get created, but something is definitely wrong.

So far I've been trying:

$ for file in *.vcf
> do
> bsub /foo/bar/bgzip $file
>> $file.gz

What is the correct way to do this?

Thanks in advance!

sequencing seq • 11k views
8.7 years ago

Use GNU parallel

parallel bgzip {} ::: *.vcf

or xargs:

ls *.vcf | xargs -P 10 bgzip

When I use ls *.vcf | xargs -P 10 bgzip it only compresses the first file in the folder. Using -n1 (=use at most 1 argument per command line) instead of -P 10 worked for me:

ls *.vcf | xargs -n1 bgzip

You can use both (xargs -n1 -P0): -n1 dispatches one file (line) per bgzip process, and -P0 lets xargs run as many processes in parallel as it can.
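
For anyone landing here later, the -n1 and -P options discussed above can be wrapped up roughly like this. It is only a sketch, assuming GNU find and xargs; the function name compress_vcfs and the BGZIP and JOBS override variables are made up for this example and are not part of bgzip or xargs.

```shell
#!/usr/bin/env bash
# Sketch: compress every .vcf file in a directory, one file per bgzip process.
# compress_vcfs, BGZIP and JOBS are hypothetical names introduced here.
compress_vcfs() {
    local dir="${1:-.}"
    # -print0 / -0 keep filenames containing spaces intact.
    # -r skips running the command when no files match (GNU xargs).
    # -n 1 passes one file per invocation; -P sets how many run at once.
    find "$dir" -maxdepth 1 -type f -name '*.vcf' -print0 |
        xargs -0 -r -n 1 -P "${JOBS:-4}" "${BGZIP:-bgzip}"
}
```

Calling compress_vcfs /path/to/vcfs should leave one .vcf.gz per input file, since bgzip replaces each file with its compressed version. Piping from find instead of ls is a design choice here: it avoids breaking on unusual filenames.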


Is there any way to do this using an LSF manager to split the jobs up instead? The problem I'm having is that bgzip needs its output redirected, so

bsub < bgzip $file > $file.gz

does not work
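
One possible way around the redirection problem, sketched under assumptions: bgzip given a filename writes the .gz next to it on its own, so the job may not need any redirection at all. The function name submit_bgzip_jobs and the BSUB and BGZIP override variables below are hypothetical and only exist for this example; queue and resource options for bsub are left out because they depend on the cluster.

```shell
#!/usr/bin/env bash
# Sketch: submit one LSF job per VCF file.
# submit_bgzip_jobs, BSUB and BGZIP are hypothetical names introduced here.
submit_bgzip_jobs() {
    local file
    for file in "$@"; do
        # "bgzip FILE" creates FILE.gz itself, so nothing is redirected
        # on the submission host.
        "${BSUB:-bsub}" "${BGZIP:-bgzip}" "$file"
    done
}
```

Usage would be submit_bgzip_jobs *.vcf. If redirection really is wanted (for example with bgzip -c, which writes to stdout), the whole command usually has to be quoted so the redirect runs inside the job rather than in the submitting shell, along the lines of bsub "bgzip -c $file > $file.gz".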


Never mind, working great Pierre! Thanks!
