Using GNU Parallel For Bedtools
10.2 years ago

I am trying to run GNU parallel with the bedtools multicov function, where the original command is

bedtools multicov -bams bam1 bam2 bam3.. -bed anon.bed  > Q1_Counst.bed

I would like to implement the above command using gnu parallel. But when I run the command below

parallel -j 25 "bedtools multicov -bams {1} -bed {2} > Q1_Counst.bed" ::: minus_1_common_sorted_q1.bam minus_2_common_sorted_q1.bam minus_3_common_sorted_q1.bam plus_1_common_sorted_q1.bam plus_2_common_sorted_q1.bam plus_3_common_sorted_q1.bam ::: '/genome/genes_exon_2.bed'

each BAM file is taken as a separate argument, so the processes that start look like

bedtools multicov -bams  bam1 -bed anon.bed  > Q1_Counst.bed
bedtools multicov -bams  bam2 -bed anon.bed  > Q1_Counst.bed
bedtools multicov -bams  bam3 -bed anon.bed  > Q1_Counst.bed

instead of all BAM files being passed to a single invocation. As a result, Q1_Counst.bed is overwritten unpredictably. Could anyone help me get the right command? My server has around 30 cores.
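
The behaviour described above follows from how GNU parallel combines several `:::` input sources: it runs one job per combination of values. A minimal plain-bash sketch of that expansion (it does not call parallel or bedtools; the file names are shortened, hypothetical stand-ins for the real inputs):

```shell
# Shortened, hypothetical file names standing in for the real inputs.
bams=(minus_1.bam minus_2.bam plus_1.bam)
beds=(genes_exon_2.bed)

# One job per (bam, bed) combination, mirroring how two ::: sources are paired.
jobs=()
for bam in "${bams[@]}"; do
  for bed in "${beds[@]}"; do
    jobs+=("bedtools multicov -bams $bam -bed $bed > Q1_Counst.bed")
  done
done
printf '%s\n' "${jobs[@]}"

# What multicov needs instead: every BAM on one command line.
single="bedtools multicov -bams ${bams[*]} -bed ${beds[0]} > Q1_Counst.bed"
printf '%s\n' "$single"
```

Every generated job redirects to the same Q1_Counst.bed, which is why the file ends up holding the output of only one of them.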

parallel linux bedtools bash • 5.0k views
10.2 years ago

Split your BED file using split:

split -l100 anon.bed TMPBED

and then call multiBamCov with each BED chunk:

ls TMPBED* | parallel   multiBamCov -bams f1.bam  f2.bam -bed '{}'  '>' out.{}.bed
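
Once every chunk has been processed, the per-chunk outputs can be concatenated back into a single file. A minimal sketch of the split-and-merge round trip, assuming each chunk's output has one line per input interval; `fake_multicov` is a hypothetical stand-in for the real multiBamCov call and the intervals are made up:

```shell
# Scratch directory so nothing real is touched.
workdir=$(mktemp -d)

# Tiny stand-in for anon.bed (hypothetical intervals).
printf 'chr1\t0\t10\nchr1\t10\t20\nchr1\t20\t30\nchr2\t0\t10\nchr2\t10\t20\n' > "$workdir/anon.bed"

# Split into chunks of 2 lines: TMPBEDaa, TMPBEDab, TMPBEDac.
split -l2 "$workdir/anon.bed" "$workdir/TMPBED"

# Stand-in for multiBamCov: append a dummy count column to each interval.
fake_multicov() { awk 'BEGIN { OFS = "\t" } { print $0, 0 }' "$1"; }

for bed in "$workdir"/TMPBED*; do
  fake_multicov "$bed" > "$workdir/out.$(basename "$bed").bed"
done

# The glob sorts the aa, ab, ac suffixes in the order split wrote them,
# so concatenating restores the original interval order.
cat "$workdir"/out.TMPBED*.bed > "$workdir/Q1_Counst.bed"
merged_lines=$(wc -l < "$workdir/Q1_Counst.bed" | tr -d ' ')
echo "$merged_lines"
```

With the real tool, the loop body would be replaced by the parallel command above and only the final `cat` step is needed.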

But it is more like

split -l100 anon.bed TMPBED

for bed in TMPBED*; do multiBamCov -bams f1.bam  f2.bam -bed "$bed" > "${bed}_out.bed" & done

which creates one background subprocess per TMPBED* chunk in the shell. Is there any other advantage to running GNU parallel here?


You can limit the number of parallel jobs, run jobs on a remote server and fetch the results back, re-run only the jobs that failed, ...
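
Two of these points can be sketched as follows, assuming GNU parallel is installed and the TMPBED* chunks from the answer exist. The function names and the jobs.log file name are hypothetical; `-j`, `--joblog` and `--resume-failed` are GNU parallel options. Nothing runs until a function is called.

```shell
# Hypothetical wrapper names; f1.bam, f2.bam and TMPBED* are the files from the thread.
run_chunks() {
  # At most 8 jobs at a time; record each job's exit status in jobs.log.
  parallel -j 8 --joblog jobs.log \
    multiBamCov -bams f1.bam f2.bam -bed '{}' '>' 'out.{}.bed' ::: TMPBED*
}

rerun_failed_chunks() {
  # Same command line plus --resume-failed: parallel reads jobs.log and
  # re-runs only the jobs that failed or had not finished.
  parallel -j 8 --joblog jobs.log --resume-failed \
    multiBamCov -bams f1.bam f2.bam -bed '{}' '>' 'out.{}.bed' ::: TMPBED*
}
```

A plain `for ... & done` loop starts every chunk at once and keeps no record of which ones failed, which is the main practical difference.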


Thanks.. It is working. :)

10.2 years ago
ole.tange ★ 4.4k

If you can get multiBamCov to read from stdin, you can avoid the tmp files:

cat anon.bed | parallel -l100 --pipe multiBamCov -bams f1.bam  f2.bam -bed stdin  '>' out.{#}.bed

Or if you just want all output merged into a single file:

cat anon.bed | parallel -l100 --pipe multiBamCov -bams f1.bam  f2.bam -bed stdin  >out.bed

I have never used multiBamCov, so if -bed stdin does not work, you might try:

-bed /dev/stdin
-bed '<( cat )'
-bed -