Biostar Beta. Not for public use.
Question: Using GNU Parallel For Bedtools

I am trying to run GNU parallel with the bedtools multicov function, where the original command is

bedtools multicov -bams bam1 bam2 bam3.. -bed anon.bed  > Q1_Counst.bed

I would like to run the above command with GNU parallel. But when I run the command below

parallel -j 25 "bedtools multicov -bams {1} -bed {2} > Q1_Counst.bed" ::: minus_1_common_sorted_q1.bam minus_2_common_sorted_q1.bam minus_3_common_sorted_q1.bam plus_1_common_sorted_q1.bam plus_2_common_sorted_q1.bam plus_3_common_sorted_q1.bam ::: '/genome/genes_exon_2.bed'

each BAM file is taken as a separate argument, so the processes that start look like

bedtools multicov -bams  bam1 -bed anon.bed  > Q1_Counst.bed
bedtools multicov -bams  bam2 -bed anon.bed  > Q1_Counst.bed
bedtools multicov -bams  bam3 -bed anon.bed  > Q1_Counst.bed

instead of all files being passed as separate arguments to one call. Hence Q1_Counst.bed is overwritten randomly. Could anyone help me get the right command? My server has around 30 cores.

6.0 years ago geek_y 9.7k • updated 6.0 years ago ole.tange ♦ 3.4k

Split your BED file using split:

split -l100 anon.bed TMPBED

and then call multiBamCov with each BED chunk:

ls TMPBED* | parallel   multiBamCov -bams f1.bam  f2.bam -bed '{}'  '>' out.{}.bed
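
The two commands above leave one step open: getting back to a single count table. Here is a minimal sketch, assuming the per-chunk outputs are merged with cat in glob order, which is the order split wrote the chunks in. The anon.bed created here is a toy stand-in with made-up intervals, and a dummy awk step takes the place of multiBamCov so the sketch runs without bedtools or BAM files. The final name Q1_Counst.bed is reused from the question.

```shell
workdir=$(mktemp -d)   # scratch directory so the demo touches nothing else
cd "$workdir" || exit 1

# Toy stand-in for anon.bed: 250 made-up intervals on chr1
seq 1 250 | awk 'BEGIN{OFS="\t"} {print "chr1", $1*100, $1*100+50}' > anon.bed

# 100 lines per chunk gives TMPBEDaa, TMPBEDab, TMPBEDac
split -l100 anon.bed TMPBED
n_chunks=$(ls TMPBED* | wc -l | tr -d ' ')

# Stand-in for the per-chunk multiBamCov call: append one dummy count
# column so the sketch runs without bedtools or BAM files.
for chunk in TMPBED*; do
    awk 'BEGIN{OFS="\t"} {print $0, 0}' "$chunk" > "out.$chunk.bed"
done

# Merge in glob order, then drop the intermediate files
cat out.TMPBED*.bed > Q1_Counst.bed
rm TMPBED* out.TMPBED*.bed
n_lines=$(wc -l < Q1_Counst.bed | tr -d ' ')
```

In a real run the for loop would be replaced by the parallel line above; the merge and cleanup steps should stay the same as long as the output names keep the out.TMPBED*.bed pattern.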
6.0 years ago Pierre Lindenbaum 120k

But isn't that more or less the same as

split -l100 anon.bed TMPBED

for bed in TMPBED*; do multiBamCov -bams f1.bam  f2.bam -bed "$bed" > "${bed}_out.bed" & done

which creates as many subprocesses in the shell as there are TMPBED* files. Is there any other advantage to running GNU parallel here?

6.0 years ago geek_y 9.7k

You can limit the number of parallel jobs, you can run the jobs on a remote server and fetch the results back, you can re-run only the jobs that failed, ...
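
Two of these points can be sketched as follows, wrapped in shell functions so that nothing runs until they are called. The function names run_chunks and rerun_failed, the log name jobs.log, and the job limit of 8 are all hypothetical; f1.bam, f2.bam and the TMPBED* chunks are taken from the answer above. The remote-server case is left out because it depends on where the BAM files live.

```shell
# Hypothetical names: run_chunks, rerun_failed, jobs.log.
# f1.bam, f2.bam and the TMPBED* chunks come from the answer above.

run_chunks() {
    # -j 8      : run at most 8 jobs at the same time
    # --joblog  : record the exit status of every job in jobs.log
    ls TMPBED* | parallel -j 8 --joblog jobs.log \
        multiBamCov -bams f1.bam f2.bam -bed '{}' '>' out.{}.bed
}

rerun_failed() {
    # --resume-failed reads jobs.log and re-runs only the jobs that
    # exited non-zero (plus any that never ran); the command line and
    # the input must be the same as in the first run.
    ls TMPBED* | parallel -j 8 --joblog jobs.log --resume-failed \
        multiBamCov -bams f1.bam f2.bam -bed '{}' '>' out.{}.bed
}
```

Calling run_chunks once and then rerun_failed after fixing whatever broke should leave the successful chunk outputs untouched.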

6.0 years ago Pierre Lindenbaum 120k

Thanks.. It is working. :)

6.0 years ago geek_y 9.7k

If you can get multiBamCov to read from stdin, you can avoid the tmp files:

cat anon.bed | parallel -l100 --pipe multiBamCov -bams f1.bam  f2.bam -bed stdin  '>' out.{#}.bed

Or if you just want all output merged into a single file:

cat anon.bed | parallel -l100 --pipe multiBamCov -bams f1.bam  f2.bam -bed stdin  >out.bed

I have never used multiBamCov, so if -bed stdin does not work, you might try:

-bed /dev/stdin
-bed '<( cat )'
-bed -
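
One caveat for the single-file variant: by default GNU parallel prints each job's output as that job finishes, so the merged rows may not follow the order of anon.bed. A minimal sketch that adds -k (--keep-order) to keep the chunks in input order, wrapped in a hypothetical function count_merged so nothing runs until it is called. It still assumes that multiBamCov accepts -bed stdin, which is untested here; one of the spellings listed above may be needed instead.

```shell
# Hypothetical wrapper name: count_merged.
# Assumes multiBamCov can read the BED records from stdin (untested).

count_merged() {
    # --pipe -l100 : feed anon.bed to the jobs in blocks of 100 lines
    # -k           : print each block's output in input order
    parallel -k -l100 --pipe \
        multiBamCov -bams f1.bam f2.bam -bed stdin < anon.bed > out.bed
}
```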
6.0 years ago ole.tange ♦ 3.4k
