Question: HOMER parallel annotation for big .bed file
Dear all, I'm trying to annotate a huge file with HOMER, since I need information about a few million sites. I would like to parallelize this process in batches of, say, 10000 entries of my .bed file. Is there a straightforward way to do so? I tried to get this done with GNU parallel, but I really can't figure out whether and how I can pass arguments through a pipe to the HOMER command. mybig.bed hg19 > output.txt

The idea would be to split the .bed file into N pieces, run multiple jobs (both in parallel and in sequence), and then combine all their annotations into a single output. It might be trivial, but I'm really confused about argument piping in this context. The other option would be to write a bash script that creates those pieces as files and then iterates over them by name, but I was looking for something more elegant. Thank you in advance.
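The splitting step described above can be sketched on its own, without HOMER. This is only an illustration under assumptions: the 25-line input, the chunk size of 10, the `chunks` directory, and the `chunk_` prefix are all made up for the demo. It shows that `split -l` produces chunk files whose names sort in the original order, so nothing is lost or reordered when they are read back.

```shell
# Hypothetical demo input: 25 BED-like lines standing in for the real file.
workdir=$(mktemp -d)
for i in $(seq 1 25); do
  printf 'chr1\t%d\t%d\n' "$((i * 100))" "$((i * 100 + 50))"
done > "$workdir/mybig.bed"

# Split into chunks of 10 lines each; output files are named chunk_aa, chunk_ab, ...
mkdir "$workdir/chunks"
split -l 10 "$workdir/mybig.bed" "$workdir/chunks/chunk_"

# Count the chunks that were written.
n_chunks=$(ls "$workdir/chunks" | wc -l | tr -d ' ')

# Concatenating the chunks in glob (alphabetical) order should reproduce the input exactly.
if cat "$workdir"/chunks/chunk_* | cmp -s - "$workdir/mybig.bed"; then
  roundtrip=ok
else
  roundtrip=broken
fi

echo "$n_chunks chunks, roundtrip $roundtrip"
```

Each chunk could then be handed to a separate annotation job, and the per-chunk outputs combined afterwards.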

20 months ago franc.jian • 30 • updated 19 months ago ole.tange ♦ 3.4k
I solved this using split and then parallel, and then I merged the annotated files again downstream. Note: each file contains the header, which should be removed before merging! I'm sure there are more elegant solutions, but this works!

    split -l 50000 ./../Big_bed.bed
    ls * | parallel -j 10 ' {} hg19 > ./../anno_chunks/{.}_annotated.txt'
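For the merge step, where each annotated chunk repeats the header, one possible sketch is to keep the header from the first file only and skip it in all the others. The two small files and their column names below are invented for the demo and are not real HOMER output; only the `_annotated.txt` suffix follows the commands above.

```shell
# Hypothetical annotated chunks, each starting with the same header line.
workdir=$(mktemp -d)
printf 'PeakID\tChr\tAnnotation\np1\tchr1\tintron\n' > "$workdir/xaa_annotated.txt"
printf 'PeakID\tChr\tAnnotation\np2\tchr2\texon\n'   > "$workdir/xab_annotated.txt"

# FNR is the line number within the current file, NR the line number overall.
# Skip line 1 of every file except the very first, then print everything else.
awk 'FNR == 1 && NR != 1 { next } { print }' \
  "$workdir"/*_annotated.txt > "$workdir/merged.txt"

cat "$workdir/merged.txt"
```

The merged file is written under a name that does not match the `*_annotated.txt` glob, so it is never read back as one of its own inputs.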
19 months ago franc.jian • 30
Can you test if this works, too:

parallel -a ../Big_bed.bed --pipe-part --block -1 --fifo \ {} hg19 > ./../anno_chunks/{.}_annotated.txt

or this (slower):

parallel -a ../Big_bed.bed --pipe-part --block -1 --cat \ {} hg19 > ./../anno_chunks/{.}_annotated.txt
19 months ago ole.tange ♦ 3.4k
Thank you for your answer, and sorry for the late reply. I tried both solutions, but I got the same error each time:

Died at /usr/bin/parallel line 241

I was not able to troubleshoot this...

19 months ago
• 30

