Some major changes and speed enhancements were added in version 2.4.27 of BEDOPS. These changes are detailed in the revision history for this version:
http://bedops.readthedocs.io/en/latest/content/revision-history.html#v2-4-27
This version (and later versions) packages two builds of each BEDOPS binary: one suffixed -typical and another suffixed -megarow.
So there is bedops-typical and bedops-megarow, and the same for bedmap, etc.
We set up symbolic links so that you can keep your pipelines written as they are. The typical binaries are the default selection.
The typical binaries are compiled with a shorter maximum token length, to reduce memory usage and maximize speed. Most people can use the typical binaries without having to think or worry about this.
However, the issue you are running into is that your BED lines are very long, too long for the typical binaries.
So we have also included megarow binaries, which allow longer tokens, based on the values in these lines of the parent BEDOPS Makefile:
https://github.com/bedops/bedops/blob/master/Makefile#L9-L11
You could try using the megarow binaries to see if this helps with your set operations.
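Before switching builds, it can help to confirm that a file really does contain unusually long lines or IDs. The following is a minimal sketch using plain awk; the function name bed_max_lengths is hypothetical, and it only reports lengths. It does not know the actual typical/megarow limits, which are set in the Makefile and BEDOPS.Constants.hpp.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report the longest full line and the longest
# 4th (ID) column in a tab-delimited BED file. It does NOT encode the
# BEDOPS typical/megarow limits; compare the numbers against the
# exponents in the Makefile yourself.
bed_max_lengths() {
  awk -F '\t' '
    BEGIN { max_line = 0; max_id = 0 }
    {
      if (length($0) > max_line) max_line = length($0)
      # Only BED4+ lines have an ID column
      if (NF >= 4 && length($4) > max_id) max_id = length($4)
    }
    END { printf "max_line_length\t%d\nmax_id_length\t%d\n", max_line, max_id }
  ' "$1"
}
```

Usage: bed_max_lengths elements.bed prints two tab-separated lines, one with the longest line length and one with the longest ID length.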
If you have installed the two-build version of BEDOPS (I'm not sure what Homebrew does now), you can use the convenience script switch-BEDOPS-binary-type to switch between the typical and megarow builds of BEDOPS.
For example, to switch the binary set to the -megarow suffixed binaries:
$ switch-BEDOPS-binary-type --megarow
This changes the binaries that the symbolic links point to.
You'll see that the binary version and help statements change, e.g.:
$ bedops --version
bedops
citation: http://bioinformatics.oxfordjournals.org/content/28/14/1919.abstract
version: 2.4.29 (megarow)
authors: Shane Neph & Scott Kuehn
Note the build type, (megarow), shown on the version line.
You can switch back to the typical binaries with the same script:
$ switch-BEDOPS-binary-type --typical
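In a script, you might want to switch only when needed. Below is a minimal sketch that reads the build tag from bedops --version and calls switch-BEDOPS-binary-type only if the megarow build is not already active. The function names are hypothetical, and the parsing assumes the version line looks like the example above ("version: 2.4.29 (megarow)"); that the typical build prints "(typical)" is an assumption by analogy.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the word in parentheses on the
# "version:" line of `bedops --version` output read from stdin.
parse_build_type() {
  sed -n 's/^[[:space:]]*version:.*(\([a-z]*\)).*$/\1/p'
}

# Report the build the bedops symlink currently points to.
# Both streams are captured, since which stream --version uses is not assumed.
bedops_build_type() {
  bedops --version 2>&1 | parse_build_type
}

# Switch to megarow binaries only if they are not already selected.
ensure_megarow() {
  if [ "$(bedops_build_type)" != "megarow" ]; then
    switch-BEDOPS-binary-type --megarow
  fi
}
```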
If the megarow binaries do not work, you can edit the parameters in those lines of the parent Makefile (https://github.com/bedops/bedops/blob/master/Makefile#L9-L11) and recompile with make all (not make alone).
This target builds both sets of binaries (typical and megarow), and any changes you make to allow longer tokens (for example, in the ID and non-ID parts of a BED4+ file, by editing MASSIVE_ID_EXP and MASSIVE_REST_EXP, respectively) are applied to the -megarow suffixed binaries.
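If you prefer to script the Makefile edit rather than change it by hand, here is a minimal sketch. The helper set_make_var is hypothetical, and it assumes each variable is assigned at the start of its own line (e.g. "MASSIVE_ID_EXP = ..."); check the actual Makefile before relying on it. The values in the usage comment are placeholders, not recommended settings.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite a "NAME = value" (or :=, ?=) assignment
# that starts a line in a Makefile, keeping the file's permissions.
set_make_var() {
  local makefile=$1 name=$2 value=$3 tmp
  tmp=$(mktemp) || return 1
  sed "s/^\(${name}[[:space:]]*[:?]*=\).*/\1 ${value}/" "$makefile" > "$tmp" &&
    cat "$tmp" > "$makefile"
  rm -f "$tmp"
}

# Example from the top of the BEDOPS source tree (placeholder values):
#   set_make_var Makefile MASSIVE_ID_EXP 20
#   set_make_var Makefile MASSIVE_REST_EXP 24
#   make all
```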
You could also edit the header BEDOPS.Constants.hpp directly to choose the desired exponent for the maximum token length, and then run make (not make all). This approach builds a single set of binaries with custom line-length parameters.
I recall seeing the twin binaries and the symlinks, but did not put two and two together. This is a super convenient method! Thank you!