I'm generating two BED files and want to join them with something like bedtools intersect. I would like behaviour like the -loj flag (which reports an empty B record if no B feature overlaps a given feature in A), but which also reports an empty A record if no A feature overlaps a given feature in B. For example, if I have an A.bed file
chr1 1 10
chr1 21 30
chr1 31 40
and B.bed file
chr1 11 20
chr1 21 30
then I would like an output similar to
chr1 1 10 . -1 -1
. -1 -1 chr1 11 20
chr1 21 30 chr1 21 30
chr1 31 40 . -1 -1
So I want some kind of "complete outer join": the union of a left outer join (the -loj flag) with a "right outer join" (which BEDtools doesn't do).
I suspect I could do this by first doing the left outer joins
bedtools intersect -loj -wa -wb -a A.bed -b B.bed > loj1.bed
bedtools intersect -loj -wa -wb -a B.bed -b A.bed > loj2.bed
then reordering the columns in loj2.bed with awk (cut cannot do this, since it always outputs fields in their original order) to put the record from the A file in the first set of columns, then merging and sorting the two files
cat loj1.bed loj2_reordered.bed | sort -k1,1 -k2,2n > sorted.bed
and finally removing duplicate lines
uniq sorted.bed > complete_outer_join.bed
but I wonder if there's a quicker method for doing all this.
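For reference, the reorder/merge/deduplicate part of the steps above could be sketched as follows. This is only a sketch: the two printf lines write hand-written stand-ins for what the two -loj commands would be expected to produce on the example A.bed and B.bed (bedtools is not run here), and the swap is done with awk.

```shell
#!/bin/sh
# In real use these two files would come from:
#   bedtools intersect -loj -a A.bed -b B.bed > loj1.bed
#   bedtools intersect -loj -a B.bed -b A.bed > loj2.bed
# Here they are hand-written stand-ins for the example (an assumption).
printf 'chr1\t1\t10\t.\t-1\t-1\nchr1\t21\t30\tchr1\t21\t30\nchr1\t31\t40\t.\t-1\t-1\n' > loj1.bed
printf 'chr1\t11\t20\t.\t-1\t-1\nchr1\t21\t30\tchr1\t21\t30\n' > loj2.bed

# Swap the two 3-column halves of loj2.bed so the A record comes first.
awk 'BEGIN { FS = OFS = "\t" } { print $4, $5, $6, $1, $2, $3 }' loj2.bed > loj2_reordered.bed

# Merge, sort on chromosome then start, and drop identical lines.
# Note: "sort -u" together with -k would compare only the keys, so uniq is used instead.
TAB=$(printf '\t')
cat loj1.bed loj2_reordered.bed | LC_ALL=C sort -t "$TAB" -k1,1 -k2,2n | uniq > complete_outer_join.bed
```

With this sort order the "empty A" rows (starting with ".") are grouped before the chr1 rows rather than being interleaved by B coordinate as in the desired output above.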
P.S. The features in the A and B bed files will be defined on the same coordinates, so the overlap of an A feature on a B feature would be the same as an overlap of a B feature on an A feature.
And, of course, this would also work if the BED files have more than three columns; just change the printf(".\t-1\t-1\n"); call to include more fields.
Yes, but in the case where there are more than three columns, watch out for how you create the unique-union set. You may need to adjust the awk logic to use only the first three columns as a key, depending on what you decide to call "unique".
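As a rough illustration of keying on coordinates only, here is a sketch that treats two joined rows as the same if the first three columns of the A record and the first three columns of the B record match, whatever the extra columns contain. The file names, the `na` variable (number of columns per A record) and the sample rows are all made up for the example, and the exact form of the empty record for wider files is an assumption.

```shell
#!/bin/sh
# Hypothetical joined file: a 4-column A record followed by a 4-column B record.
printf 'chr1\t21\t30\tfeatA\tchr1\t21\t30\tfeatB\n'   > combined.bed
printf 'chr1\t21\t30\tfeatA2\tchr1\t21\t30\tfeatB\n' >> combined.bed
printf 'chr1\t31\t40\tfeatC\t.\t-1\t-1\t.\n'         >> combined.bed

# Keep the first row seen for each (A coords, B coords) key; na = columns per A record.
awk -v na=4 'BEGIN { FS = OFS = "\t" }
{
    key = $1 FS $2 FS $3 FS $(na + 1) FS $(na + 2) FS $(na + 3)
    if (!(key in seen)) { seen[key] = 1; print }
}' combined.bed > unique_by_coords.bed
```

Here the second row is dropped because it differs from the first only in the name column; whether that is what you want depends on what "unique" should mean for your data.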