Biostar Beta. Not for public use.
bedtools merge - removing features completly overlapped by other features
0
Entering edit mode
3.4 years ago

Hello,

F1 -------------------------
F2     ----
F3                ----
F4                            --------------


Default bedtools merge behaviour will output one sole feature spanning from F1 start to F4 end, since F1 and F4 overlap.

Is it possible to remove features that are completely covered by any other feature? In other words, output should be F1 and F4.

Thanks.

EDIT: I suppose another way to ask this is "How can I output features from one file that do not overlap 100% any other features in the same file?"

bedtools • 1.0k views
1
Entering edit mode
3.4 years ago
@Alex Reynolds20

Here's a one-liner with another toolkit:

$bedops -m A.bed B.bed | bedmap --echo-map --exact - <(bedops -u A.bed B.bed) | sort-bed - > answer.bed  Or if you only have one file: $ bedops -m C.bed | bedmap --echo-map --exact - C.bed > answer.bed


It is assumed A.bed and B.bed and C.bed are sorted before set operations, i.e.:

$sort-bed A.unsorted.bed > A.bed$ sort-bed B.unsorted.bed > B.bed
$sort-bed C.unsorted.bed > C.bed  Demo: $ more C.bed
chr1    100     120     id-1
chr1    110     112     id-2
chr1    114     117     id-3
chr1    125     130     id-4

$bedops -m C.bed | bedmap --echo-map --exact - C.bed chr1 100 120 id-1 chr1 125 130 id-4  Another option is to filter the input file for purely nested elements. Nested elements are a pain, but there are ways to deal with them. Start with a utility awk script: #!/usr/bin/env awk -f { if (NR > 1) { currentChr =$1
currentStart = $2 currentStop =$3
if ((previousStart < currentStart) && (previousStop > currentStop)) {
print $0; } else { previousChr = currentChr previousStart = currentStart previousStop = currentStop } } else { previousChr =$1
previousStart = $2 previousStop =$3
}
}


Let's call that script getNestedElements.awk. Make sure it is executable (chmod u+x ./getNestedElements.awk).

You can use this script to identify elements in A2.bed that are nested, and you would then pipe them to bedops --not-element-of 100% to filter them out:

$more A2.bed chr1 80 101 id-0 chr1 100 120 id-1 chr1 110 112 id-2 chr1 114 117 id-3 chr1 125 130 id-4$ ./getNestedElements.awk A2.bed | bedops --not-element-of 100% A2.bed -
chr1    80      101     id-0
chr1    100     120     id-1
chr1    125     130     id-4

0
Entering edit mode

There is only one file to begin with. I tried specifying the file twice for A and B and ended up getting every entry as a duplicate from the original file.

0
Entering edit mode

0
Entering edit mode

Thanks, getting closer. Your example seems to be working flawlessly (I have recreated A.bed and tried the command), but when I try it on my annotation, I lose any features that overlap with each other. Consider your example, if we were to add "chr1 80-101 id-0", we would lose all entries but the last one. We should be keeping id-0, id-1 and id-4, since only id-2 and id-3 are completely covered by another feature, id-0 and id-1 merely overlap but do not cover each other 100%.

0
Entering edit mode

0
Entering edit mode

Eureka! Works amazing. Indeed very important to not forget sort-bed!

0
Entering edit mode

Yes, sorting is very important. It only needs to be done once, however, so hopefully the pain is minimal.