Biostar Beta. Not for public use.
bedtools merge - removing features completly overlapped by other features
0
Entering edit mode
3.4 years ago
Adrian Pelin ♦ 2.3k
@Adrian Pelin9259

Hello,

Assuming we start with only one bed file, example:

F1 ------------------------- 
F2     ----
F3                ----
F4                            --------------

Default bedtools merge behaviour will output one sole feature spanning from F1 start to F4 end, since F1 and F4 overlap.

Is it possible to remove features that are completely covered by any other feature? In other words, output should be F1 and F4.

Thanks.

EDIT: I suppose another way to ask this is "How can I output features from one file that do not overlap 100% any other features in the same file?"

bedtools • 1.0k views
ADD COMMENTlink
1
Entering edit mode
3.4 years ago
@Alex Reynolds20

Here's a one-liner with another toolkit:

$ bedops -m A.bed B.bed | bedmap --echo-map --exact - <(bedops -u A.bed B.bed) | sort-bed - > answer.bed

Or if you only have one file:

$ bedops -m C.bed | bedmap --echo-map --exact - C.bed > answer.bed

It is assumed A.bed and B.bed and C.bed are sorted before set operations, i.e.:

$ sort-bed A.unsorted.bed > A.bed
$ sort-bed B.unsorted.bed > B.bed
$ sort-bed C.unsorted.bed > C.bed

Demo:

$ more C.bed
chr1    100     120     id-1
chr1    110     112     id-2
chr1    114     117     id-3
chr1    125     130     id-4

$ bedops -m C.bed | bedmap --echo-map --exact - C.bed
chr1    100 120 id-1
chr1    125 130 id-4

Another option is to filter the input file for purely nested elements.

Nested elements are a pain, but there are ways to deal with them.

Start with a utility awk script:

#!/usr/bin/env awk -f
{
    if (NR > 1) {
        currentChr = $1
        currentStart = $2
        currentStop = $3
        if ((previousStart < currentStart) && (previousStop > currentStop)) {
            print $0;
        }
        else {
            previousChr = currentChr
            previousStart = currentStart
            previousStop = currentStop
        }
    }
    else {
        previousChr = $1
        previousStart = $2
        previousStop = $3
    }
}

Let's call that script getNestedElements.awk. Make sure it is executable (chmod u+x ./getNestedElements.awk).

You can use this script to identify elements in A2.bed that are nested, and you would then pipe them to bedops --not-element-of 100% to filter them out:

$ more A2.bed
chr1    80      101     id-0
chr1    100     120     id-1
chr1    110     112     id-2
chr1    114     117     id-3
chr1    125     130     id-4

$ ./getNestedElements.awk A2.bed | bedops --not-element-of 100% A2.bed -
chr1    80      101     id-0
chr1    100     120     id-1
chr1    125     130     id-4
ADD COMMENTlink
0
Entering edit mode

There is only one file to begin with. I tried specifying the file twice for A and B and ended up getting every entry as a duplicate from the original file.

ADD REPLYlink
0
Entering edit mode

Pls see revised answer.

ADD REPLYlink
0
Entering edit mode

Thanks, getting closer. Your example seems to be working flawlessly (I have recreated A.bed and tried the command), but when I try it on my annotation, I lose any features that overlap with each other. Consider your example, if we were to add "chr1 80-101 id-0", we would lose all entries but the last one. We should be keeping id-0, id-1 and id-4, since only id-2 and id-3 are completely covered by another feature, id-0 and id-1 merely overlap but do not cover each other 100%.

ADD REPLYlink
0
Entering edit mode

Pls see revised answer.

ADD REPLYlink
0
Entering edit mode

Eureka! Works amazing. Indeed very important to not forget sort-bed!

ADD REPLYlink
0
Entering edit mode

Yes, sorting is very important. It only needs to be done once, however, so hopefully the pain is minimal.

ADD REPLYlink

Login before adding your answer.

Similar Posts
Loading Similar Posts
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.3