Biostar Beta. Not for public use.
Question: bedtools merge - removing features completly overlapped by other features
0
Entering edit mode

Hello,

Assuming we start with only one bed file, example:

F1 ------------------------- 
F2     ----
F3                ----
F4                            --------------

Default bedtools merge behaviour will output one sole feature spanning from F1 start to F4 end, since F1 and F4 overlap.

Is it possible to remove features that are completely covered by any other feature? In other words, output should be F1 and F4.

Thanks.

EDIT: I suppose another way to ask this is "How can I output features from one file that do not overlap 100% any other features in the same file?"

ADD COMMENTlink 2.4 years ago Adrian Pelin ♦ 2.3k
1
Entering edit mode

Here's a one-liner with another toolkit:

$ bedops -m A.bed B.bed | bedmap --echo-map --exact - <(bedops -u A.bed B.bed) | sort-bed - > answer.bed

Or if you only have one file:

$ bedops -m C.bed | bedmap --echo-map --exact - C.bed > answer.bed

It is assumed A.bed and B.bed and C.bed are sorted before set operations, i.e.:

$ sort-bed A.unsorted.bed > A.bed
$ sort-bed B.unsorted.bed > B.bed
$ sort-bed C.unsorted.bed > C.bed

Demo:

$ more C.bed
chr1    100     120     id-1
chr1    110     112     id-2
chr1    114     117     id-3
chr1    125     130     id-4

$ bedops -m C.bed | bedmap --echo-map --exact - C.bed
chr1    100 120 id-1
chr1    125 130 id-4

Another option is to filter the input file for purely nested elements.

Nested elements are a pain, but there are ways to deal with them.

Start with a utility awk script:

#!/usr/bin/env awk -f
{
    if (NR > 1) {
        currentChr = $1
        currentStart = $2
        currentStop = $3
        if ((previousStart < currentStart) && (previousStop > currentStop)) {
            print $0;
        }
        else {
            previousChr = currentChr
            previousStart = currentStart
            previousStop = currentStop
        }
    }
    else {
        previousChr = $1
        previousStart = $2
        previousStop = $3
    }
}

Let's call that script getNestedElements.awk. Make sure it is executable (chmod u+x ./getNestedElements.awk).

You can use this script to identify elements in A2.bed that are nested, and you would then pipe them to bedops --not-element-of 100% to filter them out:

$ more A2.bed
chr1    80      101     id-0
chr1    100     120     id-1
chr1    110     112     id-2
chr1    114     117     id-3
chr1    125     130     id-4

$ ./getNestedElements.awk A2.bed | bedops --not-element-of 100% A2.bed -
chr1    80      101     id-0
chr1    100     120     id-1
chr1    125     130     id-4
ADD COMMENTlink 2.4 years ago Alex Reynolds 28k
Entering edit mode
0

There is only one file to begin with. I tried specifying the file twice for A and B and ended up getting every entry as a duplicate from the original file.

ADD REPLYlink 2.4 years ago
Adrian Pelin
♦ 2.3k
Entering edit mode
0

Pls see revised answer.

ADD REPLYlink 2.4 years ago
Alex Reynolds
28k
Entering edit mode
0

Thanks, getting closer. Your example seems to be working flawlessly (I have recreated A.bed and tried the command), but when I try it on my annotation, I lose any features that overlap with each other. Consider your example, if we were to add "chr1 80-101 id-0", we would lose all entries but the last one. We should be keeping id-0, id-1 and id-4, since only id-2 and id-3 are completely covered by another feature, id-0 and id-1 merely overlap but do not cover each other 100%.

ADD REPLYlink 2.4 years ago
Adrian Pelin
♦ 2.3k
Entering edit mode
0

Pls see revised answer.

ADD REPLYlink 2.4 years ago
Alex Reynolds
28k
Entering edit mode
0

Eureka! Works amazing. Indeed very important to not forget sort-bed!

ADD REPLYlink 2.4 years ago
Adrian Pelin
♦ 2.3k
Entering edit mode
0

Yes, sorting is very important. It only needs to be done once, however, so hopefully the pain is minimal.

ADD REPLYlink 2.4 years ago
Alex Reynolds
28k

Login before adding your answer.

Similar Posts
Loading Similar Posts
Powered by the version 2.0