Pulling out interval adjoining regions
5
0
Entering edit mode
5.4 years ago
rbronste ▴ 420

Looking for a good way to take a set of intervals and print out an interval set (bed file) that represents regions just upstream and downstream of every interval in the original file, lets say 10kb up and downstream. Any help appreciated, thanks!

bedtools bedops bed interval • 1.2k views
ADD COMMENT
3
Entering edit mode
5.4 years ago

As with most interval operations, bedtools has a command for it:

https://bedtools.readthedocs.io/en/latest/content/tools/flank.html

ADD COMMENT
0
Entering edit mode

I didn't know that one, thanks !

ADD REPLY
1
Entering edit mode
5.4 years ago
 awk -F '\t' '{X=10000; B=int($2);E=int($3);printf("%s\t%d\t%d\n%s\t%d\t%d\n",$1,B-X<0?0:B-X,B,$1,E,E+X);}'
ADD COMMENT
1
Entering edit mode
5.4 years ago
bernatgel ★ 3.4k

If you are using R, you can do it with the flank function in GenomicRanges. It takes into account the chromosome lengths, if present.

https://bioconductor.org/packages/3.7/bioc/vignettes/GenomicRanges/inst/doc/GenomicRangesIntroduction.pdf

ADD COMMENT
1
Entering edit mode
5.4 years ago

Yep, just use BEDOPS bedmap --range to map padded elements:

$ bedmap --skip-unmapped --echo-map --range 10000 reference.map map.bed | awk '(!a[$0]++)' | sort-bed - > answer.bed

We use awk to strip duplicates from unsorted results. Sorting is necessary because we use --echo-map, where mapped elements can be returned out of order.

The file answer.bed will contain unique elements from map.bed that overlap elements from a 10kb-padded version of reference.bed.

ADD COMMENT
1
Entering edit mode
5.4 years ago

Here's another approach that uses bedops --range:

$ bedops --merge reference.bed | bedops --range 10000 - | bedops --element-of 1 map.bed - > answer.bed

The file answer.bed will contain unique elements from map.bed that overlap elements from a 10kb-padded version of reference.bed. Adjust padding, as needed.

Merging the reference intervals before padding should handle overlaps, which avoids the need to filter duplicates and resort. So this should work faster than using bedmap --range, I think.

ADD COMMENT

Login before adding your answer.

Traffic: 1526 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6