Biostar Beta. Not for public use.
bedop: output overlap for each region in a file
0
Entering edit mode
16 months ago
User 7754 • 230
United Kingdom

I have a sorted large BED file (main.bed) and several regions defined in a file (regions.bed) which can be overlapping with each other. I am trying to find a quick way to output the overlaps from the main BED file with the regions defined in regions.bed, and create a separate BED output file for each of these regions.

"bedops --everything" seems as a good option, but is there a way to output different intersects between main.bed and regions.bed in a different file for each line of regions.bed?

Thank you!
Fra

BEDOP overlap • 1.1k views
ADD COMMENTlink
1
Entering edit mode
13 months ago
WCIP | Glasgow | UK

I assume your "regions.bed" file is not huge otherwise you would end up with tons of files. You could go through regions.bed line by line and output the result. Here's an example using bedtools, but the same idea should be doable in bedops:

n=1
while read line
do
echo "$line" | intersectBed -a main.bed -b - -wa > region.${n}.bed
n=$((n+1))
done < regions.bed

(It's a bit unusual what you are trying to do, it might be that there are better ways of structuring your data)

Edit : Maybe this is better then having one file per interval:

intersectBed -a main.bed -b regions.bed -wa -wb \
| awk -v OFS='\t' '{print $1, $2, $3, $4, $5"_"$6"_"$7}'

chr11    13302    13303    1    chr11_13202_14981
chr11    13327    13328    1    chr11_13202_14981
chr11    13980    13981    1    chr11_13202_14981
chr11    13980    13981    1    chr11_13980_41480
chr11    30923    30924    1    chr11_13980_41480
chr11    30923    30924    1    chr11_30923_51480
chr11    51476    51477    1    chr11_30923_51480
chr11    51479    51480    1    chr11_30923_51480

Now the 5th column is an identifier of each interval in regions.bed.

ADD COMMENTlink
0
Entering edit mode

Thank you dariober, this is what I had tried before using loops but I was hoping there was a faster solution since my regions/files are around 1,000 (they are temp files, meaning I use them and remove them right after, it is just to load them into R)...

ADD REPLYlink
0
Entering edit mode

Hi - See if my edit helps.

ADD REPLYlink
1
Entering edit mode
15 months ago
Seattle, WA USA

You could use the following script as a general solution. You could either use BEDOPS _bedmap --echo-map --multi-delim '\n'_ or _bedops -e 1_ , depending on what detail of data you need. In _bedmap_ , you could pipe in each region/line as a reference element via stdin, using main for map elements. In _bedops_ you could use the main for reference and each region element as a map, each region/line piped in via stdin.

ADD COMMENTlink
0
Entering edit mode
15 months ago
Seattle, WA USA

You could pipe the results of BEDOPS _bedmap_ to _split_ :

$ bedmap --echo --echo-map regions.bed main.bed | split -l 1 -a 5 - bedmap_result_

For each line in _regions.bed_ , you would get files called _bedmap_result_aaaaa_ , _bedmap_result_aaaab_ , and so on.

See _man split_ for more information on the _-l_ and _-a_ options used in this example.

ADD COMMENTlink
0
Entering edit mode
16 months ago
User 7754 • 230
United Kingdom

Thank you Alex for your reply.

This is not giving me what I was looking for though, it gives me many output files with 1 line indicating the overlap between the two bed files, but what I was hoping to get is, in each output file, the subset of overlaps in main.bed that are included in each region defined by regions.bed. So I am looking to obtain one output file for each region which should contain many entries since main.bed is based on a 1 base pair while regions is many base pairs. This will also mean that several entries in the output files will be duplicated with other output files since the regions can be overlapping... Is this possible to do?

One example is:

cat main.bed

chr11    13302   13303   1 

chr11    13327   13328   1 

chr11 13980 13981 1

chr11    30923   30924   1  

chr11    51476   51477   1  

chr11    51479   51480   1   

cat regions.bed:

chr11   13202      14981       2     

chr11   13980        41480        2      

chr11    30923        51480        2      

And the output should be:

cat res.region1

chr11    13302   13303   1 

chr11    13327   13328   1 

chr11 13980 13981 1

cat res.region2

chr11 13980 13981 1

chr11    30923   30924   1  

cat res.region3

chr11    30923   30924   1  

chr11    51476   51477   1  

chr11    51479   51480   1   
ADD COMMENTlink

Login before adding your answer.

Similar Posts
Loading Similar Posts
Powered by the version 2.1