bedtools strategy to report both merged AND unmerged intervals
2
0
Entering edit mode
5.7 years ago
Anand Rao ▴ 630

I have 2 GFF3 files that I want to analyze and report the following:

type 1 - merge and report adjacent intervals (up to 2000bp away) coords, using bedtools merge

type 2 - report unmerged coords as they are - so intact GFF3 lines from respective files

I achieved step 1 simply using bedtools merge. But it does not report lines of type 2 (described above)

How do I report both merged and unmerged lines? Thanks!

bedtools merge unmerged gff3 • 1.6k views
ADD COMMENT
0
Entering edit mode
5.7 years ago
$ bedops --merge --range 2000 <(gff2bed < foo.gff) > merged.bed
$ bedmap --count --echo-map --delim '\t' merged.bed <(gff2bed < foo.gff) | awk '($1>0)' | cut -f2- | sort-bed - | bedops -n 1 <(gff2bed < foo.gff) - > unmerged.bed

Or, in one pass, using standard IO streams:

$ bedops --merge --range 2000 <(gff2bed < foo.gff) | bedmap --count --echo-map --delim '\t' - <(gff2bed < foo.gff) | awk '($1>0)' | cut -f2- | sort-bed - | bedops -n 1 <(gff2bed < foo.gff) - > unmerged.bed

Might be faster to convert once:

$ gff2bed < foo.gff > foo.bed

Then do the ops:

$ bedops --merge --range 2000 foo.bed | bedmap --count --echo-map --delim '\t' - foo.bed | awk '($1>0)' | cut -f2- | sort-bed - | bedops -n 1 foo.bed - > unmerged.bed
ADD COMMENT
0
Entering edit mode
5.7 years ago
Anand Rao ▴ 630

Thank you, Alex. I tried your syntax, first converting GFF to BED, simply using gff2bed. And then,

bedops --merge foo.bed | bedmap --count --echo-map - foo.bed | awk '($1>0)' | cut -f2- | sort-bed - | bedops -n 1 foo.bed - > unmerged.bed

I received an error message that reads:

End coordinate is too large. Max decimal digits allowed is 12 in BEDOPS.Constants.hpp. See line 1 in -.

Perhaps my BED input file is non-standard? For example, see below:

bedops --merge foo.bed | bedmap --count --echo-map - foo.bed | awk '($1>0)' | cut -f2- | head -n 1

837036 877324 ORF68_tool_Prfl1 8 - tool_Prfl1 ORF 2 ID=ORF68_ptool_Prfl1

Error message looks like it is generate at the sort-bed using STDIN stream step...

bedops --merge foo.bed | bedmap --count --echo-map - foo.bed | awk '($1>0)' | cut -f2- | sort-bed -

So I've tried reverting back to bedtools merge and it looks like it does report BOTH merged and unmerged coords in the same output file. I am now making sure I am not mis-interpreting mergeBed with bleary eyes, using examples and real data!

ADD COMMENT
1
Entering edit mode

I left out a --delim '\t' option in the bedmap command. I edited my answer to include that option. I think that should fix it. Feel free to follow up if that doesn't help.

ADD REPLY

Login before adding your answer.

Traffic: 1940 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6