Question

bedtools complement error

5.4 years ago

bk11 ★ 2.3k

Hi, I am getting an error from bedtools. What might be happening? I have a BED file and a genome file:

cat A.bed
chr1  100  200
chr1  400  500
chr1  500  800

cat my.genome
chr1  1000
chr2  800

When I run this:

bedtools complement -i A.bed -g your.genome

it gives:

Error: The genome file your.genome has no valid entries. Exiting.

bedtools • 3.5k views

ADD COMMENT • link updated 5.4 years ago by Alex Reynolds 35k • written 5.4 years ago by bk11 ★ 2.3k

score 0 · Answer 1 · 2018-12-03

5.4 years ago

ATpoint 81k

your.genome must be tab-delimited.

ADD COMMENT • link 5.4 years ago by ATpoint 81k

I changed it to tab-delimited and it still does not work.

sed 's/ /\t/g' my.genome >my.genome1
cat my.genome1

chr1        1000
chr2        800

ADD REPLY • link 5.4 years ago by bk11 ★ 2.3k

If your files were tab-delimited, it would work. You probably substituted the wrong delimiter in your sed command. The original separator is probably a double space or something similar, so after your command you now have a doubled or hybrid tab-whitespace delimiter.
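One way to check what delimiter a file really contains is to count tab-separated fields per line. The sketch below is only an illustration: the three file names are made up, and they reproduce a double-space file, a double-tab file (what a one-space-to-one-tab substitution would produce from a double-space separator), and a clean single-tab file.

```shell
# Hypothetical sample files reproducing three possible delimiter states.
printf 'chr1  1000\nchr2  800\n'     > spaces.genome      # two spaces
printf 'chr1\t\t1000\nchr2\t\t800\n' > double_tab.genome  # two tabs
printf 'chr1\t1000\nchr2\t800\n'     > good.genome        # one tab

# Print the number of tab-separated fields on every line.
# The two-column name/size layout used in this thread should give 2.
awk -F'\t' '{ print FILENAME, NF }' spaces.genome double_tab.genome good.genome
# → spaces.genome 1
# → spaces.genome 1
# → double_tab.genome 3
# → double_tab.genome 3
# → good.genome 2
# → good.genome 2

# od -c prints tabs as \t, so stray spaces or doubled tabs become visible.
od -c double_tab.genome
```

Anything other than 2 fields per line suggests the separator is not a single tab.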

ADD REPLY • link 5.4 years ago by ATpoint 81k

Could you please show the command lines you used to generate the BED files? I am still having a problem.

ADD REPLY • link 5.4 years ago by bk11 ★ 2.3k

In this case I simply did it manually by typing it in a text editor. What organism are you working on? There are genome.sizes files available for download for most species.

ADD REPLY • link 5.4 years ago by ATpoint 81k

Try replacing all [[:space:]]+ with \t. That should work.
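A minimal sketch of that idea, assuming the file from the question has two space-separated columns. It uses awk's default field splitting (which already treats any run of blanks as one separator) rather than a regex substitution, because \t in a sed replacement is not handled the same way by every sed. The output name my.genome.tab is made up for illustration.

```shell
# Hypothetical input reproducing the problem: columns separated by two spaces.
printf 'chr1  1000\nchr2  800\n' > my.genome

# awk splits on runs of spaces/tabs by default, so re-printing the two
# fields with OFS set to a tab yields exactly one tab between columns.
awk -v OFS='\t' '{ print $1, $2 }' my.genome > my.genome.tab

# Check the result: od -c shows each tab as \t.
od -c my.genome.tab
```

The resulting my.genome.tab can then be passed to bedtools complement with -g.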

ADD REPLY • link 5.4 years ago by Ram 43k

ATpoint · Answer 2 · 2018-12-03

5.4 years ago

Alex Reynolds 35k

Here's a one-liner that should work:

$ bedops --complement <( sort-bed A.bed ) <( awk -v OFS="\t" '{ print $1, "0", $2 }' my.genome | sort-bed - )  > answer.bed

This part is called a process substitution in the bash shell:

... <( awk -v OFS="\t" '{ print $1, "0", $2 }' my.genome | sort-bed - ) ...

It uses awk to turn the file my.genome into a sorted BED file, on which you can do set operations with bedops. Basically, the output of everything within <( ... ) is fed to the bedops process as if it were an input file.

Here's what the one-liner looks like when broken down into separate commands:

$ sort-bed A.bed > A.sorted.bed
$ awk -v OFS="\t" '{ print $1, "0", $2 }' my.genome | sort-bed - > my.genome.sorted.bed
$ bedops --complement A.sorted.bed my.genome.sorted.bed > answer.bed
$ rm A.sorted.bed my.genome.sorted.bed

Process substitutions might look a little odd, at first, but they help avoid creating intermediate files, which slow down operations on whole-genome scale work. Intermediate files also require disk space and need cleaning up. It's useful to avoid intermediate files, when possible.

ADD COMMENT • link updated 5.4 years ago by ATpoint 81k • written 5.4 years ago by Alex Reynolds 35k

I added a space between awk -v and OFS= in all commands. Hope you don't mind :)

ADD REPLY • link 5.4 years ago by ATpoint 81k

Hi @Alex Reynolds,

I was just exploring the bedops --complement option as an alternative to bedtools complement, to get coordinates that are present in genome.bed but not in target.bed. I found that --difference is more suitable than --complement for producing result.bed.

Do correct me if my understanding of the tool is wrong.

Thanks!

ADD REPLY • link updated 4.9 years ago by Ram 43k • written 4.9 years ago by bioinfo89 ▴ 50

The manual states:

The --complement operation calculates the genomic regions in the gaps between the contiguous per-chromosome ranges defined by one or more inputs.

The --difference operation calculates the genomic regions found within the first (reference) input file, excluding regions in all other input files

Clearly, an operation of the sort A.bed minus B.bed requires the use of --difference as --complement works on a completely different problem.
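As a rough sketch of the "genome minus A.bed" case with the files from the question, assuming bedops and sort-bed are installed (the intermediate file names are made up for illustration, and the bedops step is skipped if the tools are not on the PATH):

```shell
# Recreate the question's inputs with single-tab delimiters.
printf 'chr1\t100\t200\nchr1\t400\t500\nchr1\t500\t800\n' > A.bed
printf 'chr1\t1000\nchr2\t800\n' > my.genome

# Turn each chromosome size into a BED interval spanning the chromosome.
awk -v OFS='\t' '{ print $1, 0, $2 }' my.genome > genome.bed

if command -v bedops >/dev/null 2>&1 && command -v sort-bed >/dev/null 2>&1; then
  sort-bed A.bed > A.sorted.bed
  sort-bed genome.bed > genome.sorted.bed
  # Regions of the first (reference) input not covered by the second input.
  bedops --difference genome.sorted.bed A.sorted.bed > result.bed
else
  echo "bedops/sort-bed not found; skipping the --difference step" >&2
fi
```

Here the whole-chromosome intervals are the reference input, so the output should be the parts of each chromosome that A.bed does not cover.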

ADD REPLY • link 4.9 years ago by Ram 43k