bedmap to match gene names in bed
13 months ago
bioguy24 • 190
Chicago

I am trying to match the gene names in one file with regions in a BED file using bedmap, but I cannot get it to work. For example, PTPN11 is a gene in sort_gene.bed, but I do not see it in the bedmap output. I know that the gene is covered by the probes BED file; it just doesn't map there in bedmap. Thank you :).

bedmap --echo --skip-unmapped --delim '\t' --echo-map-id-uniq unix_sort_3column_xgen_probes.bed sort_gene.bed > answer.bed

sort_gene.bed
chr12    112418350    112418404    PTPN11
chr12    112418562    112419081    PTPN11
chr12    112418727    112418838    RPL6
chr12    112418908    112419125    PTPN11
chr12    112418913    112419125    PTPN11
chr12    112419111    112419114    PTPN11
chr12    112419111    112419114    PTPN11
chr12    112419111    112419125    PTPN11
chr12    112419111    112419125    PTPN11
chr12    112446275    112446398    PTPN11
chr12    112446275    112446398    PTPN11
chr12    112446275    112446398    PTPN11
chr12    112446275    112446398    PTPN11
chr12    112450317    112450512    PTPN11
chr12    112450317    112450512    PTPN11
chr12    112450317    112450512    PTPN11
chr12    112450317    112450512    PTPN11
chr12    112453194    112453387    PTPN11
chr12    112453194    112453387    PTPN11
chr12    112453194    112453387    PTPN11
chr12    112453194    112453387    PTPN11
chr12    112453328    112453387    PTPN11
chr12    112453328    112453387    PTPN11
chr12    112454563    112454680    PTPN11
chr12    112454563    112454680    PTPN11
chr12    112454563    112454680    PTPN11
chr12    112454563    112454680    PTPN11
chr12    112454563    112454680    PTPN11
chr12    112454563    112454680    PTPN11
chr12    112455949    112456063    PTPN11
chr12    112455949    112456063    PTPN11
chr12    112455949    112456063    PTPN11
chr12    112455949    112456063    PTPN11
chr12    112455949    112456063    PTPN11
chr12    112455949    112456063    PTPN11
chr12    112457290    112457323    PTPN11
chr12    112457290    112457486    PTPN11
chr12    112457323    112457326    PTPN11
chr12    112472943    112473040    PTPN11
chr12    112472943    112473040    PTPN11
chr12    112472943    112473040    PTPN11
chr12    112472943    112473040    PTPN11
chr12    112477650    112477730    PTPN11
chr12    112477650    112477730    PTPN11
chr12    112477650    112477730    PTPN11
chr12    112477650    112477730    PTPN11
chr12    112477856    112478015    PTPN11
chr12    112477856    112478015    PTPN11
chr12    112477856    112478015    PTPN11
chr12    112477856    112478015    PTPN11
chr12    112482073    112482205    PTPN11
chr12    112482073    112482205    PTPN11
chr12    112482073    112482205    PTPN11
chr12    112482073    112482205    PTPN11
chr12    112486474    112486629    PTPN11
chr12    112486474    112486629    PTPN11
chr12    112486474    112486630    PTPN11
chr12    112486474    112486923    PTPN11
chr12    112486630    112486633    PTPN11
chr12    112488442    112488510    PTPN11
chr12    112488442    112488510    PTPN11
chr12    112489023    112489175    PTPN11
chr12    112489023    112489175    PTPN11
chr12    112502143    112502256    PTPN11
chr12    112502143    112502256    PTPN11
chr12    112504694    112504761    PTPN11
chr12    112504694    112504796    PTPN11
chr12    112504761    112504764    PTPN11
chr12    112505824    112509913    PTPN11


unix_3column_xgen_probes.bed  (small subset of file)
chr12    112230476    112230596
chr12    112235871    112235991
chr12    112235936    112236056
chr12    112237699    112237819
chr12    112237757    112237877
chr12    112241652    112241772
chr12    112241667    112241787
chr12    112247303    112247423
chr12    112843303    112843423
chr12    112884064    112884184
chr12    112884097    112884217
chr12    112888106    112888226
chr12    112888211    112888331
chr12    112890983    112891103
chr12    112891086    112891206
chr12    112892352    112892472
chr12    112892379    112892499
chr12    112893738    112893858
chr12    112893762    112893882
chr12    112915434    112915554
chr12    112915645    112915765
chr12    112915714    112915834
chr12    112919862    112919982
chr12    112919904    112920024
chr12    112924263    112924383
chr12    112924328    112924448
chr12    112926220    112926340
chr12    112926812    112926932
chr12    112926874    112926994
chr12    112939932    112940052
chr12    112939955    112940075
chr12    112942473    112942593
bedops • 979 views

The genomic regions in your two files do not seem to overlap. Without overlaps, _bedmap_ will not (by default, cannot) report any mapped elements.

Did you follow the advice I gave in a previous answer, where you use _bedops --element-of 1_ to test if there are overlaps? See: https://www.biostars.org/p/150703/#150710

I imagine other tools would not be able to repeat what _bedmap_ does, either, unless there are overlaps or data you're not showing.

If you want to post your files somewhere accessible (like a Public subfolder in a Dropbox account) then I can take a closer look on this end.

15 months ago
Seattle, WA USA

Mapping only happens where there are at least one-base overlaps. You only get results when there are mapped elements. If there are no mapped elements, you do not get any IDs or other values.

The threshold of overlap can be adjusted, and input elements can be padded, but the default usage -- your current _bedmap_ command -- requires only one or more bases of overlap.

You can run some ad-hoc tests with _bedops --element-of_ to verify whether there are or are not overlaps:

$ bedops --element-of 1 sort_gene.bed unix_3column_xgen_probes.bed > test.bed

The file test.bed shows any elements of sort_gene.bed that overlap elements in unix_3column_xgen_probes.bed.

You can use this to verify where there are and are not minimally-one-base overlaps.
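
The one-base overlap rule described above can be sketched in Python. This is only an illustration of the logic, not how _bedmap_ is implemented: the function names, the ";" separator, and the sorted order of IDs are assumptions, and the last probe is a made-up interval added to show what a positive result looks like. The other rows are copied from the question.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open BED intervals [start, end) share at least one base."""
    return a_start < b_end and b_start < a_end


def map_ids(reference, mapped):
    """Collect the unique IDs of map elements overlapping each reference element.

    reference: list of (chrom, start, end)
    mapped:    list of (chrom, start, end, id)
    Reference elements with no overlap are dropped, similar in spirit to
    --skip-unmapped. The ';' separator and sorted order are assumptions.
    """
    results = []
    for chrom, start, end in reference:
        ids = {
            m_id
            for m_chrom, m_start, m_end, m_id in mapped
            if m_chrom == chrom and overlaps(start, end, m_start, m_end)
        }
        if ids:
            results.append((chrom, start, end, ";".join(sorted(ids))))
    return results


# Rows copied from the files shown in the question.
probes = [("chr12", 112247303, 112247423), ("chr12", 112843303, 112843423)]
genes = [
    ("chr12", 112418350, 112418404, "PTPN11"),
    ("chr12", 112418562, 112419081, "PTPN11"),
    ("chr12", 112418727, 112418838, "RPL6"),
    ("chr12", 112505824, 112509913, "PTPN11"),
]

# The posted probes end before, or start after, the posted gene regions.
print(map_ids(probes, genes))  # []

# Hypothetical probe (not in the posted file) that does overlap two regions.
hypothetical_probe = [("chr12", 112418700, 112418820)]
print(map_ids(hypothetical_probe, genes))
```

With the posted subset, nothing is reported because no probe shares a base with any gene region, which matches the empty _bedmap_ output.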

15 months ago
Seattle, WA USA

Another piece of advice is to validate your input files, especially if they come from Excel.

Use _bedops --ec --everything_ and _cat -et_ to validate that input is sorted correctly, that it is tab-delimited, and that it doesn't contain stray characters left behind by Microsoft tools (such as carriage returns).

$ bedops --ec --everything sort_gene.bed > /dev/null
...
$ cat -et sort_gene.bed | head
...

Repeat for unix_3column_xgen_probes.bed, etc.

If the error says that an input file needs sorting, use BEDOPS _sort-bed_. It is faster than GNU _sort_ at sorting BED files.
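
For readers without a terminal handy, the kind of problems that _cat -et_ makes visible can also be flagged with a small Python sketch. This is not a replacement for _bedops --ec_; the function name and the exact set of checks are assumptions chosen for illustration.

```python
def check_bed_lines(lines):
    """Return (row_number, problem) pairs for lines that look malformed."""
    problems = []
    for row, line in enumerate(lines, start=1):
        # Windows/Excel exports often end lines with a carriage return.
        if line.endswith("\r\n") or line.endswith("\r"):
            problems.append((row, "carriage return (Windows/Excel line ending)"))
        text = line.rstrip("\r\n")
        if any(ord(ch) > 127 for ch in text):
            problems.append((row, "non-ASCII character"))
        fields = text.split("\t")
        if len(fields) < 3:
            # Space-separated columns end up as a single field.
            problems.append((row, "fewer than 3 tab-delimited fields"))
            continue
        if not (fields[1].isdigit() and fields[2].isdigit()):
            problems.append((row, "start/end is not a non-negative integer"))
        elif int(fields[1]) > int(fields[2]):
            problems.append((row, "start is greater than end"))
    return problems


sample = [
    "chr12\t112418350\t112418404\tPTPN11\n",         # fine
    "chr12    112418562    112419081    PTPN11\n",   # spaces instead of tabs
    "chr12\t112418727\t112418838\tRPL6\r\n",         # Windows line ending
]

for row, problem in check_bed_lines(sample):
    print(row, problem)
```

To run this on a real file, open it with `open(path, newline="")` so that Python does not silently translate the carriage returns away before the check sees them.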


I will use your great advice and post back. Thank you for your help :).

EDIT: I have the results, and it appears that PTPN11 is not represented in xgen_targets.bed, but I find this hard to believe. I validated the inputs, sorted them, and then tested for overlaps. Is it possible to post the files on Dropbox or box.net so you can take a look? Thank you :).


Sure, just sign up for a Dropbox or other account and put your files in a public directory. Then copy and paste the links from that directory into a comment here (right-click on the file to copy the public-facing web address or URL to the clipboard).


I placed all the files here at box.net, and one of the files, Alex.txt, has an explanation in it. Thank you :).

https://app.box.com/s/mwsz3brh2ltwd4ze3307qb09ur36rbsj


Did you validate any of the inputs?

I ran the following and it looks like one (and possibly more) of your inputs is unsorted:

Papillion:Alex alexpreynolds$ bedops --everything --ec intersect_epilepsy70.bed > /dev/null
May use bedops --help for more help.

Error: in intersect_epilepsy70.bed
Bed file not properly sorted by start coordinates.
See row: 422

Make sure your inputs are sorted before applying operations. Use BEDOPS _sort-bed_ to sort inputs.
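
To see what "not properly sorted" means in practice, here is a rough Python sketch that finds the first out-of-order row and then sorts the rows. It assumes the ordering is chromosome (as text), then start, then end (as numbers); the function names are hypothetical, and for real files _sort-bed_ remains the right tool.

```python
def sort_key(line):
    """Assumed BED ordering: chromosome as text, then numeric start and end."""
    chrom, start, end = line.rstrip("\n").split("\t")[:3]
    return (chrom, int(start), int(end))


def first_unsorted_row(lines):
    """Return the 1-based number of the first row that sorts before its
    predecessor, or None if the rows are already in order."""
    previous = None
    for row, line in enumerate(lines, start=1):
        key = sort_key(line)
        if previous is not None and key < previous:
            return row
        previous = key
    return None


# Rows from sort_gene.bed in the question, deliberately shuffled.
rows = [
    "chr12\t112418562\t112419081\tPTPN11\n",
    "chr12\t112418350\t112418404\tPTPN11\n",
    "chr12\t112418727\t112418838\tRPL6\n",
]

print(first_unsorted_row(rows))           # 2
fixed = sorted(rows, key=sort_key)
print(first_unsorted_row(fixed))          # None
```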

Other than using the advice I gave to you to validate your inputs, you need overlaps between (validated) inputs to get a result with _bedmap_.

Maybe visualize your inputs with custom tracks in the UCSC Genome Browser and you can see for yourself what overlaps do and do not exist.


I'm not sure what file xgen_targets.bed represents as you don't mention it in your original question, but feel free to post that file to Dropbox as well, and please show how you are referencing it or using it in a _bedmap_ operation.
