Finding closest downstream gene including overlap 3' ends
2
0
Entering edit mode
5.9 years ago
jfelds11 ▴ 20

I am trying to find the distance to the closest downstream gene. Running into a problem. Some genes overlap in the opposite strand at the 3' end. But I need to use ignore overlap or it just reports my genes back to me since all genes in my -a file are also -b file. Is there a way to report closest downstream gene including if it overlaps on the opposite strand, but make bedtools ignore the fact that all genes in -a are also in -b so it doesn't report 0 for every gene?

Here's two examples with the various ways I've tried. One is where the closest gene overlaps some at the 3' end example one

bedtools closest -a ORFs_1.5termratio_swrvswt_20180608_sort.txt -b steinmetzgenes_sorted.txt -fd -D a | grep YPL260W

chrXVI  49217   51097   YPL260W 0   +   chrXVI  63225   64793   YPL257W 0   +   12129

bedtools closest -a ORFs_1.5termratio_swrvswt_20180608_sort.txt  -b steinmetzgenes_sorted.txt -iu -D a -io | grep YPL260W

chrXVI  49217   51097   YPL260W 0   +   chrXVI  63225   64793   YPL257W 0   +   12129

bedtools closest -a ORFs_1.5termratio_swrvswt_20180608_sort.txt  -b steinmetzgenes_sorted.txt -iu -D a  | grep YPL260W
chrXVI  49217   51097   **YPL260W** 0   +   chrXVI  49217   51097   **YPL260W** 0   +   0

chrXVI  49217   51097   YPL260W 0   +   chrXVI  51013   52701   YPL259C 0   -   0

bedtools closest -a ORFs_1.5termratio_swrvswt_20180608_sort.txt  -b steinmetzgenes_sorted.txt -iu -D a -S | grep YPL260W

chrXVI  49217   51097   YPL260W 0   +   chrXVI  51013   52701   YPL259C 0   -   0

So -S works for this example, but what if the closest gene is on the same strand example two

bedtools closest -a ORFs_1.5termratio_swrvswt_20180608_sort.txt -b steinmetzgenes_sorted.txt -fd -D a | grep YPL239W
chrXVI  99001   100265  YPL239W 0   +   chrXVI  100441  101393  YPL237W 0   +   177

 bedtools closest -a ORFs_1.5termratio_swrvswt_20180608_sort.txt  -b steinmetzgenes_sorted.txt -iu -D a | grep YPL239W 
chrXVI  99001   100265  **YPL239W** 0   +   chrXVI  99001   100265  **YPL239W** 0   +

bedtools closest -a ORFs_1.5termratio_swrvswt_20180608_sort.txt  -b steinmetzgenes_sorted.txt -iu -D a -S | grep YPL239W 
chrXVI  99001   100265  YPL239W 0   +   chrXVI  101397  102997  YPL236C 0   -   1133

This time YPL237W is the closest so can't use -S

bedtools bedops • 1.3k views
ADD COMMENT
1
Entering edit mode

it'd be easier to help if you had posted your full command(s) including a minimal test set (e.g., three or four genes)

ADD REPLY
3
Entering edit mode
5.9 years ago

A few options from BEDOPS closest-features --help may be of interest:

--closest              Choose the closest element for output only.  Ties go the left element.
...
--dist                 Print the signed distances to the <input-file> element as additional
                         columns of output.  An overlapping element has a distance of 0.
...
--no-overlaps          Overlapping elements from <query-file> will not be reported.

The --no-overlaps option, in particular, seems to be most relevant to your question.

Use of --dist is handy when piping to awk, and making decisions based on some logic in the experiment. Use of --closest is like using --dist and picking the feature with the smaller distance.

Hope this helps!

ADD COMMENT
2
Entering edit mode

This worked great! The --no-ref Do not echo elements from <input-file> is the thing I need that bedtools doesn't seem to have. Thanks! I have never used bedops before. It has some nice features.

closest-features --dist --no-ref ORFs_1.5termratio_swrvswt_20180608_sort.txt steinmetzgenes_sorted.txt | grep YPL260W
chrXVI  47153   48857   YPL262W 0   +|0|chrXVI  49217   51097   YPL260W 0   +|361
chrXVI  49217   51097   YPL260W 0   +|0|chrXVI  51013   52701   YPL259C 0   -|0

closest-features --dist --no-ref ORFs_1.5termratio_swrvswt_20180608_sort.txt steinmetzgenes_sorted.txt | grep YPL239W
chrXVI  99001   100265  YPL239W 0   +|0|chrXVI  100441  101393  YPL237W 0   +|177
ADD REPLY
1
Entering edit mode

Thanks for the feedback!

ADD REPLY
0
Entering edit mode

Hello @jfelds11,

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one if they work.
Upvote|Bookmark|Accept

ADD REPLY
0
Entering edit mode
5.9 years ago

what about -S: "find the closest feature in B that overlaps A on the _opposite_ strand"

ADD COMMENT
0
Entering edit mode

I tried that, doing -S and then -s, since sometimes the closest gene is the same strand. Think it would work, but then I need a way to take the smaller of the two numbers for each gene. Maybe concatenate the files and use awk somehow?

ADD REPLY

Login before adding your answer.

Traffic: 1944 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6