Genomic Regions To Exclude Before Shuffling Intervals
10.4 years ago
PoGibas 5.1k

I want to do a permutation test: randomly reposition (shuffle) a given set of genomic intervals and measure the intersection between the new coordinates and a specific genomic element.

Example:

  • Different sets of genes (protein-coding genes, pseudogenes, ncRNAs) are the intervals I want to shuffle.
    The L1 genomic repeat has fixed coordinates.
  • For every gene set, shuffle the intervals, then intersect them with L1 and measure the overlap (I am using bedtools shuffle, which will "reposition each feature in the input BED file on a random chromosome at a random position").
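A minimal pure-Python sketch of the test described above. It assumes BED-style 0-based half-open intervals stored as (chrom, start, end) tuples and a genome given as a dict of chromosome sizes. Each shuffled interval lands on a chromosome chosen with probability proportional to its length. That choice is an assumption of this sketch and may not match how bedtools shuffle picks chromosomes. The statistic is total overlapping base pairs, and all function names are hypothetical.

```python
import random
from typing import Dict, List, Tuple

# (chrom, start, end), 0-based half-open as in BED
Interval = Tuple[str, int, int]


def overlap_bp(a: List[Interval], b: List[Interval]) -> int:
    """Sum of pairwise overlaps (bp) between intervals in a and b.
    Assumes intervals within each set do not overlap each other."""
    total = 0
    for chrom, s, e in a:
        for c2, s2, e2 in b:
            if c2 == chrom:
                total += max(0, min(e, e2) - max(s, s2))
    return total


def shuffle(intervals: List[Interval], genome: Dict[str, int],
            rng: random.Random) -> List[Interval]:
    """Move each interval to a random position, keeping its length.
    Chromosomes are picked with probability proportional to their size
    (an assumption of this sketch)."""
    chroms = list(genome)
    sizes = [genome[c] for c in chroms]
    out = []
    for _, s, e in intervals:
        length = e - s
        if length > max(sizes):
            raise ValueError(f"interval of {length} bp is longer than every chromosome")
        while True:
            chrom = rng.choices(chroms, weights=sizes)[0]
            if genome[chrom] >= length:
                break  # the interval fits on this chromosome
        start = rng.randrange(0, genome[chrom] - length + 1)
        out.append((chrom, start, start + length))
    return out


def permutation_test(genes: List[Interval], repeats: List[Interval],
                     genome: Dict[str, int], n_perm: int = 1000, seed: int = 0):
    """Shuffle genes n_perm times (repeats stay fixed) and return the
    observed overlap, the null distribution and a one-sided p-value."""
    rng = random.Random(seed)
    observed = overlap_bp(genes, repeats)
    null = [overlap_bp(shuffle(genes, genome, rng), repeats) for _ in range(n_perm)]
    # +1 in numerator and denominator so the p-value is never exactly 0
    p = (1 + sum(x >= observed for x in null)) / (n_perm + 1)
    return observed, null, p


if __name__ == "__main__":
    genome = {"chr1": 10_000, "chr2": 5_000}
    l1 = [("chr1", 100, 600), ("chr2", 1_000, 1_500)]
    genes = [("chr1", 150, 350), ("chr2", 1_100, 1_300)]
    obs, null, p = permutation_test(genes, l1, genome, n_perm=200)
    print(obs, p)  # obs is 400: both toy genes sit fully inside an L1 copy
```

The pairwise loop in overlap_bp is quadratic and only meant for illustration. On real annotations you would sort the intervals, or let bedtools intersect do the work.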

Question: which genomic regions should I exclude from the "genome" before shuffling the gene intervals? (bedtools shuffle takes regions to avoid through its -excl option; the -g file only lists chromosome sizes.)
I was going to exclude gaps in the assembly.
But what about:

  • All gene regions.
    If I am shuffling pseudogene intervals, should I exclude protein-coding and ncRNA coordinates?
  • All non-L1 RepeatMasker coordinates.
    Since Alu, LTR and DNA transposons aren't L1, there won't be any intersection with them anyway, so should they be excluded?
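Whichever regions end up excluded, the mechanics are the same: a shuffled interval is accepted only where it does not touch an excluded region. Below is a minimal pure-Python sketch of that idea using rejection sampling. It assumes the same (chrom, start, end) tuples and chromosome-size dict. The function names and the max_tries cap are illustrative, not bedtools internals. The contents of the excluded list define the null model being tested: assembly gaps only, or gaps plus other gene classes or non-L1 repeats.

```python
import random
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open


def hits_excluded(chrom: str, start: int, end: int,
                  excluded: List[Interval]) -> bool:
    """True if [start, end) on chrom overlaps any excluded interval."""
    return any(c == chrom and start < e and s < end for c, s, e in excluded)


def shuffle_excluding(intervals: List[Interval], genome: Dict[str, int],
                      excluded: List[Interval], rng: random.Random,
                      max_tries: int = 1000) -> List[Interval]:
    """Randomly reposition each interval (same length) so that it avoids
    every excluded region; give up after max_tries attempts per interval."""
    chroms = list(genome)
    sizes = [genome[c] for c in chroms]
    placed = []
    for _, s, e in intervals:
        length = e - s
        for _ in range(max_tries):
            chrom = rng.choices(chroms, weights=sizes)[0]
            if genome[chrom] < length:
                continue  # interval does not fit on this chromosome
            start = rng.randrange(0, genome[chrom] - length + 1)
            if not hits_excluded(chrom, start, start + length, excluded):
                placed.append((chrom, start, start + length))
                break
        else:
            raise RuntimeError(f"could not place a {length} bp interval "
                               f"in {max_tries} tries")
    return placed


if __name__ == "__main__":
    genome = {"chr1": 1_000}
    gaps = [("chr1", 0, 500)]  # toy "assembly gap"
    genes = [("chr1", 0, 100)] * 50
    out = shuffle_excluding(genes, genome, gaps, random.Random(0))
    print(min(s for _, s, _ in out))  # never below 500, the end of the gap
```

Rejection sampling stays simple as long as the excluded fraction of the genome is modest. If most of the genome is excluded, sampling directly from the allowed regions is the more efficient design.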

I am a bit confused about what you are trying to do here. You want to pick genomic coordinates at random (do you mean intervals? A coordinate is a single point, while an interval requires two) and see whether they overlap repeats (L1)? In your example it seems you have several types of genomic elements, and you are going to pick some at random and see whether they overlap repeats? What do you mean by shuffling? Are you going to keep the width of each element the same, shift the elements around the genome, and exclude all functional regions?

I edited my question.

10.4 years ago

Greetings,

I have done this for our paper:

http://www.plosgenetics.org/article/info%3Adoi%2F10.1371%2Fjournal.pgen.1003470

I excluded gaps only. I suppose it comes down to what you are testing. We wanted to establish a null hypothesis about how close TEs were to non-coding RNAs.
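For a proximity question like this one, the statistic inside the permutation loop can be the distance from each element to its nearest neighbour in the other set, instead of raw overlap. A minimal sketch, assuming the same (chrom, start, end) tuples. gap_bp, nearest_distances and median_nearest are hypothetical helper names. The quantity is conceptually what bedtools closest -d reports, although bedtools' exact conventions (for example for book-ended intervals) may differ from this sketch.

```python
from statistics import median
from typing import List, Optional, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open


def gap_bp(s1: int, e1: int, s2: int, e2: int) -> int:
    """Bases separating two half-open intervals; 0 if they overlap or touch."""
    return max(0, max(s1, s2) - min(e1, e2))


def nearest_distances(a: List[Interval], b: List[Interval]) -> List[int]:
    """Distance from each interval in a to the closest interval in b on the
    same chromosome; intervals whose chromosome has no b partner are skipped."""
    out = []
    for chrom, s, e in a:
        ds = [gap_bp(s, e, s2, e2) for c2, s2, e2 in b if c2 == chrom]
        if ds:
            out.append(min(ds))
    return out


def median_nearest(a: List[Interval], b: List[Interval]) -> Optional[float]:
    """Median nearest-neighbour distance, or None if nothing could be paired.
    Compare the observed value with the same statistic on shuffled copies of a;
    an observed value well below that null suggests a sits unusually close to b."""
    ds = nearest_distances(a, b)
    return median(ds) if ds else None


if __name__ == "__main__":
    tes = [("chr1", 100, 200), ("chr1", 1_000, 1_100), ("chr2", 50, 80)]
    ncrnas = [("chr1", 250, 300), ("chr1", 1_050, 1_060), ("chr3", 0, 10)]
    print(nearest_distances(tes, ncrnas))  # [50, 0]; the chr2 TE has no partner
```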

I would suggest trying both directions. Shuffle the genes and test for overlap, then shuffle the TEs/LINEs and test for overlap.
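Running the test in both directions only changes which set is shuffled and which set stays fixed. A minimal sketch, assuming (chrom, start, end) tuples, a chromosome-size dict, total overlapping bp as the statistic, and length-weighted random placement. The placement rule is an assumption and not necessarily what bedtools shuffle does. two_way_test and its helpers are hypothetical names.

```python
import random
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open


def overlap_bp(a: List[Interval], b: List[Interval]) -> int:
    """Sum of pairwise overlaps in bp; symmetric in a and b."""
    return sum(max(0, min(e, e2) - max(s, s2))
               for c, s, e in a for c2, s2, e2 in b if c == c2)


def shuffle(intervals: List[Interval], genome: Dict[str, int],
            rng: random.Random) -> List[Interval]:
    """Length-preserving random placement, chromosome chosen by size.
    Assumes every interval fits on every chromosome (randrange raises otherwise)."""
    chroms, sizes = list(genome), list(genome.values())
    out = []
    for _, s, e in intervals:
        chrom = rng.choices(chroms, weights=sizes)[0]
        start = rng.randrange(0, genome[chrom] - (e - s) + 1)
        out.append((chrom, start, start + (e - s)))
    return out


def empirical_p(fixed: List[Interval], moving: List[Interval],
                genome: Dict[str, int], rng: random.Random, n_perm: int):
    """Observed overlap and one-sided p-value, shuffling only `moving`."""
    observed = overlap_bp(moving, fixed)
    null = [overlap_bp(shuffle(moving, genome, rng), fixed) for _ in range(n_perm)]
    return observed, (1 + sum(x >= observed for x in null)) / (n_perm + 1)


def two_way_test(genes: List[Interval], tes: List[Interval],
                 genome: Dict[str, int], n_perm: int = 1000, seed: int = 0):
    """Shuffle genes against fixed TEs, then TEs against fixed genes."""
    rng = random.Random(seed)
    return {
        "shuffle_genes": empirical_p(tes, genes, genome, rng, n_perm),
        "shuffle_tes": empirical_p(genes, tes, genome, rng, n_perm),
    }
```

The observed overlap is the same in both directions. Only the null changes, because in each run the fixed set keeps its real positions and clustering.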

I also found GenometriCorr useful:

http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1002529