Non-Random Clusters Of Markers In Genomic Data
12.5 years ago
didymos ▴ 210

I have count data giving the number of markers associated with each chromosome position:

  • [0,0,0,1,0,0,0,2,0,0,0,1,1,....]

However, I have three or even four orders of magnitude fewer markers than available positions, so the data are mostly zeros.

  • My question is: how can I find clusters of markers with a non-random distribution, e.g. denser than expected under random positioning?

I have calculated the distribution of pairwise distances between markers and compared it with distances simulated from a random distribution, and the two differ.
I assume that markers are localized in both random and non-random fashion, but I am only interested in the non-random clusters.
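
A minimal sketch of this kind of distance comparison, for illustration only. It assumes markers are placed uniformly and independently under the null, uses gaps between consecutive markers rather than all pairwise distances, and summarizes them by the median gap; the function names (`counts_to_positions`, `gaps`, `median_gap_pvalue`) are hypothetical and not from the original post.

```python
import random
from statistics import median

def counts_to_positions(counts):
    """Expand a count vector into one position per marker (repeats allowed)."""
    return [i for i, c in enumerate(counts) for _ in range(c)]

def gaps(positions):
    """Distances between consecutive markers after sorting."""
    s = sorted(positions)
    return [b - a for a, b in zip(s, s[1:])]

def median_gap_pvalue(positions, chrom_length, n_sim=500, seed=0):
    """Empirical one-sided p-value for the observed median gap.

    Null model (an assumption): each marker falls uniformly at random
    on the chromosome, independently of the others.
    """
    rng = random.Random(seed)
    observed = median(gaps(positions))
    n = len(positions)
    hits = 0
    for _ in range(n_sim):
        sim = [rng.randrange(chrom_length) for _ in range(n)]
        if median(gaps(sim)) <= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_sim + 1)
```

A small p-value here only says that the markers as a whole are closer together than uniform placement would give; it does not say where the clusters are.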

  • Actually, I am also looking at how similar my problem is to other bioinformatics approaches in sequence analysis (SNPs, HMMs for CpG island discovery, ...) for ideas.
sequence hmm random genomics r • 2.2k views
12.5 years ago
brentp 24k

This is an interesting problem. I don't have a great solution, but here's what I've tried in the past. Hopefully others have a more rigorous approach...

The distribution of "stuff" in the genome is already clustered so finding other stuff that's clustered in a different fashion is not trivial (or easy, depending on how you look at it).

You could do a moving average of the count data and look for peaks. Then it's a matter of determining a good window size. You could also use bins (overlapping or otherwise) and find those with a high sum. You could then compare that to randomly-generated + binned data.
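
The windowed-sum idea could be sketched roughly as follows. This is an illustration under assumptions rather than a tested method: the null is a plain shuffle of the count vector (no genomic autocorrelation), and the threshold is taken from the maximum window sum per shuffle so that it applies across all windows at once. The names `window_sums` and `dense_windows` and the defaults for `n_sim` and `alpha` are made up for this example.

```python
import random

def window_sums(counts, window):
    """Sum of counts in every sliding window of the given size."""
    total = sum(counts[:window])
    sums = [total]
    for i in range(window, len(counts)):
        total += counts[i] - counts[i - window]
        sums.append(total)
    return sums

def dense_windows(counts, window, n_sim=200, alpha=0.05, seed=0):
    """Return ([(start, sum), ...], threshold) for unusually dense windows.

    The threshold is the (1 - alpha) quantile of the largest window sum
    seen in each shuffled copy of the data (an assumed null model).
    """
    rng = random.Random(seed)
    shuffled = list(counts)
    null_max = []
    for _ in range(n_sim):
        rng.shuffle(shuffled)
        null_max.append(max(window_sums(shuffled, window)))
    null_max.sort()
    threshold = null_max[min(n_sim - 1, int((1 - alpha) * n_sim))]
    observed = window_sums(counts, window)
    hits = [(i, s) for i, s in enumerate(observed) if s > threshold]
    return hits, threshold
```

Overlapping windows that pass the threshold would normally be merged into one region, and the result still depends on the window size, so it is worth trying a few.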

For more realism, you'll want the randomly generated data to have the same auto-correlation that you expect to see in the genome--whatever that might be. I suppose you could report significance with respect to each level of auto-correlation that you use in generating your random data.
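
One simple way to get random data that keeps some broad-scale structure is to redistribute markers only within fixed blocks, so that the density differences between blocks survive while any fine-scale clustering inside a block is destroyed. This is a substitute technique chosen for illustration, not something proposed in the answer above; the function name `simulate_within_blocks` and the `block` parameter are hypothetical, and the block size stands in loosely for the "level of auto-correlation".

```python
import random

def simulate_within_blocks(counts, block, seed=0):
    """Randomly re-place each block's markers inside that same block.

    Per-block totals are preserved (broad-scale density is kept);
    positions within a block are uniform (fine-scale structure is lost).
    """
    rng = random.Random(seed)
    sim = [0] * len(counts)
    for start in range(0, len(counts), block):
        end = min(start + block, len(counts))
        for _ in range(sum(counts[start:end])):
            sim[rng.randrange(start, end)] += 1
    return sim
```

Repeating the analysis with several block sizes gives one significance estimate per scale, which is in the spirit of reporting results for each level of auto-correlation.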
