Selecting Gene Subsets Such That No Two Genes Occur In The Same Set More Than "T" Times
2
0
Entering edit mode
10.9 years ago

In one of my gene sets, I am interested in selecting gene subsets such that no two genes occur in the same set more than "t" times. Mathematically, given n objects and selecting k objects without replacement, how many combinations exist such that no two objects are in the same group more than "t" times?

E.g, If we have five genes A, B, C, D, E and pick 5 choose 3 = 10 combinations with t =2: But gene combination A B is present three times and hence out of three possibilities

A B C

A B D

A B E

I will select only 2 (=t) combinations randomly.

Is there a closed form solution for finding out how many combinations exist and listing them out for say n = 40, k=5, and t=3?

gene • 2.0k views
ADD COMMENT
0
Entering edit mode
10.9 years ago
Leandro Lima ▴ 970

I think that is C(40,5) - C(38,3) + t

ADD COMMENT
0
Entering edit mode

Thanks, but I would also appreciate some explanation of your solution. How did you arrive at this?

ADD REPLY
0
Entering edit mode

Sorry. I thought a bit more and realized that it is not that simple. Do you really need a closed form?

Maybe it is easier to simulate and count.

ADD REPLY
0
Entering edit mode

My previous formula is correct only if you have a fixed pair.

ADD REPLY
0
Entering edit mode
10.9 years ago
KCC ★ 4.1k

I think mathematically, one tackles a problem like this in this way: Expand this (1+x+x^2+...+x^t)^n and then looking for the x^k component in the final expanded product. Let me know if you need more explanation than the following. (It's a little challenging to express mathematics without LaTeX.)

Essentially, each copy of (1+x+ ... + x^t) represents each gene. Let me explain the encoding. In your example with 5 genes:

If we have five genes A, B, C, D, E and pick 5 choose 3 = 10 combinations with t=2".

There would be five copies of (1+x+x^2). The selection "ABD" is encoded by picking x in the first copy of (1+x+x^2), then x in the next, then 1, then x, then 1. Note that xx1x1 = x^3.

You can get a general formula by expressing your multinomials in terms of combinations using the closed forms found on this page: http://en.wikipedia.org/wiki/Multinomial_theorem

I recently found an approach similar to what I was discussing in this question, that's better explained.

Note that they use Wolfram alpha to get the coefficient they are looking for. You could probably use either a recursive algorithm or dynamic programming to design a program that evaluates a minimal number of products in order to get the coefficient you are interested in.

ADD COMMENT

Login before adding your answer.

Traffic: 2480 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6