Biostar Beta. Not for public use.
Selecting Gene Subsets Such That No Two Genes Occur In The Same Set More Than "T" Times
0
Entering edit mode
6.1 years ago

In one of my gene sets, I am interested in selecting gene subsets such that no two genes occur in the same set more than "t" times. Mathematically, given n objects and selecting k objects without replacement, how many combinations exist such that no two objects are in the same group more than "t" times?

E.g, If we have five genes A, B, C, D, E and pick 5 choose 3 = 10 combinations with t =2: But gene combination A B is present three times and hence out of three possibilities

A B C

A B D

A B E

I will select only 2 (=t) combinations randomly.

Is there a closed form solution for finding out how many combinations exist and listing them out for say n = 40, k=5, and t=3?

gene • 1.0k views
0
Entering edit mode
12 months ago
Leandro Lima • 920
San Francisco, CA

I think that is C(40,5) - C(38,3) + t

0
Entering edit mode

Thanks, but I would also appreciate some explanation of your solution. How did you arrive at this?

0
Entering edit mode

Sorry. I thought a bit more and realized that it is not that simple. Do you really need a closed form?

Maybe it is easier to simulate and count.

0
Entering edit mode

My previous formula is correct only if you have a fixed pair.

0
Entering edit mode
6.1 years ago
KCC ♦ 3.9k
Cambridge, MA

I think mathematically, one tackles a problem like this in this way: Expand this (1+x+x^2+...+x^t)^n and then looking for the x^k component in the final expanded product. Let me know if you need more explanation than the following. (It's a little challenging to express mathematics without LaTeX.)

Essentially, each copy of (1+x+ ... + x^t) represents each gene. Let me explain the encoding. In your example with 5 genes:

If we have five genes A, B, C, D, E and pick 5 choose 3 = 10 combinations with t=2".

There would be five copies of (1+x+x^2). The selection "ABD" is encoded by picking x in the first copy of (1+x+x^2), then x in the next, then 1, then x, then 1. Note that x _x_ 1 _x_ 1 = x^3.

You can get a general formula by expressing your multinomials in terms of combinations using the closed forms found on this page: http://en.wikipedia.org/wiki/Multinomial_theorem

I recently found an approach similar to what I was discussing in this question, that's better explained.

Note that they use Wolfram alpha to get the coefficient they are looking for. You could probably use either a recursive algorithm or dynamic programming to design a program that evaluates a minimal number of products in order to get the coefficient you are interested in.