Probability Question: Finding the Same DNA Fragment in 2 Different Genomes
10.4 years ago
arnstrm

Hi all,

I have a question about probability.

Let us say we have a DNA fragment of length 20 bp and 2 genomes of approximately the same size (150 Mb). What is the probability of finding the same fragment in both genomes? If I decide to allow mismatches, how do I account for them in the probability? Is it as simple as 20 bp minus the number of mismatches, or do I have to account for all the '20 choose 5' combinations (e.g. for 5 mismatches)?

My calculations so far: since there are 4^20 possible 20 bp fragments, the probability of a random 20 bp fragment being a particular one is 1/(4^20). But I don't know whether the probability of finding the fragment in a genome of 15E7 bp is (1/(4^20))^(15E7) (based on the thread "Probability Of Finding A Dna Sequence In A Window").
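As a quick numerical check of the per-position figure, here is a minimal Python sketch. It assumes a uniform random-sequence model (A, C, G, T equally likely, positions independent); `K` and `p_site` are illustrative names, not taken from the thread. It also shows that raising 1/4^20 to the power of the genome length gives the probability that every one of the 15E7 positions matches the fragment (treating them as independent trials), which is effectively zero and not the quantity being asked about.

```python
from fractions import Fraction

K = 20  # fragment length in bp

# Assumption: uniform i.i.d. bases, so a specific K-mer appears at one given
# position with probability 1/4^K.
p_site = Fraction(1, 4 ** K)

print(4 ** K)         # number of distinct 20-mers
print(float(p_site))  # per-position match probability

# (1/4^K)^n is the probability that ALL n positions match the fragment,
# not that at least one does; in floating point it underflows to 0.0.
print(float(p_site) ** 15e7)
```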

Any help will be greatly appreciated!

Thanks,

10.4 years ago

I think Istvan's approach from your quoted thread is the right one for you too.

There are about 150 million 20-mers in the genome (one starting at each position). The probability of a given 20-mer not matching is 1 - 1/(4^20), which is about 0.99999999999909, but the probability of all 150 million 20-mers not being the right one is (1 - 1/(4^20))^(150 million), so the probability of finding the fragment at least once is 1 minus that. Monkeys, Shakespeare, etc.
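The calculation above can be sketched in Python as follows. This is a minimal sketch under the same idealised assumptions: uniform random bases, every position treated as an independent trial, and the reverse-complement strand and overlap between neighbouring 20-mers ignored. `GENOME_SIZE` and the other names are illustrative. The last line multiplies the two per-genome probabilities, which is only valid if the two genomes are independent random sequences.

```python
import math

K = 20
GENOME_SIZE = 150_000_000  # ~150 Mb, from the question
p_site = 4.0 ** -K         # chance that one 20-mer position equals the target

# Assumption: ~GENOME_SIZE independent trials (ignores the K-1 positions lost
# at the end, overlapping 20-mers and the reverse-complement strand).
n_sites = GENOME_SIZE

# (1 - p)^n computed as exp(n * log1p(-p)) to avoid rounding 1 - p to 1.0.
log_p_none = n_sites * math.log1p(-p_site)
p_none = math.exp(log_p_none)
p_at_least_one = -math.expm1(log_p_none)  # 1 - (1 - p)^n, computed accurately

# Only if the two genomes were independent random sequences:
p_both = p_at_least_one ** 2

print(f"P(not found)     = {p_none:.6f}")
print(f"P(found)         = {p_at_least_one:.3e}")
print(f"P(found in both) = {p_both:.3e}")
```

Under these assumptions the chance of the fragment turning up at least once in one genome comes out at roughly 1.4 × 10^-4, close to the expected count n/4^20 because the event is rare.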

A real genome is not a random sequence (base composition is uneven and repeats are common), so your real probability will depend on what the target sequence is.

But all of that hinges on your two genomes being genuinely independent of each other. Unless you have extraterrestrial DNA, that's not the case.


Thanks for the explanation. Do you have any insights on estimating the probability when mismatches are allowed?


Look up the binomial distribution. Row 20 of Pascal's triangle (the binomial coefficients C(20, k)) would be helpful too.
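One way to follow that hint is sketched below, under the same idealised random-sequence model (uniform independent bases, independent positions, reverse complement ignored); the function names are hypothetical. At one position, each of the 20 bases matches with probability 1/4 and mismatches with probability 3/4, so the number of mismatches is Binomial(20, 3/4). The probability of at most m mismatches is the sum over i ≤ m of C(20, i)·3^i / 4^20. This answers the question's "20 choose 5" point: C(20, i) counts which positions differ and 3^i counts which wrong base sits at each of them, so it is not the same as simply scoring a shorter exact match.

```python
import math

K = 20
GENOME_SIZE = 150_000_000  # ~150 Mb, from the question


def p_site_within(m: int, k: int = K) -> float:
    """Hypothetical helper: P(a random k-mer differs from the target at <= m positions).

    Mismatch count ~ Binomial(k, 3/4), so P(i mismatches) = C(k, i) * 3^i / 4^k.
    """
    return sum(math.comb(k, i) * 3 ** i for i in range(m + 1)) / 4 ** k


def p_found(m: int, n_sites: int = GENOME_SIZE) -> float:
    """Hypothetical helper: P(at least one of n_sites independent positions is within m mismatches)."""
    return -math.expm1(n_sites * math.log1p(-p_site_within(m)))


for m in range(6):
    print(m, f"{p_site_within(m):.3e}", f"{p_found(m):.4f}")
```

With these assumptions the probability climbs steeply with m: about 0.008 for one mismatch, about 0.21 for two, and about 0.99 for three. This is why allowing even a few mismatches changes the answer so much.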
