Statistical distribution of viral genomes
2.9 years ago
Carla • 0

Hi! I have some reads from sequencing a viral genome. We are analyzing reads that have two matches, and we calculate the distance between the two matches and plot it as histograms. I would like to know what statistical distribution we should expect to obtain.

genome virus statistics gauss distribution • 666 views
2.9 years ago
Michael 54k

We can only answer the question for the case in which the two locations are independent of each other. Let's simplify: we consider only the start positions of the alignments, and assume these are drawn randomly and independently with uniform probability for each base position; further, let the genome size be constant. Then we expect a "staircase" or "pyramid" histogram of distances, that is, a linearly decreasing probability density function with its maximum at distance 0 and its minimum (zero) at the length of the genome.
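
The linear shape can be made explicit with a short derivation. This is only a sketch under the simplifying assumptions above, with two additions that are not in the original answer: the symbol L for the genome length, and the treatment of positions X and Y as continuous on [0, L] (an approximation to integer base positions).

```latex
\begin{align*}
D &= |X - Y|, \qquad X, Y \sim \mathrm{Uniform}(0, L) \text{ independent},\\
P(D > d) &= \frac{(L - d)^{2}}{L^{2}}, \qquad 0 \le d \le L,\\
F(d) &= 1 - \left(1 - \frac{d}{L}\right)^{2},\\
f(d) &= \frac{2\,(L - d)}{L^{2}},\\
\mathbb{E}[D] &= \int_{0}^{L} d \, f(d)\, \mathrm{d}d = \frac{L}{3}.
\end{align*}
```

The second line is the area of the two corner triangles of the L x L square where the positions differ by more than d. For L = 1000 the expected distance is therefore about 333.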

You can simulate this in R fairly easily with the following code, assuming a genome length of 1000:

hist(abs(as.integer(runif(100000,0,1000)) - as.integer(runif(100000,0,1000))))

We cannot say anything general about the dependent case, because the dependence could take any functional form. Therefore, any dependent case needs to be established as a deviation from the independence assumption, and the independent case becomes the null hypothesis.

Hi! Thank you. In this case, how can I tell whether the regions that do not match (cuts) are due to chance or not?

Difficult question. As you can see from the histogram, small distances are already the norm here. For any individual pair, I would say it is impossible to know. For a distribution over many pairs, it might be possible to test whether their distances are significantly larger than expected. If the genome length is very large compared with the expected distance of the reads, that should also be possible.
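
One possible way to test many pairs at once is a Monte Carlo comparison of the observed mean distance with the mean expected under independence. The following R code is only a sketch of that idea, not a method stated in the thread: the function name `null_mean_distance_test`, the choice of the mean as test statistic, the number of simulations, and the toy "observed" data are all assumptions made for illustration.

```r
# Hypothetical helper: one-sided Monte Carlo test of whether the observed mean
# distance is larger than expected for independent, uniformly placed positions.
null_mean_distance_test <- function(observed, genome_length, n_sim = 2000) {
  n <- length(observed)
  # Mean distance of n independent uniform position pairs, repeated n_sim times
  null_means <- replicate(n_sim, {
    a <- sample.int(genome_length, n, replace = TRUE)
    b <- sample.int(genome_length, n, replace = TRUE)
    mean(abs(a - b))
  })
  obs_mean <- mean(observed)
  # One-sided p-value with the usual +1 correction to avoid p = 0
  p <- (sum(null_means >= obs_mean) + 1) / (n_sim + 1)
  list(observed_mean = obs_mean,
       expected_mean = mean(null_means),
       p_value = p)
}

set.seed(1)
genome_length <- 1000  # assumed genome length, as in the simulation
# Toy "observed" distances drawn from the null itself (illustration only)
observed <- abs(sample.int(genome_length, 500, replace = TRUE) -
                sample.int(genome_length, 500, replace = TRUE))
res <- null_mean_distance_test(observed, genome_length)
print(res)
```

With real data, `observed` would be the vector of distances between the two matches of each read. A small p-value suggests the distances are larger than independence alone would produce; it says nothing about any single pair.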
