Detecting Different Amino Acid Composition Between Two Bacterial Genomes?
11.5 years ago

I was asked this question, which I hereby pass on: Is there a standard method to test if the amino acid composition is significantly different between two bacterial genomes?

I guess the question is a bit tricky, because "significant" could mean statistically significant or biologically significant.


This question is ambiguous in more ways than one. Even if you chose one of the terms, what should statistically or biologically significant mean? When you get questions like this, the best thing is to turn them right back to the originator; I have found that most of the time they don't know what they mean. Otherwise you're left trying to come up with an answer to an unspecified question.


@Istvan, I totally agree that the question is not precise enough, but then the role of the bioinformagician is to improve this type of question. I am interested in whether anyone has experience with the subject.


@Martin: first off, thanks for posting the question to this interesting and very useful medium. Since I suspect I am the originator of the question, I will try to elaborate.

Amino acid frequencies are not randomly and uniformly distributed in proteomes. My question is therefore how to find out whether there is a statistically significant difference between the frequency of an amino acid in one organism and its frequency in the other organism, knowing that the same amino acid might also have a significantly different frequency from the other amino acids in the same organism, for biological reasons.

I tried using a contingency table, but since amino acids are not randomly distributed, I figured that it did not give me the right answer. Then I calculated the difference in percentage of each single amino acid between the two organisms, took the mean and standard deviation of those differences, and used mean + std.dev. as a cut-off to detect significantly large deviations, whatever that means. This gave the biological results I wanted, but I don't know whether it can be used to detect statistical significance.

Would it be possible to use the amino acid frequencies in one organism as the expected frequencies and those in the other organism as the observed frequencies, and then use a chi-square test? Is that maybe what has been done in the answer below? I was not aware that this question was such a complicated matter, and I will be thankful for any input.
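One possible way to make the per-amino-acid comparison statistical, sketched in R under assumptions: for each amino acid, a two-sample test of proportions (`prop.test`) asks whether the fraction of all residues that are that amino acid is the same in both proteomes, and `p.adjust` corrects for running 20 tests. This is a swapped-in technique, not a standard method confirmed anywhere in this thread. The residue counts are hypothetical: they are the percentages from the answer below, scaled to an assumed proteome size of 1,000,000 residues each; real counts should come from the actual proteomes.

```r
# Amino acid order and percentages copied from comp.csv in the answer below
aa <- c("G", "A", "C", "F", "I", "L", "M", "V", "W", "Y",
        "R", "K", "H", "N", "P", "Q", "S", "T", "D", "E")
pct1 <- c(36.2, 6, 0.3, 2.4, 0.7, 1.8, 0.7, 2.7, 0.5, 7.8,
          9.2, 1.8, 0.6, 3.6, 2.6, 1.9, 9.7, 1.8, 6.2, 3.5)
pct2 <- c(21.3, 6.5, 0.4, 1.4, 0.7, 1.1, 0.3, 2.3, 6.3, 0.4,
          4.4, 7, 0.6, 8.1, 3.2, 3.8, 15.2, 4, 8.1, 4.8)

# HYPOTHETICAL residue counts: assume 1,000,000 residues per proteome
n_residues <- 1e6
counts1 <- round(pct1 / 100 * n_residues)
counts2 <- round(pct2 / 100 * n_residues)
n1 <- sum(counts1)
n2 <- sum(counts2)

# One two-sample test of proportions per amino acid:
# is the fraction of residues that are this amino acid the same in both proteomes?
per_aa <- t(sapply(seq_along(aa), function(i) {
  res <- prop.test(x = c(counts1[i], counts2[i]), n = c(n1, n2))
  c(diff_pct = unname(100 * (res$estimate[1] - res$estimate[2])),
    p = res$p.value)
}))
rownames(per_aa) <- aa

# Adjust for 20 tests (Benjamini-Hochberg; method = "bonferroni" is stricter)
result <- data.frame(per_aa, p_adj = p.adjust(per_aa[, "p"], method = "BH"))
print(result[order(result$p_adj), ], digits = 3)
```

Like the chi-square test, `prop.test` assumes residues are independent draws, which the comment above rightly doubts. With proteome-sized counts almost any difference will come out "significant", so the size of the difference (`diff_pct`, in percentage points) matters at least as much as the adjusted p-value.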

11.5 years ago

I would use a chi-square test.

Assume you have the amino acid compositions (in %) of two bacterial genomes in a CSV file, comp.csv:

G, A, C, F, I, L, M, V, W, Y, R, K, H, N, P, Q, S, T, D, E
composition1, 36.2, 6, 0.3, 2.4, 0.7, 1.8, 0.7, 2.7, 0.5, 7.8, 9.2, 1.8, 0.6, 3.6, 2.6, 1.9, 9.7, 1.8, 6.2, 3.5
composition2, 21.3, 6.5, 0.4, 1.4, 0.7, 1.1, 0.3, 2.3, 6.3, 0.4, 4.4, 7, 0.6, 8.1, 3.2, 3.8, 15.2, 4, 8.1, 4.8

Use R for calculations:

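# The header row has one field fewer than the data rows, so the first column becomes row names
d <- read.csv("comp.csv", header = TRUE)
# The 2 x 20 table is treated as a contingency table (test of homogeneity)
chisq.test(d)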

You will get:

Pearson's Chi-squared test

data:  d 
X-squared = 25.8429, df = 19, p-value = 0.1346
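Two caveats about the test above, sketched in R under stated assumptions. First, `chisq.test` expects counts, not percentages: each row of the table sums to about 100, so the test behaves as if only about 100 residues per genome had been sampled, and X-squared grows in proportion to the number of residues. The proteome size of 1,000,000 residues below is hypothetical. Second, the test above is a test of homogeneity on the 2 × 20 table, treating both compositions as samples; the goodness-of-fit variant asked about in the comments, `chisq.test(x, p = ...)`, instead treats one organism's frequencies as known exactly. Cramér's V is included as a rough effect size that does not depend on sample size, closer to the "biologically significant" reading of the question; it is an addition, not part of the original answer.

```r
# Percentages from comp.csv above, as a 2 x 20 matrix (rows = genomes)
aa <- c("G", "A", "C", "F", "I", "L", "M", "V", "W", "Y",
        "R", "K", "H", "N", "P", "Q", "S", "T", "D", "E")
pct <- rbind(
  composition1 = c(36.2, 6, 0.3, 2.4, 0.7, 1.8, 0.7, 2.7, 0.5, 7.8,
                   9.2, 1.8, 0.6, 3.6, 2.6, 1.9, 9.7, 1.8, 6.2, 3.5),
  composition2 = c(21.3, 6.5, 0.4, 1.4, 0.7, 1.1, 0.3, 2.3, 6.3, 0.4,
                   4.4, 7, 0.6, 8.1, 3.2, 3.8, 15.2, 4, 8.1, 4.8))
colnames(pct) <- aa

# The answer's test, run on percentages (small expected values trigger a warning)
on_pct <- suppressWarnings(chisq.test(pct))

# Same compositions as HYPOTHETICAL counts, assuming 1,000,000 residues per proteome
counts <- round(pct / 100 * 1e6)
on_counts <- chisq.test(counts)

# X-squared scales with the number of residues: here by a factor of 1e6 / 100 = 1e4
ratio <- unname(on_counts$statistic / on_pct$statistic)

# Cramér's V: effect size that does not grow with sample size (0 = identical compositions)
cramers_v <- function(tab) {
  chi2 <- suppressWarnings(chisq.test(tab))$statistic
  unname(sqrt(chi2 / (sum(tab) * (min(dim(tab)) - 1))))
}
v_pct <- cramers_v(pct)
v_counts <- cramers_v(counts)

# Goodness-of-fit variant: genome 2 observed, genome 1 frequencies taken as exact "expected"
p1 <- counts["composition1", ] / sum(counts["composition1", ])
gof <- chisq.test(counts["composition2", ], p = p1)

cat("X-squared, counts / percentages:", round(ratio), "\n")
cat("Cramer's V:", round(v_pct, 3), "(percentages),", round(v_counts, 3), "(counts)\n")
cat("Goodness-of-fit p-value:", format.pval(gof$p.value), "\n")
# Standardized residuals show which amino acids drive the difference (|value| > 2 is a rough cue)
print(round(on_counts$stdres["composition2", ], 1))
```

The same compositions give the same Cramér's V whether entered as percentages or as counts, while the p-value depends entirely on how many residues are assumed, so the p-value of 0.1346 from the percentage table should not be read as "no difference".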