How can I show that one set of proteins is more conserved than another?
5.8 years ago
465186528 • 0

Hi all. I have two sets of proteins: set A (100 genes) and set B (100 genes). I want to show that the genes in set A are more conserved and less divergent than those in set B. I built phylogenetic trees, and the branch lengths for set A are much longer than those for set B, but that does not seem to be a good way to compare them. I also tried a network analysis based on protein sequence identity: at the same cutoff, most genes from set A form one big network, whereas genes from set B form multiple networks.

Does anyone know a better, more quantitative way to make this comparison?

sequence alignment • 1.1k views
Generate all-versus-all pairwise global alignments within set A and calculate the mean percent identity with its standard deviation. Do the same for the sequences within set B. Then, depending on the distribution of identities, apply a t-test or a Mann-Whitney test to see whether the difference between the two sets is statistically significant.
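A minimal sketch of this procedure, using only the Python standard library, might look like the following. It is an illustration under stated assumptions rather than a definitive implementation: the sequences in `set_a` and `set_b` are invented toy peptides, the alignment uses a simple match/mismatch/gap scoring instead of a substitution matrix such as BLOSUM62, and the helper names (`global_identity`, `pairwise_identities`, `mann_whitney_u`) are hypothetical. Note also that pairwise identities within a set are not independent of each other (every sequence appears in many pairs), so the p-value should be read as approximate.

```python
from itertools import combinations
from math import erfc, sqrt
from statistics import mean, stdev


def global_identity(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with toy scoring (an assumption).
    Returns percent identity = identical columns / alignment length * 100."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back one optimal alignment, counting identical columns.
    i, j, same, length = n, m, 0, 0
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            same += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        length += 1
    return 100.0 * same / length if length else 0.0


def pairwise_identities(seqs):
    """Percent identity for every unordered pair within one set."""
    return [global_identity(a, b) for a, b in combinations(seqs, 2)]


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test using the normal approximation with
    tie correction (no continuity correction). Returns (U for x, p-value)."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2, n = len(x), len(y), len(x) + len(y)
    rank_sum_x, tie_term, i = 0.0, 0.0, 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the tied ranks i+1 .. j
        tie_term += (j - i) ** 3 - (j - i)
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:
        return u, 1.0
    z = (u - n1 * n2 / 2) / sqrt(var)
    return u, erfc(abs(z) / sqrt(2))


# Invented toy sequences, for illustration only.
set_a = ["MKTAYIAKQR", "MKTAYIAKQK", "MKTAYLAKQR"]
set_b = ["MKTAYIAKQR", "MSTPWIGHQR", "LKNAYVCKEW"]

ids_a = pairwise_identities(set_a)
ids_b = pairwise_identities(set_b)
print(f"A: mean {mean(ids_a):.1f}%, sd {stdev(ids_a):.1f}")
print(f"B: mean {mean(ids_b):.1f}%, sd {stdev(ids_b):.1f}")
u_stat, p_value = mann_whitney_u(ids_a, ids_b)
print(f"Mann-Whitney U = {u_stat}, p = {p_value:.3g}")
```

For real data with 100 proteins per set, the alignments would more likely come from a dedicated aligner such as EMBOSS `needle`, and the test from `scipy.stats.mannwhitneyu`; the pure-Python versions here are only meant to make each step explicit.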

Thanks a lot. This is a feasible approach.

Are A all orthologues of one another and likewise for B?

If you got long branch lengths, that means either that A is the less conserved set or that your alignment isn't very good.

You could try dN/dS analyses.

Thanks for your reply. Set A and set B come from two different Pfam protein families. Within each set, the proteins are similar to each other. On branch length, I agree with you: a longer branch does not directly imply conservation. dN/dS (Ka/Ks) is used to show the balance of selection, so I guess it cannot help compare the degree of divergence between two different sets of proteins.

dN/dS would tell you if one group is subject to more drift than the other, which implies less conservation, but it isn't a direct measure I agree.

What I mean about the branch lengths is that, assuming your alignments are OK, you already have your answer: A is more divergent than B. But it sounds like you are looking for data to confirm a hypothesis you've already decided the answer to...

I don't know why you think that isn't a good comparator?

Thanks for your reply. I am sorry that I did not make it clear. When I said the branch length of set A is longer than that of set B, I meant the scale bar of each tree. Branch lengths would be comparable if I could find a way to compare them; do you have any ideas about that?

As for your point that 'you are looking for data to confirm a hypothesis you've already decided the answer to': yes, I am trying to confirm something I already have an answer to. Most proteins in set A share over 50% identity with each other, which is not the case in set B, so I think set A is more conserved. I was then looking for an approach to prove it, which is why I asked this question on Biostar.
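On the question of how to compare branch lengths rather than scale bars, one possible way to put numbers on it is to pull the branch lengths out of each tree's Newick string and summarise them. The sketch below does that with a regular expression; the two Newick strings are invented, the helper names (`branch_lengths`, `tree_summary`) are hypothetical, and it assumes plain Newick with unquoted labels and numeric lengths written with a leading digit. Such a comparison is only meaningful if both trees were built with the same method and substitution model, so that the lengths are in the same units.

```python
import re
from statistics import mean

# Captures the number after each ':' in a Newick string (a branch length).
_BRANCH = re.compile(r":(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)")


def branch_lengths(newick):
    """Return all branch lengths found in a Newick string, in file order."""
    return [float(x) for x in _BRANCH.findall(newick)]


def tree_summary(newick):
    """Summarise a tree: tip count, total length, mean branch length,
    and total length divided by the number of tips."""
    lengths = branch_lengths(newick)
    n_tips = newick.count(",") + 1  # holds for unquoted labels without commas
    total = sum(lengths)
    return {
        "n_tips": n_tips,
        "total_length": total,
        "mean_branch": mean(lengths) if lengths else 0.0,
        "length_per_tip": total / n_tips,
    }


# Invented example trees, for illustration only.
tree_a = "((a1:0.10,a2:0.12):0.05,(a3:0.08,a4:0.11):0.04);"
tree_b = "((b1:0.40,b2:0.55):0.20,(b3:0.35,b4:0.60):0.15);"

for name, tree in (("A", tree_a), ("B", tree_b)):
    s = tree_summary(tree)
    print(f"Set {name}: {s['n_tips']} tips, "
          f"total length {s['total_length']:.2f}, "
          f"mean branch {s['mean_branch']:.3f}")
```

The lists returned by `branch_lengths` for the two trees could then be compared with a rank-based test in the same way as the pairwise identities.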

But if you know, through some means, that the proteins in A are over 50% identical and those in B are not, and the scale bar on your tree is larger (which means your branch lengths should be too), then why not use the technique you've apparently already used, which has already given you the answer?

To say it another way, how do you already know A is more conserved than B before you test it?

As you can see, the over-50% identity and the longer branches are preliminary observations. But I am looking for a quantitative way to show the difference clearly. For example, a different scale bar alone is not strong proof. Reviewers, and even I, would have questions, such as whether the difference passes a statistical test. One possible way to do it is the approach shown by @a.zielezinski.
