Question: Measure of residue dissimilarity in an alignment (phylogenetic discriminant)

Hi, is there a probabilistic measure of how dissimilar a position is from the rest of the alignment, i.e. a method to assess which residues are responsible for phylogenetic and functional differences?

I have an alignment of ATP synthases, and I noticed that Mycobacterium is highly divergent from other bacteria, even from other Actinobacteria. I need a measure that statistically discriminates which residues are responsible for this divergence.

Any thoughts?

Thanks in advance

I'm not sure I understand exactly what you're after, but here goes:

You might be interested in calculating the Shannon entropy of each column of your sequence alignment. High-entropy positions are the most variable ones; see for example: https://gist.github.com/jrjhealey/130d4efc6260dd76821edc8a41d45b6a
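
For illustration, a minimal per-column entropy calculation might look like the sketch below. It is not the code from the linked gist; `column_entropies` is a hypothetical helper, and it assumes the alignment is already loaded as a list of equal-length strings with gaps written as `-` or `.` (gaps are skipped by default). Keep in mind that raw entropy measures how variable a column is overall, not how divergent one particular clade is, and it weights every sequence equally, so densely sampled groups dominate it.

```python
import math
from collections import Counter


def column_entropies(alignment, gap_chars="-.", ignore_gaps=True):
    """Shannon entropy (bits) of every column in a multiple sequence alignment.

    `alignment` is a list of equal-length aligned sequences (plain strings).
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    entropies = []
    for i in range(length):
        column = [seq[i].upper() for seq in alignment]
        if ignore_gaps:
            column = [c for c in column if c not in gap_chars]
        if not column:
            # An all-gap column carries no residue information.
            entropies.append(0.0)
            continue
        n = len(column)
        # H = sum_k p_k * log2(1 / p_k), with p_k = count_k / n
        h = sum((k / n) * math.log2(n / k) for k in Counter(column).values())
        entropies.append(h)
    return entropies


# Toy example (made-up sequences, for illustration only)
aln = ["MKVL", "MKIL", "MRAL", "MK-L"]
for pos, h in enumerate(column_entropies(aln), start=1):
    print(pos, round(h, 3))
```

On this toy alignment the fully conserved first and last columns score 0 bits, while the third column (V, I, A once the gap is dropped) reaches the maximum for three residue types, log2(3) ≈ 1.585 bits.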

You may need to take this further and do a dN/dS analysis or similar, since it probably won't be enough just to identify the sites that are variable. You will need to show that they are under meaningful selection (i.e. that the changes at those sites are non-synonymous).
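
Note that dN/dS needs the codon-aligned nucleotide sequences, not the protein alignment. As a rough illustration of what a pairwise calculation involves, here is a simplified sketch in the spirit of the Nei-Gojobori counting method with a Jukes-Cantor correction. All function names are hypothetical; the sketch assumes gap-free, in-frame codons and the standard genetic code, and it simplifies some details (for example, mutations to stop codons are counted as nonsynonymous sites), so an established program is the better choice for real analyses.

```python
import math
from itertools import permutations

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def syn_sites(codon):
    """Synonymous sites in a codon: each of the 9 point mutations counts 1/3.

    Simplification: mutations to stop codons are counted as nonsynonymous.
    """
    s = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    s += 1 / 3
    return s


def path_differences(c1, c2):
    """(synonymous, nonsynonymous) differences between two codons, averaged
    over every order of single-base changes that avoids a stop codon."""
    diff_pos = [i for i in range(3) if c1[i] != c2[i]]
    syn_sum = non_sum = 0.0
    n_paths = 0
    for order in permutations(diff_pos):
        current, syn, non = c1, 0, 0
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[nxt] == "*":  # c2 is never a stop here, so this is an intermediate
                break
            if CODE[nxt] == CODE[current]:
                syn += 1
            else:
                non += 1
            current = nxt
        else:  # the path finished without passing through a stop codon
            syn_sum += syn
            non_sum += non
            n_paths += 1
    if n_paths == 0:  # every path hits a stop: count all changes as nonsynonymous
        return 0.0, float(len(diff_pos))
    return syn_sum / n_paths, non_sum / n_paths


def jukes_cantor(p):
    """Jukes-Cantor corrected distance; infinite once p >= 0.75 (saturation)."""
    if p >= 0.75:
        return math.inf
    return 0.75 * math.log(1 / (1 - 4 * p / 3))


def dn_ds(seq1, seq2):
    """Return (dN, dS) for two aligned, in-frame coding DNA sequences."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("need two aligned sequences whose length is a multiple of 3")
    S = N = Sd = Nd = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        # Skip codons with gaps or ambiguity codes, and stop codons.
        if c1 not in CODE or c2 not in CODE or "*" in (CODE[c1], CODE[c2]):
            continue
        s = (syn_sites(c1) + syn_sites(c2)) / 2
        S += s
        N += 3 - s
        sd, nd = path_differences(c1, c2)
        Sd += sd
        Nd += nd
    dN = jukes_cantor(Nd / N) if N else math.nan
    dS = jukes_cantor(Sd / S) if S else math.nan
    return dN, dS


# GGG -> GGA is Gly -> Gly, i.e. a purely synonymous change
print(dn_ds("AAAGGGCTG", "AAAGGACTG"))
```

The ratio is then `dN / dS` (undefined when `dS` is 0). This only compares two sequences at a time; likelihood programs such as codeml in PAML fit dN/dS models over a whole tree instead.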

Joe (12k), 16 months ago

First of all, thank you for the answer.

I was thinking of using dN/dS as a next step, but as far as I know it only allows pairwise comparisons. I was looking for something more like a site-specific measure of variation inside the clade versus outside it, column by column, like:

for each column:
    MANOVA where
        independent variable = inside / outside the clade
        dependent variables  = frequencies of amino acids in the column

Or am I talking bullshit? (I started my bioinformatics adventure recently, so I'm still green as a lime and trying to learn.) Either way, I will go with dN/dS, as it will answer my question.
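
A minimal sketch of the per-column inside/outside idea is below. It swaps the MANOVA for a chi-square test of independence on the (inside/outside clade) x (residue) contingency table, since each column holds categorical residues rather than continuous measurements, and it gets p-values by shuffling the clade labels. The function names, the `in_clade` flag list and the choice of test are assumptions for illustration. Two caveats: sequences share ancestry, so they are not independent samples, which this test ignores; and testing hundreds of columns calls for a multiple-testing correction such as Benjamini-Hochberg.

```python
import random
from collections import Counter


def chi_square_stat(column, in_clade):
    """Pearson chi-square statistic for the 2 x k table of
    (inside / outside clade) x (residue observed in this column)."""
    n = len(column)
    n_in = sum(in_clade)
    n_out = n - n_in
    totals = Counter(column)
    inside = Counter(res for res, flag in zip(column, in_clade) if flag)
    stat = 0.0
    for residue, total in totals.items():
        cells = ((n_in, inside[residue]), (n_out, total - inside[residue]))
        for group_size, observed in cells:
            expected = total * group_size / n
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


def clade_association(alignment, in_clade, n_perm=999, seed=0):
    """Per column: (chi-square statistic, permutation p-value).

    Gap characters are treated as just another residue state.
    """
    rng = random.Random(seed)
    results = []
    for i in range(len(alignment[0])):
        column = [seq[i] for seq in alignment]
        observed = chi_square_stat(column, in_clade)
        labels = list(in_clade)
        hits = 0
        for _ in range(n_perm):
            rng.shuffle(labels)  # break any link between residues and clade labels
            if chi_square_stat(column, labels) >= observed - 1e-12:
                hits += 1
        # +1 in numerator and denominator counts the observed labelling itself
        results.append((observed, (hits + 1) / (n_perm + 1)))
    return results


# Toy data: column 1 separates the clades perfectly, column 2 is conserved
aln = ["WA"] * 6 + ["FA"] * 6
flags = [True] * 6 + [False] * 6
print(clade_association(aln, flags))
```

A contingency-table test was chosen over MANOVA because residue frequencies within one column are counts of categories, not a vector of continuous responses per sequence; the permutation p-value avoids relying on the chi-square approximation when cell counts are small.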

michau (20), 16 months ago

I don’t know enough about the stats to say whether a MANOVA approach would work.

The one limitation that strikes me about that approach, however, is that objectively defining ‘clades’ is a very difficult problem. Clade boundaries are often much more obvious to a person than to a computer.

I would think there are approaches that work in a non-pairwise fashion. A quick bit of googling brought this up, which sounds like it might fit the bill?

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2887424/

Joe (12k), 16 months ago
