How to calculate the number of differing amino acids at each position of an alignment?
Asked 6 months ago by star (reputation 350)

I have a protein alignment stored as a data table like the one below. For each amino acid position, I would like to calculate the number of differences between query 1 and the other queries.

For example, the protein sequence starts with "ME" and ends with "HL". At position 5 there is a difference: query 3 has "M", whereas query 1 has "V". So I would expect a data frame like:

df <- data.frame(difference=c(0,0,0,0,1,........))
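A minimal base R sketch of the counting step follows. It assumes the sequences are already aligned, all have the same length, and the first one (query 1) is the reference. The helper name `count_diffs_vs_ref` is hypothetical. Each sequence is split into single residues to build a matrix with one row per sequence. The reference row is repeated to the same shape, and the mismatches in each column are counted.

```r
# Hypothetical helper: for each alignment position, count how many sequences
# differ from the reference sequence (the first one by default).
count_diffs_vs_ref <- function(seqs, ref = 1) {
  if (length(unique(nchar(seqs))) != 1) {
    stop("all aligned sequences must have the same length")
  }
  # One row per sequence, one column per alignment position
  m <- do.call(rbind, strsplit(seqs, ""))
  others <- m[-ref, , drop = FALSE]
  # Repeat the reference row so it has the same shape as `others`
  ref_mat <- matrix(m[ref, ], nrow = nrow(others), ncol = ncol(m), byrow = TRUE)
  # TRUE wherever a residue differs from the reference; sum per column
  data.frame(position = seq_len(ncol(m)),
             difference = colSums(others != ref_mat))
}

# First 10 residues of the three queries from the question
seqs <- c("MEKIVLLFAI", "MEKIVLLLSV", "MEKIMLLLAA")
res <- count_diffs_vs_ref(seqs)
res$difference
# [1] 0 0 0 0 1 0 0 2 1 2
```

The first five values (0 0 0 0 1) match the expected output above. At position 8, both query 2 and query 3 have "L" where query 1 has "F", so the count there is 2.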

Input:

          query                                                                               amino_acids
1   lcl|Query_10001                              MEKIVLLFAIVSLVKSDQICIGYHANNSTEQVDTIMEKNVTVTHAQDILEKKHNGKLCDL
2   lcl|Query_10002                              MEKIVLLLSVVSLVKSDQICIGYHANNSTEQVDTIMEKNVTVTHAQDILEKTHNGKLCDL
3   lcl|Query_10003                              MEKIMLLLAATGLVKSDHICIGYHANNSTKQVDTIMEKNVTVTHAQDILEKTHNGKLCDL
4                                               
5   lcl|Query_10001                              DGVKPLILRDCSVAGWLLGNPMCDEFINVPEWSYIVEKANPVNDLCYPGDFNDYEELKHL
6   lcl|Query_10002                              NGVKPLILKDCSVAGWLLGNPMCDEFISVPEWSYIVERANPANDLCYPGNLNDYEELKHL
7   lcl|Query_10003                              NGVKPLILKDCSVAGWLLGNPMCDEFINVPEWSYIVEKANPANGLCYPGSFNDYEELKHL
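The table above splits the alignment into blocks of 60 residues, separated by an empty row. A possible approach, sketched below under stated assumptions, is to concatenate each query's blocks in row order and then compare residues position by position. The sketch assumes the table is a data frame (here given the hypothetical name `aln`) with the columns `query` and `amino_acids`, that the separator row holds empty strings or `NA`, and that the blocks appear in alignment order. The sequences are copied from the question.

```r
# Hypothetical object `aln`, shaped like the printed table in the question
aln <- data.frame(
  query = c("lcl|Query_10001", "lcl|Query_10002", "lcl|Query_10003", "",
            "lcl|Query_10001", "lcl|Query_10002", "lcl|Query_10003"),
  amino_acids = c(
    "MEKIVLLFAIVSLVKSDQICIGYHANNSTEQVDTIMEKNVTVTHAQDILEKKHNGKLCDL",
    "MEKIVLLLSVVSLVKSDQICIGYHANNSTEQVDTIMEKNVTVTHAQDILEKTHNGKLCDL",
    "MEKIMLLLAATGLVKSDHICIGYHANNSTKQVDTIMEKNVTVTHAQDILEKTHNGKLCDL",
    "",
    "DGVKPLILRDCSVAGWLLGNPMCDEFINVPEWSYIVEKANPVNDLCYPGDFNDYEELKHL",
    "NGVKPLILKDCSVAGWLLGNPMCDEFISVPEWSYIVERANPANDLCYPGNLNDYEELKHL",
    "NGVKPLILKDCSVAGWLLGNPMCDEFINVPEWSYIVEKANPANGLCYPGSFNDYEELKHL"
  ),
  stringsAsFactors = FALSE
)

# Drop separator rows (empty or NA query names)
aln <- aln[!is.na(aln$query) & nzchar(aln$query), ]

# Rebuild each full sequence by pasting its blocks together in row order
ids  <- unique(aln$query)
full <- vapply(ids,
               function(id) paste(aln$amino_acids[aln$query == id], collapse = ""),
               character(1))

# Residue matrix: one row per query, one column per alignment position
m   <- do.call(rbind, strsplit(full, ""))
ref <- m[1, ]  # query 1 is the reference
others  <- m[-1, , drop = FALSE]
ref_mat <- matrix(ref, nrow = nrow(others), ncol = ncol(m), byrow = TRUE)

# Number of other queries that differ from query 1 at each position
df <- data.frame(position = seq_along(ref),
                 difference = colSums(others != ref_mat))
head(df, 10)
```

With this data, `df` has 120 rows, and the counts sum to 24 (query 2 differs from query 1 at 11 positions and query 3 at 13). For example, position 61 has a count of 2 because both other queries have "N" where query 1 has "D".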

Thank you in advance for any help!

Tag: R

Reply:

You're looking for residue-level conservation scores; what you've posted here is an XY problem. Unless you need to use R for this, there are better tools out there.
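The reply does not name a specific score. The sketch below uses per-column Shannon entropy as one simple, common example of a residue-level conservation measure. This is an assumption, not necessarily what the reply had in mind. A column has entropy 0 when it is fully conserved, and the value rises as the column becomes more variable. Unlike the count in the question, entropy treats all sequences equally rather than comparing each one against query 1. The function name `column_entropy` is hypothetical.

```r
# Hypothetical helper: Shannon entropy (in bits) of each alignment column.
# 0 = fully conserved column. Assumes equal-length aligned sequences; any gap
# characters would simply be counted as another symbol here.
column_entropy <- function(seqs) {
  m <- do.call(rbind, strsplit(seqs, ""))
  apply(m, 2, function(col) {
    p <- table(col) / length(col)  # residue frequencies in this column
    -sum(p * log2(p))
  })
}

# First 10 residues of the three queries from the question
seqs <- c("MEKIVLLFAI", "MEKIVLLLSV", "MEKIMLLLAA")
h <- column_entropy(seqs)
round(h, 3)
```

Positions 1 to 4, 6 and 7 score 0. Position 5 (V, V, M) scores about 0.918 bits. Position 10, where all three residues differ, reaches the three-symbol maximum of log2(3), about 1.585 bits.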
