Question

Differences between parSeqSim and twoSeqSim results

5.7 years ago

lefthandgergo ▴ 10

Hi!

I am trying to compare 100 peptide sequences to each other using the default settings of twoSeqSim and parSeqSim from the protr package (local alignment and BLOSUM62 substitution matrix). However, the two functions give different results. With my CompareAll function, which calls twoSeqSim repeatedly to compare all peptides in a vector, I get integer scores in the similarity matrix. When I run parSeqSim on the same peptide set, the values all lie between 0 and 1, so it seems to normalize the results somehow. How does this normalization work? Thanks!

# twoSeqSim
library(protr)

CompareAll <- function(pep) { # pairwise comparison of every peptide in the vector
  simmtx <- matrix(nrow = length(pep),
                   ncol = length(pep),
                   dimnames = list(pep, pep))
  for (i in seq_along(pep)) {
    for (j in i:length(pep)) {
      # raw alignment score; only the upper triangle (j >= i) is filled
      simmtx[i, j] <- twoSeqSim(pep[i], pep[j])@score
    }
  }
  return(simmtx)
}

# parSeqSim
parSeqSim(peptides_tmp)

R protr sequence similarity • 1.1k views

updated 5.7 years ago by h.mon 35k • written 5.7 years ago by lefthandgergo ▴ 10

score 2 · Accepted Answer · 2018-08-21

The normalization performed by parSeqSim() is:

if ( is.numeric(s12) == FALSE |
     is.numeric(s11) == FALSE |
     is.numeric(s22) == FALSE ) {
  sim = 0L
} else if ( abs(s11) < .Machine$double.eps |
            abs(s22) < .Machine$double.eps ) {
  sim = 0L
} else {
  sim = s12/sqrt(s11 * s22)
}

Where s11 is the score of sequence1 aligned to itself, s22 is the score of sequence2 aligned to itself, and s12 is the score of sequence1 aligned to sequence2.

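This means that if any score is non-numeric, or if either s11 or s22 is smaller in absolute value than .Machine$double.eps (about 2.2e-16, i.e. effectively zero), the sequence similarity is set to zero; otherwise it is s12 / sqrt(s11 * s22). A sequence compared with itself (given a positive self-alignment score) therefore scores 1, which is why the parSeqSim() values are on a different scale from the raw integer scores returned by twoSeqSim().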
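To reproduce the normalized values from raw scores, the rule above can be applied to a matrix of raw alignment scores whose diagonal holds the self-alignment scores. The following is only a minimal sketch: normalize_scores is a hypothetical helper, not a protr function, and the raw scores are made-up numbers chosen so the arithmetic is easy to check by hand.

```r
# Hypothetical helper (not part of protr): apply the normalization quoted
# above to a square matrix of raw alignment scores. The diagonal must hold
# the self-alignment scores (s11, s22, ...).
normalize_scores <- function(raw) {
  stopifnot(is.numeric(raw), nrow(raw) == ncol(raw))
  self  <- diag(raw)                  # self-alignment score of each sequence
  denom <- sqrt(outer(self, self))    # sqrt(s11 * s22) for every pair
  sim   <- raw / denom                # s12 / sqrt(s11 * s22)
  # Mirror the guard in the quoted code: a (near-)zero self-score gives 0
  tiny <- abs(self) < .Machine$double.eps
  sim[tiny, ] <- 0
  sim[, tiny] <- 0
  sim
}

# Made-up raw scores for three peptides, for illustration only
ids <- c("p1", "p2", "p3")
raw <- matrix(c(16, 10,  0,
                10, 25,  6,
                 0,  6, 36),
              nrow = 3, byrow = TRUE, dimnames = list(ids, ids))

sim <- normalize_scores(raw)
print(sim)
```

Here the p1/p2 entry is 10 / sqrt(16 * 25) = 0.5 and every diagonal entry is 1. If the input only has its upper triangle filled, the NA entries in the lower triangle simply stay NA.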