Bug in RUV spike-in normalization?
1
0
Entering edit mode
7.4 years ago
scchess ▴ 640

RUV is a normalization algorithm for ERCC spike-in. In the paper http://www.nature.com/nbt/journal/v32/n9/pdf/nbt.2931.pd, the Online Methods section. It was stated:

enter image description here

The algorithm performs SVD decomposition on the input count gene count matrix. The hidden matrix W is estimated by the left singular matrix (U) and the diagonal matrix, both from SVD.

But in the Bioconductor code implementation, https://github.com/drisso/RUVSeq/blob/master/R/RUVg-methods.R

svdWa <- svd(Ycenter[, cIdx])
first <- 1 + drop
k <- min(k, max(which(svdWa$d > tolerance)))
W <- svdWa$u[, (first:k), drop = FALSE]  <--- This line!!!! ONLY the U matrix is used. Bug?
alpha <- solve(t(W) %*% W) %*% t(W) %*% Y
correctedY <- Y - W %*% alpha

The R implementation ignores the diagonal matrix, and only use the left singular matrix (U)

The code and the paper don't match. Why?

RNA-Seq statistics biostatistics • 1.9k views
ADD COMMENT
2
Entering edit mode
6.9 years ago
debitboro ▴ 260

Hi student-t,

I stated the same thing when I tried to run a reduced example. I don't know why Risso used the U matrix only ??? The corresponding code to what it has been stated in the article would be like:

k <- min(k, max(which(svdWa$d > tolerance)))
W <- svdWa$u %*% svdWa$d

By doing that correction, I get a completely different result. I've to txt Risso about that.

Another thing, is about the calculation of alpha. We want to calculate it from the equation (1): enter image description here

But, what I see in the source code is

alpha <- solve(t(W) %*% W) %*% t(W) %*% Y
ADD COMMENT
0
Entering edit mode

Tell me what he responds.

ADD REPLY

Login before adding your answer.

Traffic: 2633 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6