Question

How to select values from a matrix in R based on a specific condition

8.8 years ago

MAPK

Hi Guys,

I have a large matrix, mymatrix, shown below. Is there any way to get a result, as a list or a matrix, that gives for each position only the nucleotides that have values (i.e., the ones that are not NA), in decreasing order? For example, I want the result in this format:

In the form of a matrix:

pos 1611111     T(17)   C(1)
pos 99022222    G(24)   A(3)

or in the form of a list:

pos 1611111
T                    C
17                   1

pos 99022222
G                    A
24                   3

and so forth. Thank you.

mymatrix

pos        A   C   G   T   N
1611111    NA  1   NA  17  NA
99022222   3   NA  24  NA  NA
99092333   NA  5   NA  91  NA
233232333  2   22  NA  NA  NA
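The list form described above can be sketched in R roughly as follows. This assumes mymatrix is a data frame (or a matrix with column names) that has a pos column, with the remaining columns holding the nucleotide counts; matrixToSortedList is a hypothetical helper name, not something from this thread. Because each list element can have a different length, this form needs no NA padding.

```r
# Hypothetical helper: one list element per position, holding only the
# non-NA counts, sorted in decreasing order and named by nucleotide.
matrixToSortedList <- function(mymatrix) {
  # Keep every column except "pos" as a plain numeric matrix
  counts <- as.matrix(mymatrix[, colnames(mymatrix) != "pos", drop = FALSE])
  result <- lapply(seq_len(nrow(counts)), function(i) {
    x <- counts[i, ]                      # named vector, e.g. A, C, G, T, N
    sort(x[!is.na(x)], decreasing = TRUE) # drop NAs, largest count first
  })
  # Label each element like the question does, e.g. "pos 1611111"
  names(result) <- paste("pos", mymatrix[, "pos"])
  result
}
```

For the example matrix, matrixToSortedList(mymatrix)[["pos 99022222"]] should be the named vector G = 24, A = 3.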

updated 16 months ago by Ram • written 8.8 years ago by MAPK

How large is the matrix? If it is very large, an efficient solution might be needed. Otherwise, this problem is relatively easy, and I will answer it once you reply.

updated 16 months ago by Ram • written 8.8 years ago by Steven Lakin

It's a fairly large matrix. Thank you!

updated 16 months ago by Ram • written 8.8 years ago by MAPK

What dimensions? dim(mymatrix)

updated 16 months ago by Ram • written 8.8 years ago by Steven Lakin

Right now my matrix is 6023 by 8.

8.8 years ago by MAPK

OK, that's not too bad. I will write a quick answer for it in a sec.

updated 16 months ago by Ram • written 8.8 years ago by Steven Lakin

Thank you, I would really appreciate that!

8.8 years ago by MAPK

Ram · Accepted Answer · 2015-06-17

This is by no means pretty code, but it should work for you. It writes the result in tab-delimited format to a file (a relative path such as "outputFile.txt" ends up in your working directory). You can then read that back into R using read.table() with sep set to "\t". I didn't include the decreasing order since it is late here, but with a little imagination, you could add it.

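transformMyMatrix <- function(mymatrix, outputFile) {
        # Start from an empty file so that rerunning does not append to old output
        if (file.exists(outputFile)) file.remove(outputFile)
        for (i in seq_len(nrow(mymatrix))) {
                # First field: the position, e.g. "pos(1611111)"
                temp <- paste0("pos(", mymatrix[i, "pos"], ")")
                # Remaining fields: every nucleotide column (after "pos") whose count is not NA.
                # colnames() works for both matrices and data frames; names() does not work on a matrix.
                for (j in 2:ncol(mymatrix)) {
                        if (!is.na(mymatrix[i, j])) {
                                temp <- c(temp, paste0(colnames(mymatrix)[j], "(", mymatrix[i, j], ")"))
                        }
                }
                # Append this position as one tab-delimited line
                write.table(t(as.matrix(temp)), file=outputFile, sep="\t", append=TRUE,
                            quote=FALSE, row.names=FALSE, col.names=FALSE)
        }
}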

Then call the function:

transformMyMatrix(mymatrix, "outputFile.txt")

For example, here is what I get:

mymatrix
        pos  A  C  G  T  N
1   1611111 NA  1 NA 17 NA
2  99022222  3 NA 24 NA NA
3  99092333 NA  5 NA 91 NA
4 233232333  2 22 NA NA NA

transformMyMatrix(mymatrix, outputFile="newMatrix.txt")
newMatrix <- read.table(file="newMatrix.txt", sep="\t")
newMatrix
              V1   V2    V3
1   pos(1611111) C(1) T(17)
2  pos(99022222) A(3) G(24)
3  pos(99092333) C(5) T(91)
4 pos(233232333) A(2) C(22)
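The answer leaves the decreasing order out. One way to add it is sketched here, assuming the same layout (a pos column followed by nucleotide count columns); transformMyMatrixSorted is a hypothetical name, not part of the original answer. Instead of appending one line per row with write.table(), it builds all lines first and writes them once with writeLines(), so rerunning it overwrites the file rather than appending to it.

```r
# Hypothetical variant of transformMyMatrix(): same "pos(...)\tX(n)" lines,
# but each position's nucleotides are sorted by count, largest first.
transformMyMatrixSorted <- function(mymatrix, outputFile) {
  # Count columns only (everything except "pos"), as a plain matrix
  counts <- as.matrix(mymatrix[, colnames(mymatrix) != "pos", drop = FALSE])
  lines <- vapply(seq_len(nrow(counts)), function(i) {
    x <- counts[i, ]
    x <- sort(x[!is.na(x)], decreasing = TRUE)   # drop NAs, sort descending
    fields <- paste0("pos(", mymatrix[i, "pos"], ")")
    if (length(x) > 0) {
      fields <- c(fields, paste0(names(x), "(", x, ")"))
    }
    paste(fields, collapse = "\t")               # one tab-delimited line
  }, character(1))
  writeLines(lines, outputFile)                  # overwrites outputFile
  invisible(lines)
}
```

With the example matrix, transformMyMatrixSorted(mymatrix, "newMatrixSorted.txt") should write pos(1611111), T(17), C(1) as the first tab-delimited line, and pos(99022222), G(24), A(3) as the second.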

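Be aware that if the rows have different numbers of fields, reading the file back into a data frame needs some care: by default read.table() stops with an error (or misaligns rows) on uneven rows, and with fill=TRUE it pads the short rows instead (NA in numeric columns, an empty string in character columns), so you might have to address that.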
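A minimal sketch of reading such a file back, assuming it was written as above (tab-delimited, no header, no quoting); readRaggedTable is a hypothetical helper name. It uses count.fields() to find the longest line and passes matching col.names, because read.table() otherwise infers the number of columns from the first five lines only, and it sets fill=TRUE so that shorter rows are padded.

```r
# Hypothetical helper: read a tab-delimited file whose lines have
# different numbers of fields into a data frame.
readRaggedTable <- function(file) {
  # Number of fields on the longest line in the whole file
  maxFields <- max(count.fields(file, sep = "\t"))
  read.table(file, sep = "\t", header = FALSE, fill = TRUE,
             stringsAsFactors = FALSE,
             # Naming all columns up front fixes the column count, even when
             # the longest line is not among the first five
             col.names = paste0("V", seq_len(maxFields)))
}
```

For example, newMatrix <- readRaggedTable("newMatrix.txt") would read the file written earlier; in this output every column holds text such as "C(1)", so padded cells come back as empty strings.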