Question

How to calculate the sum of ranks per gene?

7.2 years ago

heso ▴ 40

Hi,

I've got a table with ranked miRNAs from different samples.

  sample1        sample2        sample3      sample4      sample5
1 mmu-mir-21a    mmu-mir-140    mmu-let-7i   mmu-let-7i   mmu-mir-218-2
2 mmu-mir-143    mmu-let-7i     mmu-mir-27b  mmu-let-7f-2 mmu-mir-143
3 mmu-let-7f-2   mmu-mir-143    mmu-let-7f-2 mmu-mir-140  mmu-let-7i
4 mmu-mir-206    mmu-mir-378    mmu-mir-22   mmu-let-7g   mmu-mir-218-1
5 mmu-mir-27b    mmu-mir-99b    mmu-mir-143  mmu-mir-22   mmu-mir-7a-1

...and would like to make a ranked summary file for the whole dataset, showing which miRNAs are the most represented across all samples.

I guess one could call it the sum of ranks per miRNA, counting 0 for any sample in which the miRNA does not appear: e.g. for mmu-let-7i it would be 0+2+1+1+3 = 7; for mmu-let-7f-2 it would be 3+0+3+2+0 = 8, etc.
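To make the arithmetic concrete, here is a minimal sketch that builds the per-sample rank matrix for the five rows shown above (the full table is longer). Using match() with nomatch = 0 to score an absent miRNA as 0 is an assumption chosen to mirror the example; it is not taken from the answers below.

```r
# The five rows shown in the question (the real table continues beyond these)
DF <- data.frame(
  sample1 = c("mmu-mir-21a", "mmu-mir-143", "mmu-let-7f-2", "mmu-mir-206", "mmu-mir-27b"),
  sample2 = c("mmu-mir-140", "mmu-let-7i", "mmu-mir-143", "mmu-mir-378", "mmu-mir-99b"),
  sample3 = c("mmu-let-7i", "mmu-mir-27b", "mmu-let-7f-2", "mmu-mir-22", "mmu-mir-143"),
  sample4 = c("mmu-let-7i", "mmu-let-7f-2", "mmu-mir-140", "mmu-let-7g", "mmu-mir-22"),
  sample5 = c("mmu-mir-218-2", "mmu-mir-143", "mmu-let-7i", "mmu-mir-218-1", "mmu-mir-7a-1"),
  stringsAsFactors = FALSE
)

# Every miRNA that appears in at least one sample
mirs <- unique(unlist(DF, use.names = FALSE))

# One row per miRNA, one column per sample: the miRNA's row position (rank)
# in that sample, or 0 when it is absent (nomatch = 0L)
rank_mat <- sapply(DF, function(col) match(mirs, col, nomatch = 0L))
rownames(rank_mat) <- mirs

rank_mat[c("mmu-let-7i", "mmu-let-7f-2"), ]
rowSums(rank_mat)[c("mmu-let-7i", "mmu-let-7f-2")]   # 7 and 8, as in the question
```

Scoring absence as 0 follows the example, but it rewards absence: on these five rows mmu-mir-21a, which appears only in sample1 (at rank 1), gets a sum of 1, well below mmu-let-7i's 7. Setting nomatch = nrow(DF) + 1L instead would count a missing miRNA as ranked just below the last listed one; which convention fits depends on how the lists were cut.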

Any ideas how to do that?

RNA-Seq • 2.0k views

updated 7.2 years ago by Tom_L ▴ 350 • written 7.2 years ago by heso ▴ 40

Interesting question; there is probably a one-liner available with dplyr or similar. But to keep it simple, I would first add a new column called rank:

df$rank <- 1:nrow(df)

Then get the rank of each miR in each sample, listed in alphabetical order of the miR names:

rank_sample1 <- df[order(df$sample1), "rank"]

and so on for each sample.

Finally, add them all up.

Of course, this only works if all samples contain exactly the same miRNAs; otherwise the alphabetically ordered rank vectors do not line up.
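The steps above can be run over every sample column at once. This is a minimal sketch on hypothetical toy data in which every sample ranks the same four miRNAs (the condition just stated); sapply() replaces repeating the order() line per sample, and the sorted miRNA names serve as row labels because order() lists each sample's miRNAs in that same alphabetical order.

```r
# Hypothetical toy data: every sample ranks the same four miRNAs
df <- data.frame(
  sample1 = c("mmu-mir-21a", "mmu-mir-143", "mmu-let-7i", "mmu-mir-140"),
  sample2 = c("mmu-mir-140", "mmu-let-7i", "mmu-mir-143", "mmu-mir-21a"),
  sample3 = c("mmu-let-7i", "mmu-mir-21a", "mmu-mir-140", "mmu-mir-143"),
  stringsAsFactors = FALSE
)
df$rank <- 1:nrow(df)

samples <- setdiff(names(df), "rank")

# Column s holds the ranks of the miRNAs, taken in alphabetical order of their names
rank_mat <- sapply(samples, function(s) df[order(df[[s]]), "rank"])
rownames(rank_mat) <- sort(df$sample1)

rank_sum <- sort(rowSums(rank_mat))   # lowest sum = most highly ranked overall
rank_sum
```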

7.2 years ago by Benn 8.3k

score 0 · Answer 1 · 2017-03-02

A slightly more elaborate answer.

If each column holds one sample's ordered gene list (so the row number is the rank), you can use the following (I used letters as an example):

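set.seed(1)                      # reproducible toy example
l <- LETTERS[1:6]                # six "genes"
m <- replicate(5, sample(l))     # 6 x 5 matrix, one ranked list per column

# For one column x, return the rank (row position) of each gene, genes in sorted order
my_fun <- function(x) {
  initial_order <- sort(x)
  ranks <- sapply(initial_order, function(y) which(x == y))
  return(ranks)
}

# Sum each gene's ranks over all columns
m_ranks <- rowSums(apply(m, 2, my_fun))
sort(m_ranks)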

The lowest sum corresponds to the highest overall ranking.
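match() performs the same lookup as the inner sapply()/which() pair in a single vectorized call. Below is a minimal sketch on the same kind of toy matrix; rank_by_match is a hypothetical helper name, and like my_fun it assumes every column contains each gene exactly once.

```r
set.seed(1)
l <- LETTERS[1:6]
m <- replicate(5, sample(l))   # 6 x 5 toy matrix, one ranked list per column

# For each gene in sorted order, its row position (rank) within column x
rank_by_match <- function(x) match(sort(x), x)

rank_mat <- apply(m, 2, rank_by_match)
rownames(rank_mat) <- sort(l)

m_ranks <- rowSums(rank_mat)
sort(m_ranks)                  # lowest sum = highest overall ranking
```

If a gene could be missing from some columns, match() returns NA for it (or a chosen value via nomatch), whereas which() returns an empty vector and sapply() would then give a list instead of a numeric vector.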

score 0 · Answer 2 · 2017-03-02

7.2 years ago

Tom_L ▴ 350

Simplest way (IMO): get the complete miRNA list, get the miRNA rank for each DF column, and sum ranks. R implementation (assuming your miRNA dataframe is "DF"):

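miRNA <- unique(as.character(unlist(DF, use.names = FALSE)))
# Rank of x in each column; unlist() drops samples where x is absent, so they count as 0
rank_sum <- sapply(miRNA, function(x) sum(unlist(apply(DF, 2, function(y) which(y == x)))))
sort(rank_sum)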
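The question asks for a ranked summary file, while the code here stops at a named vector of rank sums. A minimal sketch of that last step, applying this answer's which()-based lookup to the five rows from the question; it combines the per-sample matches with unlist() rather than as.numeric(), so samples lacking a miRNA simply contribute nothing. The output file name and the tab-separated layout are hypothetical choices.

```r
# The five rows shown in the question
DF <- data.frame(
  sample1 = c("mmu-mir-21a", "mmu-mir-143", "mmu-let-7f-2", "mmu-mir-206", "mmu-mir-27b"),
  sample2 = c("mmu-mir-140", "mmu-let-7i", "mmu-mir-143", "mmu-mir-378", "mmu-mir-99b"),
  sample3 = c("mmu-let-7i", "mmu-mir-27b", "mmu-let-7f-2", "mmu-mir-22", "mmu-mir-143"),
  sample4 = c("mmu-let-7i", "mmu-let-7f-2", "mmu-mir-140", "mmu-let-7g", "mmu-mir-22"),
  sample5 = c("mmu-mir-218-2", "mmu-mir-143", "mmu-let-7i", "mmu-mir-218-1", "mmu-mir-7a-1"),
  stringsAsFactors = FALSE
)

# Rank sum per miRNA; absent samples add nothing
miRNA <- unique(as.character(unlist(DF, use.names = FALSE)))
rank_sum <- sapply(miRNA, function(x) sum(unlist(apply(DF, 2, function(y) which(y == x)))))

# Ranked summary table: lowest rank sum first, ties broken by name
summary_df <- data.frame(miRNA = names(rank_sum), rank_sum = unname(rank_sum),
                         stringsAsFactors = FALSE)
summary_df <- summary_df[order(summary_df$rank_sum, summary_df$miRNA), ]

# Hypothetical output path; adjust as needed
out_file <- file.path(tempdir(), "miRNA_rank_sums.tsv")
write.table(summary_df, out_file, sep = "\t", quote = FALSE, row.names = FALSE)
```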

Cheers.
