How to calculate the sum of ranks per gene?
7.2 years ago
heso ▴ 40

Hi,

I've got a table with ranked miRNAs from different samples.

  sample1        sample2        sample3      sample4      sample5
1 mmu-mir-21a    mmu-mir-140    mmu-let-7i   mmu-let-7i   mmu-mir-218-2
2 mmu-mir-143    mmu-let-7i     mmu-mir-27b  mmu-let-7f-2 mmu-mir-143
3 mmu-let-7f-2   mmu-mir-143    mmu-let-7f-2 mmu-mir-140  mmu-let-7i
4 mmu-mir-206    mmu-mir-378    mmu-mir-22   mmu-let-7g   mmu-mir-218-1
5 mmu-mir-27b    mmu-mir-99b    mmu-mir-143  mmu-mir-22   mmu-mir-7a-1

...and I would like to make a ranked summary over the whole dataset to see which miRNAs are the most represented across all samples.

I guess one can call it the sum of rank numbers for each miRNA, counting 0 where the miRNA does not appear in a sample: e.g. for mmu-let-7i it would be 0+2+1+1+3=7; for mmu-let-7f-2 it would be 3+0+3+2+0=8, etc.

Any ideas how to do that?

RNA-Seq • 2.0k views

Interesting question. There is probably a one-liner available with dplyr or something similar, but if I had to do this the simple way, I would first add a new column called rank:

df$rank <- 1:nrow(df)

Then take the rank of each miRNA (ordering by name lines the miRNAs up in the same order for every sample):

rank_sample1 <- df[order(df$sample1), "rank"]

Do the same for each sample, then add the rank vectors together.

Of course, this only works if all samples contain exactly the same miRNAs.
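
The steps in this comment can be sketched end to end as follows. This is only an illustration under the stated restriction that every sample contains the same miRNAs; the toy data frame and the names `ranks` and `rank_sum` are made up for the example. Because the rank column is just 1:nrow(df), `df[order(df$sample1), "rank"]` is the same as `order(df$sample1)`, so the sketch calls `order()` directly on each column.

```r
# Toy data (hypothetical): every sample contains the same three miRNAs
df <- data.frame(
  sample1 = c("mir-a", "mir-b", "mir-c"),
  sample2 = c("mir-b", "mir-a", "mir-c"),
  sample3 = c("mir-c", "mir-a", "mir-b"),
  stringsAsFactors = FALSE
)

# order(col) returns row positions in alphabetical order of the names,
# so element i is the rank of the i-th miRNA (alphabetically) in that sample
ranks <- sapply(df, order)
rownames(ranks) <- sort(df$sample1)

# Sum the ranks of each miRNA across samples; the lowest sum ranks highest
rank_sum <- rowSums(ranks)
sort(rank_sum)
```

If a miRNA is missing from any sample, the rows of `ranks` no longer refer to the same miRNA in every column, which is why this shortcut needs identical miRNA sets.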

7.2 years ago
TriS ★ 4.7k

A slightly more elaborate answer.

If you use the ordered gene list as row names, you can use the following (I used letters as an example):

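l <- LETTERS[1:6]
# one random ordering of the letters per column (5 example "samples")
m <- replicate(5, sample(l))

# for each item, taken in alphabetical order, return its position (rank) in x
my_fun <- function(x) {
  initial_order <- sort(x)
  sapply(initial_order, function(y) which(x == y))
}
# sum the ranks of each item across all columns
m_ranks <- rowSums(apply(m, 2, my_fun))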

The lowest sum corresponds to the highest overall ranking.

7.2 years ago
Tom_L ▴ 350

Simplest way (IMO): get the complete miRNA list, get the miRNA rank for each DF column, and sum ranks. R implementation (assuming your miRNA dataframe is "DF"):

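miRNA <- unique(unlist(DF))
# for each miRNA, sum its row position over all columns; a column that lacks it contributes 0
# (unlist() is used because apply() returns a list when a miRNA is missing from some column)
sapply(miRNA, function(x) sum(unlist(apply(DF, 2, function(y) which(y == x)))))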
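
As a check, the same idea can be sketched with `match()` on the five rows shown in the question. This is one possible variant rather than the only way to do it; `ranks`, `rank_sum` and `n_present` are names made up for the example. `match()` returns NA where a miRNA is absent from a sample, and `rowSums(..., na.rm = TRUE)` then counts that as 0, as in the question's own arithmetic.

```r
# The five rows shown in the question
DF <- data.frame(
  sample1 = c("mmu-mir-21a", "mmu-mir-143", "mmu-let-7f-2", "mmu-mir-206", "mmu-mir-27b"),
  sample2 = c("mmu-mir-140", "mmu-let-7i", "mmu-mir-143", "mmu-mir-378", "mmu-mir-99b"),
  sample3 = c("mmu-let-7i", "mmu-mir-27b", "mmu-let-7f-2", "mmu-mir-22", "mmu-mir-143"),
  sample4 = c("mmu-let-7i", "mmu-let-7f-2", "mmu-mir-140", "mmu-let-7g", "mmu-mir-22"),
  sample5 = c("mmu-mir-218-2", "mmu-mir-143", "mmu-let-7i", "mmu-mir-218-1", "mmu-mir-7a-1"),
  stringsAsFactors = FALSE
)

miRNA <- unique(unlist(DF, use.names = FALSE))

# One row per miRNA, one column per sample; NA where the miRNA is absent
ranks <- sapply(DF, function(col) match(miRNA, col))
rownames(ranks) <- miRNA

rank_sum  <- rowSums(ranks, na.rm = TRUE)  # absent counts as 0
n_present <- rowSums(!is.na(ranks))        # number of samples containing the miRNA

rank_sum[c("mmu-let-7i", "mmu-let-7f-2")]
```

On this subset the sums come out as 7 for mmu-let-7i and 8 for mmu-let-7f-2, matching the question. Because absent samples add 0, a miRNA seen in a single sample also gets a small sum, so it may be worth reading `rank_sum` together with `n_present`.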

Cheers.
