Subset top 5% values of data frames stored in list
5.3 years ago
paolo002 ▴ 160

Hi. This is probably a question for Stack Overflow, but I am posting it here because I am not getting many replies there. Apologies for this.

I have 24 data frames with different numbers of rows and various columns (for instance, a SNP ID column and other columns with the corresponding values for each SNP), and I have stored them inside a list. I perform various operations on all the data frames at once. For instance, to sort every data frame in the list by one column in decreasing order, I do:

myfiles_ordered <- lapply(myfiles, function(x) { x[order(x$column_name_to_order, decreasing = TRUE), ] })

Now, after ordering by that column, I would like to take the top 5% of its values. I thought I could subset each data frame to its number of rows multiplied by 0.05, and I wrote something like this:

myfiles_top5<-lapply(myfiles_ordered, function(x) {x[1:nrow(x)*5/100,]})

However, it does not seem to work. Any help is highly appreciated, thanks.

R subset • 1.6k views

there I am not getting much replies

Could you add the link to StackOverflow post?


https://stackoverflow.com/questions/53724885/subset-multiple-data-frames-stored-inside-a-list

In any case, this was the link, but maybe I did not explain my problem well there.

5.3 years ago
zx8754 11k

Try:

myfiles_top5 <- lapply(myfiles_ordered, function(x) { x[ 1:round(nrow(x)*5/100), ]})

Your version fails because the : operator has higher precedence than * and /, so 1:nrow(x)*5/100 first creates the sequence 1 to n and then takes 5% of every element:

1:nrow(mtcars) * 5/100
#  [1] 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50 0.55 0.60 0.65 0.70 0.75 0.80
# [17] 0.85 0.90 0.95 1.00 1.05 1.10 1.15 1.20 1.25 1.30 1.35 1.40 1.45 1.50 1.55 1.60

Not what we need...

Instead, we need to compute the 5% first and then build the sequence, using parentheses ():

1:(nrow(mtcars) * 5/100)
# [1] 1

Again, this is not ideal, because a fractional upper bound is effectively truncated; both of the following give 1:

1:1.2
# [1] 1
1:1.6
# [1] 1

whereas we might want 1:2 for 1:1.6, so we use round():

1:round(1.2)
# [1] 1
1:round(1.6)
# [1] 1 2
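
One caveat with the 1:round(...) form, sketched below: for a small data frame the rounded count can be 0, and 1:0 is the vector c(1, 0), which still selects the first row. The helper name top_fraction and the 8-row example are assumptions for illustration only, not part of the original answer; seq_len() is used because seq_len(0) is an empty vector.

```r
# Hypothetical helper (not from the original answer): keep the top `prop`
# fraction of rows of a data frame that is already ordered.
top_fraction <- function(df, prop = 0.05) {
  n <- round(nrow(df) * prop)
  # seq_len(0) is empty, so a rounded count of 0 really returns 0 rows
  df[seq_len(n), , drop = FALSE]
}

small <- head(mtcars, 8)  # 8 rows, so round(8 * 0.05) is 0

# 1:0 expands to c(1, 0); the 0 index is dropped and row 1 is returned
n_colon <- nrow(small[1:round(nrow(small) * 0.05), ])

# seq_len(0) selects nothing
n_seq_len <- nrow(top_fraction(small))
```

Whether a small data frame should contribute zero rows or at least one is a design choice; replacing round() with ceiling() inside the helper would be one way to always keep at least one row.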

Update: we can do the ordering and the 5% subsetting in one go, e.g.:

# Using base
head(mtcars[ order(mtcars$mpg, decreasing = TRUE), ], round(nrow(mtcars) * 5/100))
#                 mpg cyl disp hp drat    wt  qsec vs am gear carb
# Toyota Corolla 33.9   4 71.1 65 4.22 1.835 19.90  1  1    4    1
# Fiat 128       32.4   4 78.7 66 4.08 2.200 19.47  1  1    4    1

# Using dplyr (note: top_n() is superseded by slice_max() in dplyr >= 1.0.0)
library(dplyr)
top_n(mtcars, round(nrow(mtcars) * 5/100), wt = mpg)
#    mpg cyl disp hp drat    wt  qsec vs am gear carb
# 1 32.4   4 78.7 66 4.08 2.200 19.47  1  1    4    1
# 2 33.9   4 71.1 65 4.22 1.835 19.90  1  1    4    1
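
To tie this back to the list of data frames in the question, here is a minimal sketch that applies the base R one-liner to every element with lapply(). The list built with split() is only a toy stand-in for the 24 data frames, and mpg plays the role of column_name_to_order; both are assumptions for illustration.

```r
# Toy stand-in for the list of data frames (hypothetical data: mtcars split by cyl)
myfiles <- split(mtcars, mtcars$cyl)

# Order each data frame by one column (decreasing) and keep its top 5% of rows.
# head(x, 0) returns zero rows, so very small data frames may come back empty.
myfiles_top5 <- lapply(myfiles, function(x) {
  x_ordered <- x[order(x$mpg, decreasing = TRUE), ]
  head(x_ordered, round(nrow(x) * 5 / 100))
})

# Number of rows kept per data frame
rows_kept <- sapply(myfiles_top5, nrow)
```

Using head() rather than 1:n indexing avoids selecting the first row by accident when the rounded count is 0.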