How to filter low expressed genes based on TPM expression?
13 months ago
newbie • 40

I have a dataframe of TPM expression values with 50k genes as rows and 500 samples as columns. I want to classify these tumor samples into two groups, i.e. Gene_High and Gene_low, based on TPM expression values.

Before that, I want to filter out lowly expressed genes, which are of no use. Some genes have a non-zero TPM value in only 50 samples and are 0 in the remaining 450.

So, given a dataframe df with 50k genes as rows and 500 samples as columns, how do I filter out the lowly expressed genes? What is the command to do this in R?

12 months ago
Prakash ♦ 1.2k

Somebody posted this code on Biostars, but I don't remember which post. It removes genes that are zero in more than 50% of the samples. See if it helps.

count <- read.csv("count.txt", sep = "\t", header = TRUE, row.names = 1)
# Return the indices of rows (genes) that are zero in more than 50% of samples
rem <- function(x) {
  x <- as.matrix(x)
  x <- t(apply(x, 1, as.numeric))
  r <- as.numeric(apply(x, 1, function(i) sum(i == 0)))
  which(r > dim(x)[2] * 0.5)
}
remove <- rem(count)
# Guard against an empty index: count[-integer(0), ] would drop every row
countdata <- if (length(remove) > 0) count[-remove, ] else count
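
If you would rather put the cutoff on the TPM value itself instead of on exact zeros, here is a minimal sketch of one way to do it. The helper name filter_low_tpm and the defaults min_tpm = 1 and min_frac = 0.5 are my own assumptions for illustration, not established cutoffs, so tune them to your data. The toy dataframe is made up.

```r
# Hypothetical helper (not from the original post): keep genes whose TPM is
# at least `min_tpm` in at least `min_frac` of the samples.
filter_low_tpm <- function(df, min_tpm = 1, min_frac = 0.5) {
  m <- as.matrix(df)                                   # genes x samples
  n_expressed <- rowSums(m >= min_tpm, na.rm = TRUE)   # samples passing, per gene
  keep <- n_expressed >= min_frac * ncol(m)
  df[keep, , drop = FALSE]                             # stay a dataframe
}

# Toy data: 3 genes x 4 samples (made-up values)
df <- data.frame(
  S1 = c(10, 0, 2),
  S2 = c(12, 0, 0),
  S3 = c(8, 5, 0),
  S4 = c(15, 0, 3),
  row.names = c("geneA", "geneB", "geneC")
)

filtered <- filter_low_tpm(df, min_tpm = 1, min_frac = 0.5)
rownames(filtered)
```

In the toy data, geneB passes the cutoff in only one of four samples and is dropped. With these defaults, a gene expressed in only 50 of 500 samples (as in the question) would also be dropped; lowering min_frac to 0.1 would keep it.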
