Question

Classify Genes As Expressed Or Not Expressed

10.1 years ago

predeus ★ 1.9k

Hello all,

This is probably a very obvious question, but I've never dealt with this sort of problem, so I hope you can point me in the right direction.

Imagine we have an annotated and quantified microarray or RNA-seq experiment: about 24k genes, each with a normalized numerical expression value (e.g., FPKM).

What is the most statistically sound way to automatically classify genes as "expressed" or "not expressed"? People often use an empirical cutoff for this (e.g., FPKM of 1), but that's not what I'm interested in.

Thank you for any input.

gene-expression statistics classification

updated 10.1 years ago by xb ▴ 420 • written 10.1 years ago by predeus ★ 1.9k

score 1 · Answer 1 · 2014-03-28

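One simple approach is to standardize the log2-transformed expression values (within a sample, for instance), i.e. convert them to z-scores.

A z-score >0 indicates higher-than-average expression ("overexpressed") and <0 lower-than-average expression ("underexpressed"); use a cutoff other than zero where appropriate.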
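A minimal sketch of the within-sample standardization, assuming one sample's values are held in a dict of gene name to normalized value. The function names, the "over"/"under" labels, and the pseudocount of 1 (added so that zero values can be log-transformed) are hypothetical choices for illustration, not part of the original answer.

```python
import math
import statistics


def standardize_log2(expression, pseudocount=1.0):
    """Return within-sample z-scores of log2(value + pseudocount).

    `expression` maps gene name -> normalized value (e.g. FPKM) for ONE sample.
    The pseudocount is an assumption so that zero values stay finite.
    """
    logged = {gene: math.log2(value + pseudocount) for gene, value in expression.items()}
    mean = statistics.mean(logged.values())
    # Population standard deviation across all genes in this sample;
    # raises ZeroDivisionError if every gene has the same value.
    sd = statistics.pstdev(logged.values())
    return {gene: (x - mean) / sd for gene, x in logged.items()}


def classify(z_scores, cutoff=0.0):
    """Label genes above the cutoff as 'over', everything else as 'under'."""
    return {gene: ("over" if z > cutoff else "under") for gene, z in z_scores.items()}


if __name__ == "__main__":
    sample = {"A": 0.0, "B": 1.0, "C": 3.0, "D": 15.0}
    z = standardize_log2(sample)
    print(classify(z))
```

Raising `cutoff` (say to 1.0) keeps only genes more than one standard deviation above the sample mean. Note that this labels relative expression levels, not "expressed" vs. "not expressed".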

This differs from what you asked for ("expressed" vs. "not expressed").

However, relative expression levels have been more practical in my cases, and they are easy to feed into downstream statistics such as SAM ( http://cran.r-project.org/web/packages/samr/index.html ). The approach applies to both array and NGS data.