Classify Genes As Expressed Or Not Expressed
10.1 years ago
predeus ★ 1.9k

Hello all,

This is probably a very obvious question, but I've never dealt with this sort of problem, so I hope you can point me in the right direction.

Imagine we have an annotated and quantified microarray or RNA-seq experiment: about 24k genes, each with a normalized expression value (e.g. FPKM) assigned to it.

What is the most statistically sound way to automatically classify genes as "expressed" or "not expressed"? People often use an empirical cutoff for this, e.g. FPKM of 1, but that's not what I'm interested in.
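For reference, the empirical rule mentioned above amounts to a fixed threshold. A minimal Python/NumPy sketch, assuming FPKM values in an array-like and treating FPKM >= 1 as "expressed" (the function name and the inclusive boundary are assumptions for illustration):

```python
import numpy as np


def classify_by_fpkm_cutoff(fpkm, cutoff=1.0):
    """Return True for genes whose FPKM is at least `cutoff` ("expressed").

    The threshold is a convention (e.g. FPKM of 1), not a statistically
    derived value; treating the boundary as inclusive is an assumption.
    """
    return np.asarray(fpkm, dtype=float) >= cutoff


# Made-up FPKM values for five genes.
print(classify_by_fpkm_cutoff([0.0, 0.3, 1.0, 5.2, 120.0]))
```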

Thank you for any input.

Tags: gene-expression, statistics, classification
10.1 years ago
xb ▴ 420

One simple approach is to standardize (z-score) the log2-transformed expression values, for instance within each sample.

A value > 0 then means overexpressed (above the sample mean) and < 0 underexpressed; use a cutoff other than zero where appropriate.
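A minimal Python/NumPy sketch of this standardization, assuming a genes x samples matrix of FPKM-like values. The pseudocount of 1 (to avoid log2(0)), the function names, and the toy numbers are assumptions for illustration, not part of the original answer:

```python
import numpy as np


def standardize_log2(expr, pseudocount=1.0):
    """Z-score log2(expr + pseudocount) separately within each sample.

    `expr` is a genes x samples array; a 1-D array is treated as one sample.
    The pseudocount is an assumption added here to avoid log2(0).
    """
    log_expr = np.log2(np.asarray(expr, dtype=float) + pseudocount)
    if log_expr.ndim == 1:
        log_expr = log_expr[:, np.newaxis]
    mean = log_expr.mean(axis=0)          # per-sample mean across genes
    sd = log_expr.std(axis=0, ddof=1)     # per-sample standard deviation
    return (log_expr - mean) / sd


def call_relative(z, cutoff=0.0):
    """True = above `cutoff` ("overexpressed"), False = at or below it."""
    return np.asarray(z) > cutoff


# Toy FPKM values: 5 genes x 2 samples (made-up numbers).
fpkm = np.array([[0.0, 0.5],
                 [2.0, 1.0],
                 [10.0, 8.0],
                 [50.0, 60.0],
                 [300.0, 250.0]])
z = standardize_log2(fpkm)
print(np.round(z, 2))
print(call_relative(z, cutoff=0.5))
```

Standardizing within each column removes each sample's own location and scale, so scores are on a common footing across samples; raising `cutoff` makes the "overexpressed" call stricter.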

This is different from what you asked for ("expressed" vs. "not expressed").

However, in my case relative expression levels are more practical, and they make it easy to apply downstream statistics such as SAM (http://cran.r-project.org/web/packages/samr/index.html). This approach is applicable to both array and NGS data.
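SAM itself is an R package, but its core statistic can be sketched in Python: a per-gene t-like "relative difference" with a small constant s0 added to the denominator so that genes with tiny variance do not dominate. This is a rough sketch under stated assumptions, not the samr implementation: s0 is fixed to a percentile of the gene-wise standard errors (the SAM method instead chooses s0 to minimize the coefficient of variation of d), and the permutation-based FDR estimation that samr performs is omitted. The function name is hypothetical.

```python
import numpy as np


def sam_relative_difference(x, y, s0_percentile=50.0):
    """SAM-style relative difference d = (mean_x - mean_y) / (s + s0) per gene.

    x, y: genes x replicates arrays for two conditions (e.g. standardized
    log2 values); the total number of replicates must exceed 2.
    s0 is taken as a fixed percentile of s (a simplifying assumption).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n1, n2 = x.shape[1], y.shape[1]
    # Pooled standard error of the difference in group means, per gene.
    a = (1.0 / n1 + 1.0 / n2) / (n1 + n2 - 2)
    ss_x = ((x - x.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)
    ss_y = ((y - y.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)
    s = np.sqrt(a * (ss_x + ss_y))
    # Fudge factor s0: a fixed percentile of s (simplification).
    s0 = np.percentile(s, s0_percentile)
    if s0 <= 0:
        raise ValueError("s0 must be positive; choose a higher percentile")
    return (x.mean(axis=1) - y.mean(axis=1)) / (s + s0)


# Made-up example: 200 genes x 3 replicates per condition,
# with the first 5 genes shifted upward in condition x.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 3))
y = rng.normal(size=(200, 3))
x[:5] += 3.0
d = sam_relative_difference(x, y)
print("genes with the largest |d|:", np.argsort(-np.abs(d))[:5])
```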
