RNA-seq read count data normalizaiton using housekeeping genes
1
6
Entering edit mode
9.6 years ago

I have a set of RNA-seq data, and I used cuffdiff, deseq2 and edgeR to do the normalization. However, the expression level of the housekeeping genes seems unstable during different time points, i.e. have great fold-change. I am wondering if there is any solutions to normalize the expression matrix using a set of housekeeping gene row count as reference since they should express stably. Should I do the normalization work manually or there is some package I can use, or adjust the packages i mentioned before to make it work?

normalization RNA-Seq • 9.6k views
ADD COMMENT
5
Entering edit mode
9.6 years ago

The question becomes how stably expressed the house keeping genes actually are (they tend to be less stable than advertised). The simplest way to do what you mentioned is to subset the DESeqDataSet or DGElist by the house keeping genes, normalize that, and apply the resulting normalization factors back to the full dataset.

ADD COMMENT
0
Entering edit mode

Thanks a lot. Actually, as you say, the housekeeping genes' expression level may not be that stable. And the way you mentioned sound good. Because I don't use DEseq2 proficiently ,could you please give me more detailed process about "subset the DESeqDataSet or DGElist by the house keeping genes, normalize that, and apply the resulting normalization factors back to the full dataset". Thank you again!

ADD REPLY
1
Entering edit mode

You just need to make a list of IDs associated with your house keeping genes. I think DGElists and DESeqDataSet objects have row.names() accessors, so just subset things with %in%.

ADD REPLY
0
Entering edit mode

Do you mean that I should leave the dataset except housekeeping gene behind, and just normalize the housekeeping genes, then I will get a normalization factor, finally use the factor to normalize the full dataset?

ADD REPLY
0
Entering edit mode

yes, exactly

ADD REPLY
0
Entering edit mode

Ok, will DEseq2 give me the normalization factor or calculate it manually?

ADD REPLY
0
Entering edit mode

Just use the estimateSizeFactors() function.

ADD REPLY
1
Entering edit mode

Sorry to bother you again, the estimateSizeFactors() has the controlGenes argument,and the Reference Manual lists the examples, such as dds <- estimateSizeFactors(dds, controlGenes=1:200), ​do you know what 1:200 mean?

ADD REPLY
0
Entering edit mode

That's new, but quite convenient, since it saves the whole subsetting process. 1:200 means that the genes to use for normalization are the first 200 in the dataset. It's just a vector of indices.

ADD REPLY
0
Entering edit mode

But when I do what the Manual's example does:

> dds <- makeExampleDESeqDataSet(n=1000, m=12)
> dds <- estimateSizeFactors(dds, controlGenes=1:200)
Error in .local(object, ...) : unused argument (controlGenes = 1:200)

do you know what's wrong with that

ADD REPLY
0
Entering edit mode

The the version specified in the manual and the one you have locally aren't the same.

ADD REPLY

Login before adding your answer.

Traffic: 2483 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6