Everything significant in Limma for Differential Expression
8.2 years ago
akknight • 0

Hi,

I'm trying to use limma for a differential expression analysis of a dichotomous trait. I'm unable to post the data here, but it's just a data frame for an expression microarray and a 0/1 indicator for my trait. For every trait I have run, almost every probe reaches significance after multiple-testing correction, which is not plausible. My code is posted below:

design <- pheno$trait
fit <- lmFit(edata1, design)
fit <- eBayes(fit)
a <- topTable(fit, n=Inf, adjust="fdr")

Can anyone point out where I'm making a mistake?

microarray R limma expression • 3.6k views

The design should be a matrix, but it looks like you are passing a vector. Maybe you want to try:

design <- model.matrix(~as.factor(pheno$trait))

With so little code to go on I can't be sure, but that's a start.

By the way, if you are adding information, please edit the original post. Do not write a new post below or in a comment unless it is a direct reply to this comment.
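
For readers wondering why the form of the design matters, here is a minimal sketch using a made-up 0/1 trait (the `trait` vector is hypothetical and stands in for `pheno$trait`). As far as I know, lmFit coerces a bare vector to a one-column matrix, so the model has no intercept and the single coefficient is the mean expression of the trait = 1 group, which is then tested against zero. With `model.matrix`, an intercept is added and the second column becomes the difference between the two groups.

```r
# Hypothetical 0/1 trait for six samples, standing in for pheno$trait
trait <- c(0, 0, 0, 1, 1, 1)

# A bare vector behaves like a one-column design: no intercept term
design_vec <- as.matrix(trait)
ncol(design_vec)

# model.matrix() adds an intercept; column 2 is the group difference
design_mm <- model.matrix(~ factor(trait))
colnames(design_mm)
```

With the two-column design, the coefficient of interest is the second one, not the intercept.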


Thanks for the suggestion. I tried that and it ended up making everything even more significant:

             logFC     AveExpr   t        P.Value      adj.P.Val    B
ILMN_1683271 12128.387 12129.640 142.6865 2.205768e-75 3.805832e-71 76.83640
ILMN_3225938  9810.439  9840.183 131.5552 2.385085e-73 1.385256e-69 76.60632
ILMN_1660498 10578.505 10524.694 131.2191 2.764078e-73 1.385256e-69 76.59849
ILMN_2133360 10237.700 10235.293 130.8780 3.211443e-73 1.385256e-69 76.59048
ILMN_3199798  9850.215  9843.610 127.0336 1.790600e-72 6.179001e-69 76.49586
ILMN_3280952  7043.654  7039.662 124.7893 5.001571e-72 1.438285e-68 76.43672

What platform is it? Illumina HT12 v4 bead arrays?

8.2 years ago

This is not easy to diagnose without more information from you, but things that would help include:

  • Where did you get your data from?
  • How did you load, process, and normalize your data?

You don't have to provide the data, but you can show us the code you used to do the steps above.

The logFCs you are getting are insane, though (values in the thousands are not possible on a log2 scale), so something is clearly wrong. Is your data (at least) log2 transformed?


Yes, it's the Illumina HT12 array.

I was given this dataset after QC, so I don't have the code for that either, but it was quantile normalized and ComBat adjusted.

I didn't log2 transform; I thought this was done automatically in limma?


Take a look at the help page for ?lmFit, where the documentation for the first parameter (your edata1 object) reads:

A matrix-like data object containing log-ratios or log-expression values for a series of arrays, with rows corresponding to genes and columns to samples. Any type of data object that can be processed by getEAWP is acceptable.

(emphasis mine)

I don't know what type of data object your edata1 is, but make sure its data is log2 transformed before passing it to lmFit. If it's just a matrix of positive numbers, then you can simply call log2(edata1).

It's unfortunate you don't have access to the raw data, but it would be good to find out how it was normalized from the person who gave it to you, as well as running basic QC to see how your data looks before (or after) you do your differential expression stats.
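
A rough way to check the scale before fitting is sketched below. The matrix `edata_raw` is simulated and only stands in for `edata1`, and the "roughly 0 to 16" range is a rule of thumb for log2 intensities from 16-bit scanners, not a hard limit.

```r
# Simulated raw-scale intensities (hypothetical stand-in for edata1)
set.seed(1)
edata_raw <- matrix(2^rnorm(20 * 6, mean = 8, sd = 2), nrow = 20,
                    dimnames = list(paste0("probe", 1:20), paste0("s", 1:6)))

# Log2 intensities usually fall roughly between 0 and 16;
# values in the hundreds or thousands suggest raw-scale data
range(edata_raw)

# log2 is only defined for positive values, so check first
stopifnot(all(edata_raw > 0))
edata_log <- log2(edata_raw)
range(edata_log)

# Basic QC: per-sample quantiles should look similar after normalization
apply(edata_log, 2, quantile)
```

If the real matrix contains zeros or negative values (which can happen after some adjustments), a plain log2 will not work and it is worth asking how the data were processed.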


Thanks for all your help. I did the log transformation and everything was still significant, so I finally realized I was calling coef for the intercept instead of the trait. Now things look reasonable.


I'm glad that you found your mistake. However, I just want to mention that the code you provided in your original post could never have resulted in calling the "coef for the intercept". You said your call to topTable was like so:

a <- topTable(fit, n=Inf, adjust="fdr")

and if you don't explicitly pass in a value for the coef parameter, you will by default get the data for the last coef in your design (which isn't the intercept).

This is all to point out the importance of providing a minimal reproducible example we can work from. As I originally stated, you could have done that without providing the data, and we all would have gotten to the bottom of this much sooner.
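
As an illustration of what such an example could look like, here is a sketch with simulated log2 data and no true group difference. All object names and simulation settings are made up for illustration, and the column names "Intercept" and "trait" are assigned here for readability. It names the coefficient explicitly in topTable instead of relying on the default.

```r
library(limma)

set.seed(42)
n_probes <- 200
trait <- rep(c(0, 1), each = 5)  # simulated 0/1 trait, 5 samples per group

# Simulated log2 expression with no real difference between groups
edata <- matrix(rnorm(n_probes * length(trait), mean = 8, sd = 0.5),
                nrow = n_probes)
rownames(edata) <- paste0("probe", seq_len(n_probes))

design <- model.matrix(~ factor(trait))
colnames(design) <- c("Intercept", "trait")

fit <- eBayes(lmFit(edata, design))

# Request each coefficient by name so there is no ambiguity
res_trait     <- topTable(fit, coef = "trait", number = Inf, adjust.method = "BH")
res_intercept <- topTable(fit, coef = "Intercept", number = Inf, adjust.method = "BH")

# Intercept: tests mean expression against zero, so every probe is "significant"
sum(res_intercept$adj.P.Val < 0.05)
# Trait: few or no probes should pass FDR for null data
sum(res_trait$adj.P.Val < 0.05)
```

The intercept table shows the same pattern as the output posted earlier in the thread: logFC roughly equal to AveExpr and tiny p-values for every probe.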
