What is the best way to analyze non-normalized Illumina HumanHT-12 microarray?
0
1
Entering edit mode
5.9 years ago
Leite ★ 1.3k

Hello everyone,

I tried to help on a post one month ago, but without success.

I continue to have problems analyzing the HumanHT-12 chip, I'm trying to follow this code by @Gordon Smyth.

My biggest problem is to split the groups, how to tell the R what Healthy controls and the patients.

> library(limma)
> x <- read.ilmn("GSE74629_non-normalized.txt",expr="SAMPLE ",probeid="ID_REF")
Reading file GSE74629_non-normalized.txt ... ...
> y <- neqc(x)
Note: inferring mean and variance of negative control probe intensities from the
detection p-values.
> Group <- rep(c("PDAC","Healthy"),c(36,14))
> Group <- factor(Group)
> design <- model.matrix(~Group)
> keep <- rowSums(y$E>5) >= 14
> y2 <- y[keep,]
> fit <- lmFit(y2,design)
> fit <- eBayes(fit,trend=TRUE,robust=TRUE)
> topTable(fit,coef=2)

Can I use something like targets <- readTargets ("targets.txt") ? and then say what are the controls and patients:

#Build the design matrix for the linear modelling function
f <- factor(targets$Target, levels = unique(targets$Target))
design <- model.matrix(~0 + f)
colnames(design) <- levels(f)

#Create a contrast matrix
contrast.matrix <- makeContrasts("patients-control", levels=design)

Thank you all,

Leite

r HumanHT-12 Illumina microarray • 2.8k views
ADD COMMENT
1
Entering edit mode

Can I use something like targets <- readTargets ("targets.txt") ? and then say what are the controls and patients

Yes you can. Take note that topTable() is deprecated in favour of topTreat() - the post you are following is quite old, I would suggest you follow the limma Users Guide instead.

ADD REPLY
0
Entering edit mode

Hey h.mon, ty so much. I'll try to do this. the difference topTreat() and topTable() is that in topTreat I can use more parameters to filter, or have more things?

ADD REPLY
1
Entering edit mode

The difference is topTreat() better controls false discovery rate, read the Note section on ?topTreat.

ADD REPLY
0
Entering edit mode

Thank you so much again.

ADD REPLY
1
Entering edit mode

Just to add: The example that Gordon posted (on Bioconductor) produces the same type of model matrix that we normally create via information supplied via readTargets. One can create the model matrix in any way, though. You do not necessarily have to use readTargets.

For limma, for any array source, you just need to get your sample files into a data matrix, with samples as columns and genes as rows. Then, you just need to specify a model matrix (via any means) that defines your sample groupings.

Be aware, also, that Gordon may have only be supplying sample code and that the line Group <- rep(c("PDAC","Healthy"),c(36,14)), i.e., 36 PAC versus 14 controls, may not reflect the actual numbers of these groups in the dataset

ADD REPLY
0
Entering edit mode

Hey Kevin,

Thank you so much for your reply. I decided to use the normalized data, it seemed to be the best and easiest way to analyze this data. I made a breakthrough this weekend and my R code is like this:

#Configure o diretório de trabalho
setwd("C:/Users/")

#Read the targets
library(limma)
library(affy)

eset <- readExpressionSet("GEOdata_normalised.txt", header=TRUE)
read.delim("Targets.txt", header=TRUE)

#Build the design matrix for the linear modelling function
f <- factor(targets$Target, levels = unique(targets$Target))
design <- model.matrix(~0 + f)
colnames(design) <- levels(f)

#Apply the intensity values to lmFit
fit <- lmFit(eset, design)
write.table(fit, file="fit.txt", sep="\t", quote=FALSE)

#Create a contrast matrix
contrast.matrix <- makeContrasts("patients-control", levels=design)

#Apply this contrast matrix to the modeled data and compute statistics for the data
fit2 <- contrasts.fit(fit, contrast.matrix)
fit2 <- eBayes(fit2)

#Output the statistics for the dataset and write them to disk
output <- topTreat(fit2, coef=1, number=Inf, adjust.method="BH", lfc=0.42)
write.table(output, file="alive-Control.txt", sep="\t", quote=FALSE)

I'm studying to see if there's any mistake in it, what do you guys think?

Best,

Leite

ADD REPLY
1
Entering edit mode

Boa tade Leite, and how do the p-values appear? A good measure is to take the most differentially expressed genes and see if they separate your groups in clstering + heatmap.

ADD REPLY
0
Entering edit mode

Boa tarde Kevin,

I can find many genes with an adjusted p <0.05. I'll try what you mentioned. What does not make me very comfortable with this code is this function eset <- readExpressionSet I'm still studying more about it.

ADD REPLY
1
Entering edit mode

That function appears to just 'coerce' (convert) a data matrix into an Expression Set object, which is require for Limma.

The only other thing that you need to ensure is that your column names (in eset) are in the same order as your samples in targets.

ADD REPLY
1
Entering edit mode

Oi Kevin,

This point I had not thought of. "The only other thing that you need to ensure is that your column names (in eset) are in the same order as your samples in targets."

Thank you very much for this super tip, it made all the difference now.

ADD REPLY

Login before adding your answer.

Traffic: 1446 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6