Are the series_matrix.txt files in GEO always normalized values?
5.8 years ago
Leite ★ 1.3k

Hello everyone,

I have a simple question: do the series_matrix.txt files in GEO always contain the normalized data of the study?

Best,

Leite

geo series_matrix.txt • 13k views

Here is a picture: https://ibb.co/B2RV1CD. Are these samples comparable? Thanks a lot. I copied the code from you:

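library(Biobase)
library(GEOquery)

# Download the series matrix for GSE28739 (values only, no platform annotation)
gset <- getGEO("GSE28739", GSEMatrix = TRUE, getGPL = FALSE)
# If the series spans several platforms, keep the GPL6480 subset
if (length(gset) > 1) idx <- grep("GPL6480", attr(gset, "names")) else idx <- 1
gset <- gset[[idx]]

# Open a plotting window wide enough for one box per sample
dev.new(width = 4 + dim(gset)[[2]] / 5, height = 6)
par(mar = c(2 + round(max(nchar(sampleNames(gset))) / 2), 4, 2, 1))
# The title should name the same series that was downloaded above
title <- paste0("GSE28739", "/", annotation(gset), " selected samples")
boxplot(exprs(gset), boxwex = 0.7, notch = TRUE, main = title, outline = FALSE, las = 2)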

Some samples appear to be outliers. Please check them.
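
One rough way to make "outlier" concrete is to compare each sample's median with the median of all the sample medians. Below is a minimal base-R sketch, assuming `expr` is a genes x samples matrix of log2 values (for example `exprs(gset)` from the code above). The data are simulated, and the helper `flag_median_outliers()` and its cutoff `k = 3` are hypothetical choices, not a standard rule.

```r
# Sketch: flag samples whose median sits far from the other samples' medians.
# Simulated log2-like data; in practice you would use expr <- exprs(gset).
set.seed(1)
expr <- matrix(rnorm(1000 * 6, mean = 8, sd = 2), nrow = 1000,
               dimnames = list(NULL, paste0("S", 1:6)))  # hypothetical sample names
expr[, c("S5", "S6")] <- expr[, c("S5", "S6")] + 3        # shift two samples upwards

# Hypothetical helper: TRUE for samples whose median is more than k robust
# standard deviations (MAD) away from the median of all sample medians.
flag_median_outliers <- function(expr, k = 3) {
  med    <- apply(expr, 2, median, na.rm = TRUE)  # per-sample median
  centre <- median(med)                           # typical sample median
  spread <- mad(med)                              # robust spread of the medians
  abs(med - centre) > k * spread
}

flagged <- flag_median_outliers(expr)
print(names(which(flagged)))
```

With only a handful of samples the MAD is itself noisy, so a flag is a prompt to look at the box plot again rather than a verdict.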

Thanks a lot. How did you come to this conclusion? Is it just because the medians of 712058 and 712060 are much higher?

5.8 years ago

Hey Leite,

The answer is that, yes, the series matrix files should contain normalised values (often, but not always, on the log2 scale). However, GEO itself describes situations in which these files may not contain normalised data:

GEO2R operates on Series Matrix files which contain data extracted directly from the VALUE column of Sample tables. Submitters are asked to supply normalized data in the VALUE column, rendering the Samples cross-comparable. The majority of GEO data do conform to this rule. GEO applies no further processing other than to perform a log2 transformation on values determined not to be in log space (see Options section). However, some studies, such as dual channel loop design data, may generate values that do not have a common reference and are not directly comparable. Some studies may contain Sample value data that are not normalized, or have a design such that the Samples were never intended to be directly compared. Yet other studies do not have sufficient replicate Samples to perform a robust statistical analysis. Users should examine the original Series to understand the experimental design, and check the 'Data processing' field or VALUE description in the original Sample records for information on what the values represent. The box plot feature on the Value distribution tab is provided to help users assess whether the distributions of values across Samples are median-centered, which is generally indicative that the data are normalized and cross-comparable.

[source: https://www.ncbi.nlm.nih.gov/geo/info/geo2r.html]
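
On the log2 point in the quote: as far as I know, the R scripts that GEO2R generates decide whether to log-transform by looking at quantiles of the values. The sketch below follows a check of that kind, but treat the exact thresholds (99th percentile above 100, or a range above 50 with a positive lower quartile) as assumptions. `needs_log2()` and `log2_if_needed()` are hypothetical helper names, and the data are simulated.

```r
# Heuristic: values that are very large or span a huge range are probably
# not yet on the log2 scale. Thresholds are assumptions, not a GEO rule.
needs_log2 <- function(ex) {
  qx <- as.numeric(quantile(ex, c(0, 0.25, 0.5, 0.75, 0.99, 1), na.rm = TRUE))
  (qx[5] > 100) || (qx[6] - qx[1] > 50 && qx[2] > 0)
}

log2_if_needed <- function(ex) {
  if (needs_log2(ex)) {
    ex[which(ex <= 0)] <- NaN  # log2 of non-positive values is undefined
    ex <- log2(ex)
  }
  ex
}

# Simulated data: raw-scale intensities and the same values on the log2 scale
set.seed(2)
raw  <- matrix(2^rnorm(500, mean = 8, sd = 2), ncol = 5)
logd <- log2(raw)

cat("raw needs log2: ", needs_log2(raw), "\n")
cat("logd needs log2:", needs_log2(logd), "\n")
```

A heuristic like this only tells you the scale; it says nothing about whether the samples were normalised against each other.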

When you obtain data, you should always check the distributions with box plots, scatter plots and histograms in order to gauge whether they are normalised or not.
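
For those distribution checks, here is a minimal base-R sketch, assuming `ex` is a genes x samples matrix such as `exprs(gset)`. It prints per-sample quartiles and the range of the sample medians (close to 0 when the samples are median-centred), and only draws the box plot, histogram and scatter plot in an interactive session. The data, the sample IDs and the helper names `sample_summary()` and `median_range()` are made up for illustration.

```r
# Simulated matrix; in practice ex <- exprs(gset)
set.seed(3)
ex <- matrix(rnorm(2000 * 4, mean = 7, sd = 1.5), ncol = 4,
             dimnames = list(NULL, paste0("GSM_", 1:4)))  # hypothetical sample IDs

# One row per sample: lower quartile, median, upper quartile
sample_summary <- function(ex) {
  t(apply(ex, 2, quantile, probs = c(0.25, 0.5, 0.75), na.rm = TRUE))
}

# Spread of the per-sample medians; near 0 for median-centred samples
median_range <- function(ex) {
  med <- apply(ex, 2, median, na.rm = TRUE)
  diff(range(med))
}

print(round(sample_summary(ex), 2))
cat("Range of sample medians:", round(median_range(ex), 3), "\n")

if (interactive()) {
  boxplot(ex, las = 2, outline = FALSE, main = "Per-sample distributions")
  hist(ex[, 1], breaks = 50, main = colnames(ex)[1], xlab = "value")
  plot(ex[, 1], ex[, 2], pch = ".", xlab = colnames(ex)[1], ylab = colnames(ex)[2])
}
```

Median-centring is a hint, not proof: two samples can share a median and still differ in spread, which is why the histogram and scatter plot are worth a look too.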

Kevin

Thank you so much my friend!
