What is the accurate order of preprocessing steps in DNA Microarray gene expression analysis?
4.9 years ago
J. Smith ▴ 80

Hi friends,

I have downloaded DNA microarray data from NCBI. The data contain both control and affected samples for all genes. I want to perform downstream analyses such as clustering and classification. I know that some preprocessing steps, such as normalization, log2 transformation, and selection of differentially expressed genes, are necessary before clustering or classification.

But I am unsure about the exact order of these preprocessing steps, although I know that normalization is performed before log2 transformation. Please let me know the following:

1) Do normalization and then log2 transformation need to be done before selecting differentially expressed genes, and should that selection be done on the normalized, log2-transformed data?

2) For RNA-seq data, I learned that differential expression analysis is done on un-normalized, un-logged count data, as the statistical model is most powerful when applied to un-normalized counts. Can we then also select differentially expressed genes from microarray data without performing normalization and log2 transformation? Please note that I will use SAM or limma to select differentially expressed genes from the microarray data.

3) Are any other preprocessing or quality control steps necessary before clustering? If so, please mention their exact order.

Thanks in advance.

microarray limma SAM preprocessing • 11k views

See this end-to-end workflow.


Thanks a lot. Now I understand that there are many more preprocessing steps I have to carry out before applying limma to find differentially expressed genes, and that limma should be applied only to the final preprocessed data. But can you please tell me why this differs from RNA-seq? I mean, why should limma be applied to the final preprocessed data for microarrays, whereas for RNA-seq, DESeq2 should be applied to raw count data without normalization and log2 transformation?


Some packages, such as gcrma, take care of normalization and log transformation. You can refer to http://www.bioconductor.org/packages/release/bioc/html/gcrma.html

Then you can use these values to perform DE analysis with limma.
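To make the "use these values" step concrete, here is a minimal Python sketch of the arithmetic only, assuming the values are already normalized and on the log2 scale. It is not what limma computes (limma fits linear models with moderated statistics); the function name `log2_fold_change` and the numbers are made up for illustration.

```python
import statistics

def log2_fold_change(control, affected):
    """Difference of group means on the log2 scale (affected minus control)."""
    return statistics.mean(affected) - statistics.mean(control)

# Hypothetical log2 expression values for one gene (invented numbers).
control_log2 = [8.0, 8.2, 7.8]
affected_log2 = [9.0, 9.1, 8.9]

lfc = log2_fold_change(control_log2, affected_log2)
fold = 2 ** lfc  # back-transform to a linear-scale fold change

print(f"log2 fold change: {lfc:.2f}, linear fold change: {fold:.2f}")
```

On the log2 scale a fold change is a simple difference of means, which is one reason DE tools for microarrays expect log2 input.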


Thanks a lot for your response and referred package.

4.9 years ago

Your questions are very general and broad. It would be better to start with a tutorial on microarray analysis and then formulate specific questions.

https://bioconductor.org/packages/devel/workflows/vignettes/maEndToEnd/inst/doc/MA-Workflow.html#1_introduction

To briefly answer your Qs:

  1. Normalization is necessary to compare across samples. It is essential before any downstream analysis (clustering, DE, etc.). Expression data vary widely and are skewed. Log2 is a useful transform that makes the data behave more "normally" and compresses the variability across the intensity range:

    https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4120293/

    https://genomicsclass.github.io/book/pages/robust_summaries.html

    An additional advantage is that you can interpret fold changes as powers of 2 (a log2 fold change of 1 is a two-fold change). For clustering, PCA, or any other dimensionality reduction analysis, it is imperative that you use normalized values, which are often returned as log2 values.

  2. RNA-seq data are counts and therefore discrete, whereas microarray data are intensity values and therefore continuous. They follow different statistical models, so their DE analysis is a bit different. But in either case normalization is fundamental, as it allows the samples to be compared with one another. Count-based tools such as DESeq2 take raw counts as input because they estimate normalization (size) factors internally, so the data are still normalized, just inside the model. See this tutorial for the basic idea of DE using limma:

    https://www.bioconductor.org/help/course-materials/2005/BioC2005/labs/lab01/estrogen/

  3. Normalization is essential. You may also restrict clustering/PCA to the most highly variable genes.
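The order described above (normalize, then log2, then optionally keep the most variable genes before clustering/PCA) can be sketched in plain Python. This is only an illustration under stated assumptions: quantile normalization is swapped in as one common normalization choice, ties are broken by gene order, all intensities are assumed positive, and the function names and toy numbers are invented. For real data, use the Bioconductor packages from the workflow linked above.

```python
import math
import statistics

def quantile_normalize(matrix):
    """Quantile-normalize a genes x samples matrix given as a list of rows.

    Every sample (column) is mapped onto the same reference distribution:
    the mean of the sorted columns. Ties are broken by gene order.
    """
    n_genes, n_samples = len(matrix), len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_samples)]
    sorted_cols = [sorted(col) for col in columns]
    # Reference value at each rank = mean across samples at that rank.
    reference = [sum(sc[r] for sc in sorted_cols) / n_samples
                 for r in range(n_genes)]
    normalized = [[0.0] * n_samples for _ in range(n_genes)]
    for j, col in enumerate(columns):
        order = sorted(range(n_genes), key=lambda i: col[i])
        for rank, gene in enumerate(order):
            normalized[gene][j] = reference[rank]
    return normalized

def log2_transform(matrix):
    """Apply log2 to every value (values must be strictly positive)."""
    return [[math.log2(v) for v in row] for row in matrix]

def top_variable_genes(matrix, gene_ids, k):
    """Return the ids of the k genes with the largest sample variance."""
    variances = [statistics.variance(row) for row in matrix]
    ranked = sorted(range(len(matrix)), key=lambda i: variances[i], reverse=True)
    return [gene_ids[i] for i in ranked[:k]]

# Toy raw intensities: 4 genes x 3 samples (invented numbers).
raw = [
    [100.0, 220.0, 150.0],
    [400.0, 800.0, 500.0],
    [50.0, 90.0, 70.0],
    [1600.0, 3300.0, 200.0],
]
genes = ["g1", "g2", "g3", "g4"]

norm = quantile_normalize(raw)                      # step 1: normalize
logged = log2_transform(norm)                       # step 2: log2
selected = top_variable_genes(logged, genes, k=2)   # step 3: filter
print(sorted(selected))
```

After step 1 every sample has the same set of values, which is what makes samples comparable. The filter in step 3 is applied to the log2 values so that highly expressed genes do not dominate the variance ranking.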


Thanks a lot for your extensive response with reference papers.


Thank you. I updated the post with another relevant reference and correct formatting.
