Question

How to do Clariom D Human analysis for 3 groups in R?

1

Entering edit mode

4.9 years ago

jiamacro ▴ 20

Hi! I am a freshman who wanna analyze my clariom d human data. I have 3 groups and total 14 .CEL files and I wanna find the DEG, do GO and KEGG enrichment and do further Gene Expression Trend Analysis to find genes or pathway in my 3 groups with specific trend. My code is here:

>setwd ("/Users/mac/Desktop/Physician_Scientist/HypoT/Microarray_Analysis/3_Clariom_D_Human/1_CEL_Files")
>library (oligo)
>library (pd.clariom.d.human)
>library(affycoretools)
>rawDatal <- read.celfiles (list.celfiles ("/Users/mac/Desktop/Physician_Scientist/HypoT/Microarray_Analysis/3_Clariom_D_Human/1_CEL_Files", full.name = T))
>pData (rawData)
>normData <- oligo::rma (rawData, target = "core")
>normData
>boxplot (normData)
>all.eset <- annotateEset(normData, annotation(normData))
>all.eset

And I am confuse about what should I do next? Waiting for some help. Thank u!

R • 3.8k views

ADD COMMENT • link updated 4.9 years ago by Kevin Blighe 87k • written 4.9 years ago by jiamacro ▴ 20

1

Entering edit mode

Please use the formatting bar (especially the code option) to present your post better. You can use backticks for inline code (`text` becomes text), or select a chunk of text and use the highlighted button to format it as a code block. I've done it for you this time.
code_formatting

Also, clariom d, differential expression, GO, KEGG - these should have been tags in your post. By using just R as a tag, the question makes it difficult for an expert in, say, the clariom d technology to find your post.

ADD REPLY • link 4.9 years ago by Ram 43k

score 2 · Answer 1 · 2019-06-07

2

Entering edit mode

4.9 years ago

Kevin Blighe 87k

You need to next use limma in order to perform differential expression analysis.

To do that, you first need:

design formula
contrasts matrix

Please search online in order to find out how to do this. When searching, you no longer have to consider the fact that you have a Clariom array, as you have already normalised the array and have the data in a standardised format. Limma can be applied to any type of array, once normalised.

Kevin

ADD COMMENT • link 4.9 years ago by Kevin Blighe 87k

0

Entering edit mode

Thank u for your selfless help! I did the design and contrasts matrix and made the analysis, while there are still something confuse me.

design <- model.matrix (~0 + group, pData(Trans.eset))
cm <- makeContrasts (CADvControl = groupCAD - groupControl, 
                     CAD_SCHvControl = groupCAD_SCH - groupControl,
                     CAD_SCHvCAD = groupCAD_SCH - groupCAD,
                     levels = design)
fit <- limma::lmFit (Trans.eset, design = design)
fit2 <- contrasts.fit (fit, contrasts = cm)
fit2 <- limma::eBayes (fit2)
results <- decideTests (fit2)
summary (results)



  CADvControl CAD_SCHvControl CAD_SCHvCAD
Down             0               0           0
NotSig      138745          138745      138745
Up               0               0           0

It shows nothing is up-regulated or down-regulated with no B value in topTable(fit2). But when I did two-two comparison, there shows results of t-test, P value and B value.

 tempOutput1 <- topTable (fit2, coef = "CADvControl", nrow(fit2))
    tempOutput2 <- topTable (fit2, coef = "CAD_SCHvControl", nrow(fit2))
    tempOutput3 <- topTable (fit2, coef = "CAD_SCHvCAD", nrow(fit2))
    DEG_CAD <- na.omit (tempOutput1)
    DEG_CAD_SCH <- na.omit (tempOutput2)
    DEG_Between <- na.omit (tempOutput3)
    head (DEG_CAD)
                            PROBEID  ENTREZID       SYMBOL                                          GENENAME
TC0100018466.hg.1 TC0100018466.hg.1    163259      DENND2C                         DENN domain containing 2C
TC1700010560.hg.1 TC1700010560.hg.1     57125       PLXDC1                        plexin domain containing 1
TC0X00009878.hg.1 TC0X00009878.hg.1    142689        ASB12         ankyrin repeat and SOCS box containing 12
TC0800009830.hg.1 TC0800009830.hg.1     80346        REEP4                      receptor accessory protein 4
TC1300006529.hg.1 TC1300006529.hg.1    650794      MIPEPP3 mitochondrial intermediate peptidase pseudogene 3
TC0800006669.hg.1 TC0800006669.hg.1 101929128 LOC101929128                      uncharacterized LOC101929128
                       logFC  AveExpr         t      P.Value adj.P.Val         B
TC0100018466.hg.1 -0.9234392 2.953739 -5.172051 7.563992e-05 0.5026625 -2.663679
TC1700010560.hg.1 -0.5740328 5.817980 -5.108468 8.634638e-05 0.5026625 -2.687026
TC0X00009878.hg.1 -0.6891868 2.370178 -4.841589 1.512198e-04 0.7234822 -2.788913
TC0800009830.hg.1 -0.7199828 4.912157 -4.766760 1.771752e-04 0.7288260 -2.818622
TC1300006529.hg.1  0.6652065 3.909062  4.466342 3.363292e-04 0.8149338 -2.942987
TC0800006669.hg.1 -0.6809704 2.177349 -4.410742 3.789860e-04 0.8149338 -2.966899

What confused me is that I want to do a differential expression analysis for my 3 groups first and do a Gene Expression Trend Analysis to find if there is any KEGG pathway or GO enrichment has my interest trend among these 3 group. But there shows no different and I still dont know why. Could you please give me some suggestion to help me with such an issue? Millions and billions thanks to u!

ADD REPLY • link 4.9 years ago by jiamacro ▴ 20

1

Entering edit mode

In the first example, you are using decideTests, which has a default threshold of Benjamini-Hochberg adjusted p-value of 0.05. Looking in your second example, no gene passes this threshold.

For KEGG pathway enrichment, you can use KEGGprofile

ADD REPLY • link 4.9 years ago by Kevin Blighe 87k

0

Entering edit mode

Does it mean my data has nothing significant or I can use P value instead? Thank u.

ADD REPLY • link 4.9 years ago by jiamacro ▴ 20

0

Entering edit mode

How are the results in your other tables:

DEG_CAD_SCH
DEG_Between

??

Your sample n is also small. Can you combine any groups together?

ADD REPLY • link 4.9 years ago by Kevin Blighe 87k

0

Entering edit mode

head (DEG_Between)
                            PROBEID ENTREZID   SYMBOL                                           GENENAME      logFC
TC1400010800.hg.1 TC1400010800.hg.1     3502    IGHG3 immunoglobulin heavy constant gamma 3 (G3m marker)  2.0587554
TC0600013780.hg.1 TC0600013780.hg.1    80350    LPAL2                  lipoprotein(a) like 2, pseudogene -0.7468014
TC0800008848.hg.1 TC0800008848.hg.1     5820     PVT1                                      Pvt1 oncogene  0.9172486
TC0900008737.hg.1 TC0900008737.hg.1   406956 MIR181B2                                    microRNA 181b-2 -0.7549605
TC1300008614.hg.1 TC1300008614.hg.1    23111    SPART                                            spartin -0.7537272
TC0800009830.hg.1 TC0800009830.hg.1    80346    REEP4                       receptor accessory protein 4  0.7213144
                   AveExpr         t      P.Value adj.P.Val         B
TC1400010800.hg.1 7.775120  5.104049 8.714598e-05 0.9995631 -3.127323
TC0600013780.hg.1 5.242101 -5.022495 1.033448e-04 0.9995631 -3.150102
TC0800008848.hg.1 4.528440  4.965529 1.164634e-04 0.9995631 -3.166285
TC0900008737.hg.1 3.489531 -4.809214 1.619363e-04 0.9995631 -3.211854
TC1300008614.hg.1 6.271769 -4.708515 2.004973e-04 0.9995631 -3.242123
TC0800009830.hg.1 4.912157  4.502456 3.112667e-04 0.9995631 -3.306313

head (DEG_CAD_SCH)
                            PROBEID  ENTREZID       SYMBOL                                          GENENAME
TC1600007575.hg.1 TC1600007575.hg.1    729264     TP53TG3D                                    TP53 target 3D
TC0200015757.hg.1 TC0200015757.hg.1    255101       CFAP65          cilia and flagella associated protein 65
TC0300008985.hg.1 TC0300008985.hg.1    287015       TRIM42                    tripartite motif containing 42
TC1500007392.hg.1 TC1500007392.hg.1      9133        CCNB2                                         cyclin B2
TC0800006669.hg.1 TC0800006669.hg.1 101929128 LOC101929128                      uncharacterized LOC101929128
TC0200013638.hg.1 TC0200013638.hg.1    164832       LONRF2 LON peptidase N-terminal domain and ring finger 2
                       logFC  AveExpr         t      P.Value adj.P.Val         B
TC1600007575.hg.1 -1.0040570 2.669795 -5.341711 5.324979e-05 0.7944573 -3.029719
TC0200015757.hg.1  0.7597291 2.212664  4.842190 1.510280e-04 0.7944573 -3.171664
TC0300008985.hg.1 -0.7032285 2.632224 -4.759915 1.797660e-04 0.7944573 -3.196736
TC1500007392.hg.1  1.1560889 3.275195  4.571321 2.686107e-04 0.9807471 -3.256060
TC0800006669.hg.1 -0.7230881 2.177349 -4.415688 3.749782e-04 0.9998860 -3.306972
TC0200013638.hg.1  0.5935299 3.524997  4.363952 4.191192e-04 0.9998860 -3.324288

I have 14 samples which are divided into three groups with different conditions. They are not cancer patients and I find there is not such big differences between their expression level. Also after annotation, I found only 25000 probes have their symbol. So I consider it as that’s the true condition of my patients.

ADD REPLY • link 4.9 years ago by jiamacro ▴ 20

0

Entering edit mode

Okay, so, you were expecting there to not be any differentially expressed genes? - that may be the genuine result, and informative in its own way.

ADD REPLY • link 4.9 years ago by Kevin Blighe 87k

0

Entering edit mode

Sounds so sad. But I do realize that FC value show some difference and the p value itself can be significant or meaningful. How do you think about I don’t use adj.P.value instead of p value? Thank u.

ADD REPLY • link 4.9 years ago by jiamacro ▴ 20

0

Entering edit mode

For what is the work? - thesis?; publication?; patent?; something else? You can always report that the conditions were only differentially expressed based on a nominal p-value cut-off. The log [base 2] fold changes are low, too.

There is no result that is a bad result.

ADD REPLY • link 4.9 years ago by Kevin Blighe 87k

0

Entering edit mode

If I want to use my P.value to make a Venn diagram, I have to show the results of expression by P.value instead of adj.P.value. But the summary(results) returns no significantly different.

summary (results)
  CADvControl CAD_SCHvControl CAD_SCHvCAD
Down             0               0           0
NotSig      138745          138745      138745
Up               0               0           0

Which argument can I change to get such a result? Millions and billions thanks to u!

ADD REPLY • link 4.9 years ago by jiamacro ▴ 20

1

Entering edit mode

You can type ?decideTests at the command line to bring up the manual page. Upon reading the manual page, you will find that all that you need to do is:

decideTests(fit2, adjust.method = "none", p.value = 0.05, lfc = 0)

Then, the threshold will be nominal p<0.05

ADD REPLY • link 4.9 years ago by Kevin Blighe 87k

0

Entering edit mode

Sorry for so late to replay. Your suggestion is very useful! Now I have the differential expression results and I can do GO and KEGG analysis. But 1) How can I identify the lncRNA, miRNA and circRNA from my data? My microarray can measure them at same time, while after annotation, I didn't get a result with chromesome location or RNA type of those RNAs. 2) I am going to do GO and KEGG analysis, shall I identify and separate mRNAs and only use them? Or use the "raw results"? 3) The results I have now are based on contrast matrix and are two-two comparison results, AND I also need a 3-group DEG trend analysis to find if my interesting gene has a specific trend among these 3 groups, what can I do? THANK U.

ADD REPLY • link 4.8 years ago by jiamacro ▴ 20

0

Entering edit mode

Hey, you can use biomaRt to automate the lookup of each gene's 'biotype'. Here is an example:

genes <- c("XIST", "BRCA1", "MALAT1", "AFG3L1P")

require("biomaRt")
mart <- useMart("ENSEMBL_MART_ENSEMBL")
mart <- useDataset("hsapiens_gene_ensembl", mart)
annotLookup <- getBM(
  mart = mart,
  attributes = c(
    "hgnc_symbol",
    "ensembl_gene_id",
    "gene_biotype",
    "external_gene_name"),
  filter = "hgnc_symbol",
  values = genes,
  uniqueRows=TRUE)

annotLookup
  hgnc_symbol ensembl_gene_id                   gene_biotype external_gene_name
1     AFG3L1P ENSG00000223959 transcribed_unitary_pseudogene            AFG3L1P
2       BRCA1 ENSG00000012048                 protein_coding              BRCA1
3      MALAT1 ENSG00000251562                        lincRNA             MALAT1
4        XIST ENSG00000229807                        lincRNA               XIST

NB - biomaRt will NOT necessarily return your genes in the same order in which they were submitted.

--------------------------------------

For your second question, you can use all genes but I would convert them to Entrez IDs (via biomaRt) before doing this.

ADD REPLY • link 4.8 years ago by Kevin Blighe 87k