Question: WGCNA: Pearson or bicor? Also, should I adjust module-trait correlation P values by Bonferroni?

Hello!

I am performing a network analysis with WGCNA. I had to separate the samples into 2 networks (one per tissue) due to the high variation between tissues. Each network is built from 15 samples (which is supposed to be the minimum acceptable).

The WGCNA FAQ recommends using bicor correlation rather than Pearson correlation. However, my data from the ovaries seem to perform better with Pearson correlation. (There isn't much difference for the samples from the body.)

I have attached the scale independence plots for the ovary network when using Pearson and bicor.

[Figure: Pearson]

[Figure: Bicor]

In this case, should I use Pearson, or should I use bicor even though the scale independence fit seems to be worse?

Also, from the module-trait plot I obtain P values for each module, and some of them are lower than 0.05. However, if I adjust the P values by Bonferroni (or any other method) using the total number of modules, none of them remains significant. Does this mean my data is useless? Can I use, in further analyses, the modules that are significantly associated with any of the traits based on uncorrected P values? I want to check for TFs associated with the genes in each module to look for regulators of the associated traits (fecundity and lifespan). Should I modify any parameter to make the associations more significant? So far I have normalized the data (RNA-seq) as described in the FAQ on the WGCNA page, chosen the soft threshold suggested by the function pickSoftThreshold, and used signed hybrid networks. All other parameters are the defaults used in the tutorials.

[Figure: Ovary network modules-trait]

Thank you very much!

jagoor93, 18 months ago (updated 18 months ago by Kevin Blighe)

Hello jagoor93,

The link you’ve added points to the page that contains the image, not the image itself. On the ibb.co site, scroll down and look for a tab that says Embed codes. Click on this tab, copy the code in the HTML full image box, and paste that line into your post here (instead of the link you've used) so that the image is embedded automatically.

I've corrected this for you on this occasion.

andrew.j.skelton73, 18 months ago

Thank you very much!

jagoor93, 18 months ago

I agree that Pearson looks better, particularly on the Scale Independence plot.

Also, from the module-trait plot I obtain P values for each module, and some of them are lower than 0.05. However, if I adjust the P values by Bonferroni (or any other method) using the total number of modules, none of them remains significant. Does this mean my data is useless? Can I use, in further analyses, the modules that are significantly associated with any of the traits based on uncorrected P values?

It does not mean that your data is useless. We faced the same issue using WGCNA in the lab in Boston (USA). We were eventually able to publish the data with the nominal (unadjusted) P values. Bonferroni correction is the most stringent P value adjustment, though - why not try with Benjamini-Hochberg?
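To make the difference between the two adjustments concrete, here is a minimal sketch using base R's p.adjust. The P values are made up for illustration (they are not from the ovary network), and the example assumes one trait tested against 8 modules; if several traits are tested, the number of tests grows accordingly.

```r
# Hypothetical nominal P values for 8 modules tested against one trait
p_nominal <- c(0.004, 0.012, 0.030, 0.048, 0.210, 0.370, 0.620, 0.910)

# Bonferroni: multiply each P value by the number of tests (capped at 1)
p_bonf <- p.adjust(p_nominal, method = "bonferroni")

# Benjamini-Hochberg: controls the false discovery rate instead
p_bh <- p.adjust(p_nominal, method = "BH")

# Side-by-side comparison
res <- data.frame(p_nominal, bonferroni = p_bonf, BH = p_bh)
print(res)

# Count how many modules pass 0.05 under each scheme
colSums(res < 0.05)
```

BH-adjusted values are never larger than the Bonferroni-adjusted ones, so BH can only retain the same modules or more.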

I want to check for TFs associated with the genes in each module to look for regulators of the associated traits (fecundity and lifespan). Should I modify any parameter to make the associations more significant?

You could modify the tree cut height, which will affect the final number of modules, which, in turn, will affect the P value adjustment. You can also filter out genes before performing WGCNA, such as genes of low expression and/or genes of low variance.
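As a rough illustration of how the cut height changes the number of clusters, and therefore the number of tests being corrected for, here is a sketch on simulated data. Note the assumptions: it uses base R's hclust with a static cutree, not WGCNA's own module detection, and the heights 0.6 and 0.9 are arbitrary values chosen for the example.

```r
set.seed(1)

# Simulated expression: 60 genes (rows) x 15 samples (columns)
expr <- matrix(rnorm(60 * 15), nrow = 60)

# Dissimilarity between genes: 1 - Pearson correlation
diss <- as.dist(1 - cor(t(expr)))
tree <- hclust(diss, method = "average")

# Cut the same tree at two different heights
clusters_low  <- cutree(tree, h = 0.6)
clusters_high <- cutree(tree, h = 0.9)

n_low  <- length(unique(clusters_low))
n_high <- length(unique(clusters_high))

# A lower cut gives at least as many clusters as a higher cut
print(c(n_low = n_low, n_high = n_high))

# More clusters means more tests, hence a stricter Bonferroni threshold
print(0.05 / c(n_low = n_low, n_high = n_high))
```

Fewer modules loosen the multiple-testing burden, but merging also changes module membership, so the module-trait correlations themselves will change.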

Kevin

Kevin Blighe, 18 months ago

Thanks! That is really helpful. With BH the results are still not significant, but I checked the TFs with enriched binding sites, and foxo and some GATA genes associated with ageing were at the top when analysing the module associated with lifespan. So I feel confident about that module at least.

I modified the tree cut height and obtained fewer modules, but they were less significant and I lost most of the TFs associated with ageing, so I don't think I will change that parameter.

Also, which program can I use to filter by high variance? So far I have filtered by low expression and applied log2(FPKM + 1) and quantile normalization.

Thank you very much

Javier

jagoor93, 18 months ago

Hey Javier, well, FPKM data is not great, and logged FPKM data is not great either. Where did you obtain this data from? For differential expression analysis, one should not use FPKM; however, for network analysis, it's not the 'end of the world' to use this type of data, because network analysis tools like WGCNA are based on correlation.

If you could obtain another form of normalised counts for your data, that may improve the situation. Otherwise, you can remove low variance data as follows:

# generate random data, and log (base 2) this
mat <- log2(matrix(rexp(200, rate = 0.1), ncol = 20))
dim(mat)
# [1] 10 20

# obtain the variance of each row (gene)
variances <- apply(mat, 1, function(x) var(x, na.rm = TRUE))

# determine the variance range
rangeVar <- max(variances) - min(variances)

# logical vector: TRUE for genes whose variance is greater than
# one-third of the variance range
keep <- variances > (rangeVar / 3)

# subset the rows (genes) flagged TRUE; drop = FALSE keeps a matrix
# even if only one gene is retained
dim(mat[keep, , drop = FALSE])
# [1]  9 20   (in this run; the count varies because the data are random)

There are different ways of filtering on variance, though.
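For example, here is a sketch of two common alternatives on simulated data: keeping a fixed fraction of the most variable genes, and ranking by a robust measure (median absolute deviation). The 25% cut-off is an arbitrary assumption for illustration, not a WGCNA recommendation. The sketch also shows that ranking by standard deviation selects exactly the same genes as ranking by variance, because the SD is just the square root of the variance.

```r
set.seed(42)

# Simulated log2 data: 100 genes (rows) x 20 samples (columns)
mat <- log2(matrix(rexp(2000, rate = 0.1), ncol = 20))

variances <- apply(mat, 1, var)
sds       <- apply(mat, 1, sd)
mads      <- apply(mat, 1, mad)

# Number of genes to retain: the top 25% (arbitrary choice)
n_keep <- ceiling(0.25 * nrow(mat))

# Option 1: top genes by variance
keep_var <- rank(-variances) <= n_keep

# Option 2: top genes by standard deviation (same ranking as variance)
keep_sd <- rank(-sds) <= n_keep
print(identical(keep_var, keep_sd))

# Option 3: top genes by median absolute deviation, less sensitive to outliers
keep_mad <- rank(-mads) <= n_keep

# Subset the matrix with whichever logical vector you prefer
filtered <- mat[keep_mad, , drop = FALSE]
print(dim(filtered))
```

A rank-based cut-off has the advantage that the number of retained genes is known in advance, whereas a threshold based on the variance range depends heavily on the single most variable gene.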

Kevin Blighe, 18 months ago

Hi Kevin,

Oh, sorry, I meant that I am using log FPKM for the network analysis (this is recommended on the WGCNA webpage when working with RNA-seq data).

For differential expression I am using raw counts, as recommended by the DESeq2 package.

Anyway, I will select the genes with high variance for the network analysis and see if my results improve.

With the parameters I have been using, I have already obtained really interesting results. The top TFs with enriched binding sites in the lifespan modules are associated with ageing (foxo and some GATA TFs), and the top network bottlenecks and hubs I obtained are inhibitors of mTOR. So I am really excited about it.

Thank you very much!

Javier

jagoor93, 18 months ago

Okay, seems promising; it looks encouraging to me.

Kevin Blighe, 18 months ago

The object that is created is logical. This is an R question plus a bit of simple maths: why not use standard deviation instead of variance? For example, can I consider the genes that fall within the first two standard deviations?

I would be glad if you could explain a bit more.

When I inspect the object keep, it says logical. How do I subset the genes flagged by keep from my main object, e.g. mat?

krushnach80, 18 months ago

Yes, keep is a logical vector of TRUE or FALSE values, one per row (gene) of mat; TRUE marks the genes whose variance was greater than one-third of the overall variance range. You can pass it directly as a row index into mat, as in the last command above.

I would encourage you to check the output of each of my commands (above) in order to be sure about what is happening.
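As a small sketch of that inspection, the following rebuilds the example with a fixed seed and looks at keep step by step. The gene names (gene1, gene2, ...) are hypothetical labels added only so that the retained rows can be identified.

```r
set.seed(7)

# Same kind of simulated data as above: 10 genes x 20 samples
mat <- log2(matrix(rexp(200, rate = 0.1), ncol = 20))
rownames(mat) <- paste0("gene", seq_len(nrow(mat)))  # hypothetical names

variances <- apply(mat, 1, var)
keep <- variances > (max(variances) - min(variances)) / 3

class(keep)         # "logical"
length(keep)        # one TRUE/FALSE per row (gene) of mat
print(sum(keep))    # TRUE counts as 1, so this is the number of genes kept
print(which(keep))  # positions (and names) of the retained genes

# Logical row indexing: rows where keep is TRUE are returned
filtered <- mat[keep, , drop = FALSE]
print(rownames(filtered))
```

The same indexing works on any object with one row per gene, as long as the rows are in the same order as the vector used to compute keep.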

Kevin Blighe, 18 months ago
