Picking a soft threshold for co-expression analysis, and the filtering step
5.3 years ago
Biologist ▴ 290

I have raw count data for 19,803 genes and 201 samples. I used the DESeq2 package to obtain normalised counts, which are then given as input to WGCNA.

Before that, I applied a filtering step in DESeq2, keeping only genes with more than 20 counts in total across samples. This leaves roughly 18,000 genes for WGCNA.
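For reference, the filtering and normalisation step looks roughly like this (a sketch only; the dds object and names here are illustrative of my workflow, not copied from it):

```r
library(DESeq2)

# dds: a DESeqDataSet built from the raw count matrix (19,803 genes x 201 samples)
# Keep only genes with more than 20 counts in total across all samples
dds <- dds[rowSums(counts(dds)) > 20, ]

# Estimate size factors and extract the normalised counts used for WGCNA
dds <- estimateSizeFactors(dds)
normCounts <- counts(dds, normalized = TRUE)

# WGCNA expects samples in rows and genes in columns
datExpr <- t(normCounts)
```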

The scale-free topology fit index gave a plot like the one below:

[Figure: scale-free topology fit index versus soft-thresholding power]

1) Based on that plot, should I take a softPower of 6 or 7?

2) Is there a way to reduce the number of input genes with some other, stricter filtering? [Of course, I could take only the top 50% most variable genes for the co-expression analysis, but then the gene I'm interested in gets filtered out.]

RNA-Seq wgcna coexpression r network

1) Based on that plot, should I take a softPower of 6 or 7?

Take 6. In my former supervisor's words: "generally, the first power past 0.9". She teaches WGCNA and works in the lab where the developer used to be based.
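As a rough sketch of how that rule can be applied programmatically (assuming datExpr is your samples-by-genes matrix of normalised expression values):

```r
library(WGCNA)

# Candidate powers, as in the WGCNA tutorials
powers <- c(1:10, seq(12, 20, by = 2))
sft <- pickSoftThreshold(datExpr, powerVector = powers, verbose = 5)

# Signed scale-free fit index, as plotted in your figure
fitIndex <- -sign(sft$fitIndices$slope) * sft$fitIndices$SFT.R.sq

# "Generally, the first past 0.9"
softPower <- min(sft$fitIndices$Power[fitIndex > 0.9])
```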

2) Is there a way to reduce the number of input genes with some other, stricter filtering? [Of course, I could take only the top 50% most variable genes for the co-expression analysis, but then the gene I'm interested in gets filtered out.]

Indeed, removing variables based on low variance is another option; however, genes of low variance may actually be of interest in a network analysis. Why not just continue with the 18,000, provided there is no computational / infrastructural issue in doing so?

You may also want to try the analysis with the log-transformed normalised counts, by the way.

Kevin


Thanks a lot, Kevin. I will also try with the log-transformed normalised counts. I see in the DESeq2 tutorial that there are two transformations for log-transformed values, rlog and vst. Which one should I use?


Hey, there is no strong preference. I would start with vst and then also try rlog afterwards; hopefully the results will be similar.
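A minimal sketch of the two transformations, assuming dds is your existing DESeqDataSet:

```r
library(DESeq2)

# Variance-stabilising transformation (fast) and rlog (slower), blind to the design
vsd <- vst(dds, blind = TRUE)
rld <- rlog(dds, blind = TRUE)

# Either can then be passed to WGCNA after transposing to samples x genes
datExpr <- t(assay(vsd))
```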


It also depends on the number of samples. If you have many (say, > 50), rlog might take several hours to complete because it fits a shrinkage term for every sample, which vst does not.


I have tried it both ways: with the normalised counts and with the vst log-transformed normalised values. In both cases I set the minimum module size to 50; the module detection step looked roughly like the sketch below.
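A sketch of what I mean by module detection (the one-step approach; datExpr and softPower are the objects from the earlier steps, and the other settings here are illustrative):

```r
library(WGCNA)

# One-step network construction and module detection
net <- blockwiseModules(datExpr,
                        power          = softPower,
                        TOMType        = "unsigned",
                        minModuleSize  = 50,
                        mergeCutHeight = 0.25,
                        numericLabels  = TRUE,
                        verbose        = 3)

# Number of genes per module (label 0 = genes not assigned to any module)
table(net$colors)
```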

With the normalised counts I get 56 modules; with the log-transformed values I get 14 modules.

With the log-transformed values, I took soft power = 3, based on the following plot, where the R-squared is > 0.8:

[Figure: scale-free topology fit index versus soft-thresholding power, log-transformed data]

Among all the modules, I'm interested in the one that contains my gene of interest. In both cases, that module has a similar number of genes.

Which way do you think is better: the normalised counts only, or the log-transformed normalised counts?


Well, this is one of the issues with network analysis approaches. Although the developer of WGCNA implies at one point that your input data is not critical so long as it is normalised and all samples are processed in the same way, in practice results are highly variable, and I frequently see people banging their head against a brick wall trying to interpret the results from WGCNA. This is why I specifically never use WGCNA anymore (unless instructed).

In all honesty, I cannot answer your question definitively. To make it easier, I would suggest using the vst counts and taking the 14 modules. This is also the recommendation in the (somewhat self-contradictory) FAQ:

As far as WGCNA is concerned, working with (properly normalized) RNA-seq data isn't really any different from working with (properly normalized) microarray data. ... We then recommend a variance-stabilizing transformation.

...

Whether one uses RPKM, FPKM, or simply normalized counts doesn't make a whole lot of difference for WGCNA analysis as long as all samples were processed the same way. These normalization methods make a big difference if one wants to compare expression of gene A to expression of gene B; but WGCNA calculates correlations for which gene-wise scaling factors make no difference. (Sample-wise scaling factors of course do, so samples do need to be normalized.)

[source: https://horvath.genetics.ucla.edu/html/CoexpressionNetwork/Rpackages/WGCNA/faq.html]
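As a quick toy illustration of that last point (made-up data, not from your experiment): a gene-wise scaling factor does not change the correlations that WGCNA computes.

```r
set.seed(1)
geneA <- rnorm(50)
geneB <- 0.5 * geneA + rnorm(50)

# Multiplying a gene by a constant (a gene-wise scaling factor)
# leaves its correlation with other genes unchanged
cor(geneA, geneB)
cor(1000 * geneA, geneB)   # identical value
```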


Thank you very much for the link.


In your figure, which is the first above 0.9, though? It looks like it may be 6 or 7.


@Kevin Blighe In the second figure in my comments, I see that 3 is above an R-squared of 0.8, so I took softPower 3 when using the log-transformed values for WGCNA.

Do you think this is right, or should I always use 6 or 7 as the softPower?

In most of the tutorials I see, they use 7 or 8.

In this tutorial, 5 is above 0.8 and they took 5 as the softPower: [https://github.com/hms-dbmi/scw/blob/master/scw2016/tutorials/wgcna/WGCNA.md]

In this one they took 8 as the softPower: [https://hms-dbmi.github.io/scw/WGCNA.html]

And in this one, 7 as the softPower: [http://pklab.med.harvard.edu/scw2014/WGCNA.html]


You should generally choose the first soft power that passes 0.9 (not 0.8); in most datasets this is 6 or 7.
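If it helps, the standard diagnostic plot can be reproduced along these lines (a sketch, reusing the hypothetical sft object returned by pickSoftThreshold() in the earlier sketch):

```r
# Signed scale-free topology fit index for each candidate power
fitIndex <- -sign(sft$fitIndices$slope) * sft$fitIndices$SFT.R.sq

plot(sft$fitIndices$Power, fitIndex, type = "n",
     xlab = "Soft-thresholding power",
     ylab = "Scale-free topology fit (signed R^2)")
text(sft$fitIndices$Power, fitIndex, labels = sft$fitIndices$Power, col = "red")
abline(h = 0.9, col = "blue", lty = 2)   # choose the first power above this line
```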


Thanks a lot for the link, Kevin.


Sure, thank you for the quick reply.
