I have looked through the many existing posts on soft power selection in WGCNA, but unfortunately wasn't able to find a solution to my problem. In brief, I cannot reach a signed scale-free topology R^2 of 0.8 or higher without a very high soft power. I am conducting an exploratory analysis of the GTEx skeletal muscle gene expression data. To summarize, this is what I have done:
- Imported the public GTEx TPM data, selected just the skeletal muscle data, and normalized via log2(TPM+1); total genes = 56,200, samples = 803.
- Excluded all genes with near 0 variance and those with mean log2(TPM+1) <= 0.5, on the basis of this histogram, leaving 16,089 genes, 803 samples:
- Computed the estimated soft power (signed network) on the remaining genes and plotted as usual:
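As a rough illustration of what the filtering and soft-power steps above compute, here is a minimal pure-Python sketch. It is not the WGCNA implementation (in R, `pickSoftThreshold` does this on the full matrix), and it is far too slow for 16,089 genes. The function names, the `min_var` cutoff, and the equal-width binning of connectivity into 10 bins are assumptions made for illustration.

```python
import math

def keep_gene(values, min_mean=0.5, min_var=1e-6):
    """values: log2(TPM+1) of one gene across samples.
    min_var is an assumed cutoff; the post only says 'near 0 variance'."""
    m = sum(values) / len(values)
    var = sum((v - m) ** 2 for v in values) / (len(values) - 1)
    return m > min_mean and var > min_var

def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def signed_connectivity(genes, power):
    """genes: list of per-gene expression vectors.
    Signed adjacency is ((1 + cor) / 2) ** power; the connectivity of a
    gene is the sum of its adjacencies to all other genes."""
    k = [0.0] * len(genes)
    for i in range(len(genes)):
        for j in range(i + 1, len(genes)):
            a = ((1.0 + pearson(genes[i], genes[j])) / 2.0) ** power
            k[i] += a
            k[j] += a
    return k

def signed_scale_free_r2(k, n_bins=10):
    """Bin connectivity, regress log10 p(k) on log10 k, and return
    -sign(slope) * R^2. Assumes the values in k are positive and not all equal."""
    lo, hi = min(k), max(k)
    width = (hi - lo) / n_bins
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for v in k:
        b = min(int((v - lo) / width), n_bins - 1)
        sums[b] += v
        counts[b] += 1
    # One point per non-empty bin: (log10 mean k in bin, log10 bin frequency).
    pts = [(math.log10(s / c), math.log10(c / len(k)))
           for s, c in zip(sums, counts) if c > 0]
    xs, ys = zip(*pts)
    r = pearson(xs, ys)  # the slope of the fit has the same sign as r
    return -math.copysign(r * r, r)
```

Scanning a range of powers with `signed_connectivity` and `signed_scale_free_r2`, and tracking the mean and median of `k` at each power, gives the kind of curves described in the next paragraph. A positive value means the number of genes falls off as connectivity rises, which is the scale-free-like direction.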
At this point you can see that I need a power of 26 just to reach 0.8 on the scale-free topology fit, and the connectivity has dropped off a fair bit by then. So I started wondering what global drivers of gene expression might exist (as discussed in the WGCNA FAQ and elsewhere), and how to deal with them. I plotted the sample dendrogram along with a trait heatmap for any trait info I thought might be relevant. Samples are clustered by average linkage on Euclidean distances computed after the log2(TPM+1) transform:
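To make the clustering step concrete, here is a minimal pure-Python sketch of average-linkage hierarchical clustering on Euclidean distance. It assumes that "average Euclidean distance" means average linkage; the function names are hypothetical, and the naive algorithm is only meant for small toy inputs (in R, the usual route is `hclust(dist(x), method = "average")`).

```python
import math

def euclidean(a, b):
    """Euclidean distance between two samples (expression vectors)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(samples):
    """Naive agglomerative clustering with average linkage.
    Returns the merge steps as (members_a, members_b, height) tuples,
    where members are original sample indices."""
    n = len(samples)
    dist = {(i, j): euclidean(samples[i], samples[j])
            for i in range(n) for j in range(i + 1, n)}

    def avg_dist(a, b):
        # Mean pairwise distance between the members of two clusters.
        total = sum(dist[min(i, j), max(i, j)] for i in a for j in b)
        return total / (len(a) * len(b))

    clusters = {i: [i] for i in range(n)}
    merges, next_id = [], n
    while len(clusters) > 1:
        ids = sorted(clusters)
        pairs = [(p, q) for idx, p in enumerate(ids) for q in ids[idx + 1:]]
        # Merge the two clusters with the smallest average distance.
        a, b = min(pairs, key=lambda pq: avg_dist(clusters[pq[0]], clusters[pq[1]]))
        height = avg_dist(clusters[a], clusters[b])
        merges.append((sorted(clusters[a]), sorted(clusters[b]), height))
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return merges
```

The merge heights correspond to the branch heights of the dendrogram. The sample groups formed below a chosen height can then be compared against traits such as Hardy score or ischemic time.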
As you can see, there are some definite clusters, and it looks like they may be related to the terminal phase duration (Hardy score) and the tissue ischemia time, which overlap quite a bit with each other. The turquoise bands in the Hardy score represent the ventilated subgroup, so it's sadly not surprising that they have the lowest ischemia time. Having said all that, this kind of analysis is new to me, so I'm not sure how to adjust for these factors, which likely(?) are responsible for the high soft power. I tried re-running the soft-power calculations for just the ventilated subgroup, but didn't get substantially different results.
Thanks to anyone who read this far. I'm not averse to creating multiple networks, but I'd like to have confidence in selecting my soft power(s). I am considering a soft power of 12-16, as these are near the power of 12 recommended for signed networks at this sample size, and while they give low signed scale-free topology R^2 values, the mean and median connectivity values look OK. Alternatively, I could use a soft power of 26, which gets the signed scale-free topology R^2 up to ~0.8, but lowers the connectivity considerably.
I'd appreciate any input on which specific power to select, or on other things to explore, such as correcting for covariates.