I ran a standard WGCNA analysis on normalized (log2) RNA-Seq data. I set up a signed network (Pearson correlation) with a dynamic cut-off, following most of the recommendations.
At first I did not get a good correlation score for the scale-free topology fit across the soft thresholds. To improve it, I built several matrices, each time applying a data cutoff based on the quartiles of the expression values, until I reached a decent correlation score (0.82). In the process the original matrix obviously shrank (from ~27,000 genes to ~2,000; 17 samples). In theory this is fine because I only want to keep the highest expression scores, but mathematically I am not sure whether I am biasing the experiment by applying this criterion.
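To make the filtering step concrete, the following is a minimal illustrative sketch in Python/NumPy, not the actual code used. It assumes genes are rows and samples are columns, and that the cutoff is applied to the per-gene mean of the log2 values. The function name `filter_by_quantile` and the synthetic matrix are hypothetical.

```python
import numpy as np


def filter_by_quantile(expr, q):
    """Keep genes (rows) whose mean log2 expression is at or above quantile q.

    expr : 2-D array, genes x samples (assumed layout)
    q    : quantile in [0, 1], e.g. 0.75 keeps roughly the top quarter of genes
    Returns the filtered matrix and the boolean mask of kept genes.
    """
    gene_means = expr.mean(axis=1)           # one summary value per gene
    threshold = np.quantile(gene_means, q)   # cutoff derived from the data itself
    keep = gene_means >= threshold
    return expr[keep], keep


# Synthetic stand-in for a log2 expression matrix (1000 genes x 17 samples).
rng = np.random.default_rng(0)
expr = rng.normal(loc=5.0, scale=2.0, size=(1000, 17))

# Apply increasingly strict quartile cutoffs to the original matrix.
for q in (0.25, 0.50, 0.75):
    filtered, _ = filter_by_quantile(expr, q)
    print(f"q={q:.2f}: {filtered.shape[0]} genes x {filtered.shape[1]} samples")
```

Each stricter cutoff removes more low-expression genes before the network is built, which is exactly the step whose effect on the results is in question.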
So my question is: is it normal practice to apply successive cutoffs to a data matrix until it shows the desired behavior, or am I biasing the experiment? The thing is that in the end I get proper results, but I want to be sure that these results are also valid.
I would highly appreciate any comments.
These are the distributions at each cutoff, for your reference: