Hello everyone, I am trying to perform a network analysis of differentially expressed genes (Homo sapiens), more than 2000 in number, using STRING interactions and visualising them in Cytoscape. The problem I am facing is that the STRING database does not accept more than 2000 genes at a time.

  1. How can I perform network analysis for more than 2000 genes?

  2. Also, when running the first 2000 genes, a few of them, although they are human genes, are not recognised by the STRING database for Homo sapiens. Why is that?

Since you requested that all of us help you out, I feel compelled to tell you what I know, even though it may not directly answer all your questions.

There is a reason why STRING limits the network size to 2000. Most biological networks are smaller than that - at least in a local sense, as ultimately everything is connected. Separately, visualizing a network of that size would be very difficult down the road. You did not tell us which parameters you used to get this group of genes (log2FC, p-values), but I am guessing that you didn't set them very strictly. If you use more stringent cutoffs (e.g., set the p-value threshold to 0.01 instead of 0.05, or the log2FC threshold to 2 instead of 1), chances are that you will get fewer than 2000 proteins to work with. This will possibly also cut down the false positives that always happen when working with a network of this size.
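To make the stricter cutoffs concrete, here is a minimal Python sketch of that filtering step. It assumes your results can be loaded as (gene symbol, log2FC, p-value) records; the `GeneResult` type, the `filter_degs` helper, and the numbers in `example` are all made up for illustration and are not from your data.

```python
from typing import Iterable, List, NamedTuple


class GeneResult(NamedTuple):
    """One row of a differential expression table (hypothetical layout)."""
    symbol: str
    log2fc: float
    p_value: float


def filter_degs(
    results: Iterable[GeneResult],
    max_p_value: float = 0.01,
    min_abs_log2fc: float = 2.0,
) -> List[GeneResult]:
    """Keep genes that pass both the p-value and the |log2FC| cutoff."""
    return [
        gene
        for gene in results
        if gene.p_value <= max_p_value and abs(gene.log2fc) >= min_abs_log2fc
    ]


# Illustrative values only, not real measurements.
example = [
    GeneResult("TP53", 2.4, 0.001),
    GeneResult("GAPDH", 0.3, 0.20),
    GeneResult("MYC", -2.8, 0.004),
    GeneResult("ACTB", 1.2, 0.03),
]

strict = filter_degs(example)                 # p <= 0.01 and |log2FC| >= 2
lenient = filter_degs(example, 0.05, 1.0)     # p <= 0.05 and |log2FC| >= 1

print([gene.symbol for gene in strict])
print(len(lenient), "genes pass the lenient cutoffs")
```

Comparing the two list sizes on your own table should tell you quickly whether the stricter thresholds bring you under the 2000 limit.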

STRING is interfaced with many databases and is able to recognize most gene names, but that will never be perfect - and more so for 2000+ names. You can try to find synonyms for the genes that are not recognized, or submit protein sequences instead of names.
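If you want to check which names STRING fails to map before building the network, its REST API has an identifier-mapping method. The sketch below only builds the request URL and does not send it; it assumes the `get_string_ids` method and the `identifiers`, `species`, `limit`, and `echo_query` parameters as I understand them from the STRING API documentation, so please verify against the current docs. The `build_id_lookup_url` helper is a hypothetical name of my own.

```python
from typing import Sequence
from urllib.parse import urlencode

# Base address of the STRING REST API (assumed from the public documentation).
STRING_API = "https://string-db.org/api"


def build_id_lookup_url(
    symbols: Sequence[str],
    species: int = 9606,          # NCBI taxonomy ID for Homo sapiens
    output_format: str = "tsv",
) -> str:
    """Build a get_string_ids request URL for a list of gene names."""
    params = {
        # Multiple identifiers are separated by carriage returns.
        "identifiers": "\r".join(symbols),
        "species": species,
        "limit": 1,        # ask for the best match per name only
        "echo_query": 1,   # include the submitted name in each result row
    }
    return f"{STRING_API}/{output_format}/get_string_ids?{urlencode(params)}"


url = build_id_lookup_url(["TP53", "MYC"])
print(url)
```

Any name you submitted that does not come back in the response is one that STRING could not map, and those are the ones worth looking up synonyms for.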

