Hi, I am new to this field and I am having some problems with a new project.
I built a graph from DrugBank data linked to SIDER adverse drug reactions (ADRs).
I used organ-level terms to classify the ADRs into 25 different groups.
I aim to use the metapath2vec algorithm to cluster drugs based on the groups of ADRs they lead to.
I am currently using StellarGraph, a Python package for machine learning on graphs, and its implementation of metapath2vec.
I chose clustering (e.g. DBSCAN) as the downstream task for the resulting node embeddings, but the results are not promising.
I would like to obtain clusters of drugs associated with different adverse drug reactions.
Since I am new to this field, everything I am trying is based on the scientific literature, but I don't know whether this is the right approach for my objective.
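For illustration, the clustering step mentioned above can be sketched roughly as follows. This is only a sketch under assumptions: it uses scikit-learn's DBSCAN with cosine distance, random placeholder vectors stand in for the real drug embeddings, and the eps and min_samples values are arbitrary examples rather than tuned settings.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)

# Placeholder data (assumption): two tight groups of 16-dimensional vectors
# standing in for drug embeddings taken from the trained Word2Vec model.
centres = np.eye(16)[:2]  # two orthogonal directions
embeddings = np.vstack(
    [centre + 0.01 * rng.standard_normal((20, 16)) for centre in centres]
)

# Cosine distance compares directions only, which is a common choice for
# skip-gram style embeddings; eps and min_samples are illustrative values.
labels = DBSCAN(eps=0.1, min_samples=3, metric="cosine").fit_predict(embeddings)

# DBSCAN marks points that belong to no cluster with the label -1.
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))
print(n_clusters, n_noise)
```

On real embeddings the outcome depends strongly on eps, so a large share of points labelled -1 or one giant cluster may point to the clustering parameters rather than to the embeddings themselves.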
This is the code related to the metapath2vec algorithm:
from gensim.models import Word2Vec
from stellargraph.data import UniformRandomMetaPathWalk

walk_length = 100  # maximum length of a random walk to use throughout this notebook

# specify the metapath schemas as a list of lists of node types.
metapaths = [
    ["drug", "adr", "drug"],
    ["drug", "adr", "drug", "drug"],
    ["drug", "drug"],
    ["drug", "adr", "group_adr", "adr", "drug"],
]

# Create the random walker (graph is the StellarGraph object built from the DrugBank/SIDER data)
rw = UniformRandomMetaPathWalk(graph)

walks = rw.run(
    nodes=list(graph.nodes()),  # root nodes
    length=walk_length,  # maximum length of a random walk
    n=1,  # number of random walks per root node
    metapaths=metapaths,  # the metapaths
)

# size and iter are the gensim 3.x parameter names (vector_size and epochs in gensim 4+)
model = Word2Vec(walks, size=128, window=5, min_count=0, sg=1, workers=2, iter=1)
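To make clear what the walker is expected to produce, here is a minimal pure-Python sketch of a metapath-guided uniform random walk on a toy graph. It is not StellarGraph's actual implementation; the toy nodes, the `node_type` and `neighbours` dictionaries and the `metapath_walk` helper are all hypothetical names used only for illustration.

```python
import random

# Toy heterogeneous graph (hypothetical data): node types and adjacency lists.
node_type = {
    "d1": "drug", "d2": "drug",
    "a1": "adr", "a2": "adr",
    "g1": "group_adr",
}
neighbours = {
    "d1": ["a1", "d2"],
    "d2": ["a1", "a2", "d1"],
    "a1": ["d1", "d2", "g1"],
    "a2": ["d2", "g1"],
    "g1": ["a1", "a2"],
}


def metapath_walk(start, metapath, length, rng):
    """Walk from `start`, picking uniformly among the neighbours whose type
    matches the next entry of the cyclically repeated metapath."""
    if node_type[start] != metapath[0]:
        return []  # the root node must have the metapath's first type
    # The metapath starts and ends with the same type, so its tail repeats.
    pattern = metapath[1:]
    walk = [start]
    step = 0
    while len(walk) < length:
        wanted = pattern[step % len(pattern)]
        candidates = [n for n in neighbours[walk[-1]] if node_type[n] == wanted]
        if not candidates:
            break  # dead end: no neighbour of the required type
        walk.append(rng.choice(candidates))
        step += 1
    return walk


rng = random.Random(0)
walk = metapath_walk("d1", ["drug", "adr", "group_adr", "adr", "drug"], 9, rng)
types = [node_type[n] for n in walk]
print(walk)
print(types)
```

Each walk is then treated as a "sentence" of node IDs for Word2Vec, so two drugs end up with similar embeddings when they tend to appear near each other in such walks.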
Should I change my approach?
What other representation learning algorithms could I use to reach my goal?
What can I improve in the approach presented above to get better node embeddings?