Posted 2 hours ago by casarellacecilia

Hi, I am new to this field and I am having some problems with a new project.

I built a graph using DrugBank data connected to SIDER adverse reactions.
I used organ-level terms to classify the ADRs into 25 different groups.
I aim to use the metapath2vec algorithm to cluster drugs based on the groups of ADRs they lead to.
I am currently using StellarGraph, a Python package for machine learning on graphs, and its implementation of metapath2vec.
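As a point of comparison for the embeddings, here is a minimal sketch (a simple baseline, not metapath2vec) of representing each drug directly by how many of its ADRs fall into each organ-level group, then comparing drugs with cosine similarity. The drug names, ADR names and group labels are hypothetical toy data; the real graph would have 25 groups.

```python
import math
from collections import Counter

# Hypothetical toy data: drug -> set of ADRs (SIDER-like),
# and ADR -> organ-level group. Real identifiers will differ.
DRUG_ADRS = {
    "drugA": {"nausea", "vomiting", "rash"},
    "drugB": {"nausea", "diarrhoea"},
    "drugC": {"rash", "pruritus"},
}
ADR_GROUP = {
    "nausea": "gastrointestinal",
    "vomiting": "gastrointestinal",
    "diarrhoea": "gastrointestinal",
    "rash": "skin",
    "pruritus": "skin",
}


def group_profile(drug, groups):
    """Count how many of the drug's ADRs fall into each organ-level group."""
    counts = Counter(ADR_GROUP[adr] for adr in DRUG_ADRS[drug])
    return [counts.get(g, 0) for g in groups]


def cosine(u, v):
    """Cosine similarity of two count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


GROUPS = sorted(set(ADR_GROUP.values()))  # would be the 25 groups in the real data
profiles = {drug: group_profile(drug, GROUPS) for drug in DRUG_ADRS}
```

If two drugs land in the same embedding cluster, their group profiles would usually be expected to be similar as well, so large disagreements between the two views may be worth inspecting.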

As the downstream task I chose clustering of the resulting node embeddings (e.g. with DBSCAN), but the results are not promising.
I would like to obtain clusters of drugs related to different adverse drug reactions.
Since I am new to this field, everything I am trying is based on the scientific literature, but I don't know whether this is the right approach for my objective.
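The DBSCAN step can be sketched as follows. This is a minimal pure-Python version that uses cosine distance instead of the Euclidean default, on the untested assumption that the angle between word2vec vectors matters more than their length; scikit-learn's `DBSCAN` exposes the same choice through its `metric` parameter, next to `eps` and `min_samples`. The 2-D points are hypothetical stand-ins for the 128-dimensional drug embeddings.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)


def dbscan(points, eps, min_samples, dist=cosine_distance):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    n = len(points)
    labels = [None] * n  # None = not visited yet
    # Each point counts as its own neighbour (as in scikit-learn).
    # O(n^2) pairwise distances: fine for a sketch, slow for very large n.
    neighbours = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_samples:
                queue.extend(neighbours[j])  # j is also core: keep expanding
    return labels


# Hypothetical 2-D "embeddings": two tight angular groups and one outlier.
points = [(1.0, 0.0), (0.99, 0.1), (0.98, 0.15),
          (0.0, 1.0), (0.1, 0.99), (0.12, 0.98),
          (-1.0, -1.0)]
labels = dbscan(points, eps=0.05, min_samples=2)  # → [0, 0, 0, 1, 1, 1, -1]
```

On real embeddings the result tends to be very sensitive to `eps`: a large fraction of `-1` labels often suggests `eps` is too small, and a single giant cluster that it is too large.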

This is my code for the metapath2vec step:

```python
from gensim.models import Word2Vec
from stellargraph.data import UniformRandomMetaPathWalk

# `graph` is my heterogeneous StellarGraph with node types
# "drug", "adr" and "group_adr" (DrugBank + SIDER)
walk_length = 100  # maximum length of a random walk to use throughout this notebook

# specify the metapath schemas as a list of lists of node types.
metapaths = [
    ["drug", "adr", "drug"],
    ["drug", "adr", "drug", "drug"],
    ["drug", "drug"],
    ["drug", "adr", "group_adr", "adr", "drug"],
]

# Create the random walker
rw = UniformRandomMetaPathWalk(graph)

walks = rw.run(
    nodes=list(graph.nodes()),  # root nodes
    length=walk_length,  # maximum length of a random walk
    n=1,  # number of random walks per root node
    metapaths=metapaths,  # the metapaths
)

# gensim 3.x keyword names; gensim >= 4.0 renamed size -> vector_size and iter -> epochs
model = Word2Vec(walks, size=128, window=5, min_count=0, sg=1, workers=2, iter=1)
```
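To make it clearer what `rw.run(...)` and `Word2Vec(..., sg=1, window=5)` operate on, here is a rough pure-Python sketch of a uniform metapath-guided random walk and of the (centre, context) pairs that skip-gram learns from. It reflects the general metapath2vec idea, not StellarGraph's or gensim's actual code; the toy graph and the helper names `metapath_walk` and `skipgram_pairs` are hypothetical.

```python
import random

# Hypothetical toy heterogeneous graph: node -> type, plus undirected edges.
NODE_TYPE = {
    "drugA": "drug", "drugB": "drug", "drugC": "drug",
    "nausea": "adr", "rash": "adr",
    "gastrointestinal": "group_adr", "skin": "group_adr",
}
EDGES = [
    ("drugA", "nausea"), ("drugB", "nausea"),
    ("drugA", "rash"), ("drugC", "rash"),
    ("nausea", "gastrointestinal"), ("rash", "skin"),
]

ADJ = {node: [] for node in NODE_TYPE}
for u, v in EDGES:
    ADJ[u].append(v)
    ADJ[v].append(u)


def metapath_walk(root, metapath, length, rng):
    """One uniform random walk of at most `length` nodes following `metapath`.

    The metapath must start and end with the root's type so it can be
    repeated cyclically: drug -> adr -> drug -> adr -> drug ...
    """
    if not (metapath[0] == metapath[-1] == NODE_TYPE[root]):
        raise ValueError("metapath must start and end with the root's node type")
    cycle = metapath[1:]  # node types to visit after the root, repeated
    walk = [root]
    while len(walk) < length:
        wanted = cycle[(len(walk) - 1) % len(cycle)]
        candidates = [n for n in ADJ[walk[-1]] if NODE_TYPE[n] == wanted]
        if not candidates:
            break  # dead end: no neighbour of the required type
        walk.append(rng.choice(candidates))  # uniform over valid neighbours
    return walk


def skipgram_pairs(walk, window):
    """All (centre, context) pairs within `window` positions of each other."""
    return [
        (walk[i], walk[j])
        for i in range(len(walk))
        for j in range(max(0, i - window), min(len(walk), i + window + 1))
        if j != i
    ]


rng = random.Random(42)  # seeded so the walk is reproducible
walk = metapath_walk("drugA", ["drug", "adr", "drug"], length=7, rng=rng)
pairs = skipgram_pairs(walk, window=5)
```

With `window=5` on walks that alternate drug and ADR nodes, a drug's context contains both its own ADRs and drugs that share those ADRs, which is how drugs with similar side effects are intended to end up close in embedding space. gensim samples a reduced window for each centre word, so `skipgram_pairs` lists the largest possible set of pairs rather than exactly what is trained on.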

Should I change my approach?
What other representation learning algorithms could I use to reach my goal?
What can I improve in the presented approach to obtain better node embeddings?
