Replace Drugbank IDs with Drug name [R]


Hello friends,

I have a dataset of genes associated with drugs derived from DrugBank.
I wish to simply translate all the drugbank IDs to drug names readable by a human.

The drugbank vocabulary (.csv) looks like this:


DB00001 Lepirudin

DB00002 Cetuximab

DB00003 Dornase alfa

DB00004 Denileukin diftitox

DB00005 Etanercept

DB00006 Bivalirudin

My dataset (.csv) has 15 columns but the important ones are:


Gene.Name DrugBank.ID

F8 DB09130

TCN2 DB00200

LDLR DB09270; DB11251; DB14003

ALB DB00070; DB00137; DB00159; DB00162; DB00214;

As you can see some genes are linked to multiple or even hundreds of drugs. the multiple Drug IDs are separated by semicolons, in the same comma-delimited "column"
The R studio match or merge function only work for the first identifier in each column, thus effectively deleting the remainder in the same column "cell".

Any advice is welcome, thanks in advance!





Source link