Parsing Dates from Multiple Websites


Hi Everyone!!

I have 100 COVID database websites from which I want to record the last-update date. Some are regularly updated, but some aren't.

I am trying to do this in R; an example is given below. I want to write a function that I can run weekly on those 100 links to record when each of the 100 databases was last updated.


library(stringr)
library(rvest)
library(lubridate)

html <- readLines("https://grafnet.kaust.edu.sa/assayM/")
# ignore.case = TRUE already makes the match case-insensitive; the "(?i)"
# flag is not valid in base R's default regex engine without perl = TRUE,
# and the spaces inside the original alternation forced literal-space matches
t <- html[grep(pattern = "last update|update|updated", x = html, ignore.case = TRUE)]
cleanFun <- function(htmlString) {
  return(gsub("<.*?>", "", htmlString))
}
t <- cleanFun(t)

dmy(t)

I can't move beyond this step. I know that if I can shrink the string down, I can use the dmy function from lubridate, but I am poor at regex. Can someone help?
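For context, what I'm imagining is something like the minimal sketch below: pull a "day Month year" substring out of the matched line with a regex before handing it to dmy. This assumes the pages print the date in that form, and extract_update_date is just a hypothetical helper name:

```r
library(stringr)
library(lubridate)

extract_update_date <- function(line) {
  # Assumed format: a "day Month year" substring, e.g. "20 March 2021";
  # pull it out first so dmy() sees only the date, not the whole line
  raw <- str_extract(line, "\\d{1,2}\\s+[A-Za-z]+\\s+\\d{4}")
  dmy(raw)
}

extract_update_date("Last updated: 20 March 2021")  # Date: 2021-03-20
```

But I don't know how to generalise this across 100 sites whose date formats may differ.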




