camillab (London), 2 hours ago

Hi!

I am trying to run a linear regression of gene expression on age and sex, looping over the genes in a dataset with 6 samples and 16757 columns (4 metadata columns followed by the genes).
This is my dataset (I copied only the first few columns):

'data.frame':   6 obs. of  16757 variables:
 $ samples               : chr  "hu-c_lab13" "hu-c_lab15" "hu-c_lab17" "hu-gent_lab14" ...
 $ treatment             : chr  "untreated" "untreated" "untreated" "treatment" ...
 $ sex                   : chr  "Male" "Female" "Male" "Female" ...
 $ age                   : num  45 56 46 65 21 75
 $ 7SK (i)               : num  87779 79828 64005 44973 42646 ...

I want to loop over the genes to test whether age and sex affect gene expression, and I want to obtain the fitted values for each gene:

prova$treatment <- factor(prova$treatment, levels=c("treatment","untreated"))
prova$sex <- factor(prova$sex, levels=c("Female","Male"))
prova$age <- as.numeric(prova$age)

genelist <- prova %>% select(5:16757) #select genes

for (i in 1:length(genelist)) {
  formula <- as.formula(paste("samples ~ ", genelist[i], " + age + sex ", sep=""))
  model <- glm(formula, data = prova)
  print(model[["fitted.values"]])
}

But it gives me this error:

Error in y - mu : non-numeric argument to binary operator

What am I doing wrong in the loop?

Also, if I fit the model for a single gene, it works:

model2 <- lm(ENSG00000202198 ~ sex + age , data=prova)
summary(model2)
fitted_vals <- predict(model2)  # for lm on the training data, same values as model2[["fitted.values"]]
gene <- model2[["fitted.values"]]
gene  <- as.data.frame(gene)
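
Going by the code above, a likely cause of the error is that the loop's formula uses `samples` (a character column) as the response, and `paste()` on `genelist[i]` inserts the column's values rather than its name. A minimal sketch of a loop that mirrors the single-gene call might look like the following. The small `prova` built here is a stand-in with invented expression values, and `gene_names` and `fitted_mat` are hypothetical names chosen for illustration.

```r
# Toy stand-in for `prova`: metadata mirrors the post, expression values are invented
set.seed(1)
prova <- data.frame(
  samples   = paste0("s", 1:6),
  treatment = rep(c("untreated", "treatment"), each = 3),
  sex       = factor(c("Male", "Female", "Male", "Female", "Male", "Female"),
                     levels = c("Female", "Male")),
  age       = c(45, 56, 46, 65, 21, 75)
)
prova[["7SK (i)"]]         <- rnorm(6, mean = 60000, sd = 15000)
prova[["ENSG00000202198"]] <- rnorm(6, mean = 500, sd = 100)

# Gene columns start after the 4 metadata columns
gene_names <- names(prova)[5:ncol(prova)]

# One lm per gene: the gene is the response, age and sex are the predictors.
# as.name() keeps non-syntactic column names such as "7SK (i)" intact.
fitted_mat <- vapply(gene_names, function(g) {
  f <- reformulate(c("age", "sex"), response = as.name(g))
  fitted(lm(f, data = prova))
}, numeric(nrow(prova)))
rownames(fitted_mat) <- prova$samples

print(fitted_mat)
```

The result is a samples-by-genes matrix of fitted values. With only 6 samples and 3 coefficients per model, each fit leaves 3 residual degrees of freedom, so the estimates should be read with caution.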

Thank you

Camilla


