As indicated in the previous reply by @LAP you can use the terms() from these models. However, I would recommend to look at the attr(..., "factors") and attr(..., "dataClasses") instead of going to the $model which requires that the entire model.frame() is stored in the model. This may or may not be the case. Specifically, when re-fitting multiple models you might want to be able to not store the model frame each time.
So one idea would be to proceed in the following steps:
- Check whether
attr(..., "factors") has not exactly one column, the you can return FALSE.
- If there is exactly one factor, you can check the corresponding
attr(..., "dataClasses") if it is "factor"/"ordered" and then return TRUE, otherwise FALSE.
R code:
one_factor <- function(object) {
f <- attr(terms(object), "factors")
if(length(f) == 0L || NCOL(f) != 1L) return(FALSE)
d <- attr(terms(object), "dataClasses")
if(d[colnames(f)] %in% c("ordered", "factor")) {
return(TRUE)
} else {
return(FALSE)
}
}
This appears to work well for single-part formula-based objects.
Dummy data with numeric/factor/ordered trt:
d1 <- d2 <- d3 <- data.frame(y = log(1:9), x = 1:9, trt = rep(1:3, each = 3))
d2$trt <- factor(d2$trt)
d3$trt <- ordered(d3$trt)
Various formula specifications:
f <- list(
y ~ 1,
y ~ x,
y ~ trt,
y ~ trt + x,
y ~ trt + offset(x),
y ~ trt + x + offset(x),
y ~ trt + offset(as.numeric(trt)),
y ~ factor(trt),
y ~ factor(trt) + offset(x),
y ~ factor(x > as.numeric(trt)),
y ~ interaction(x, trt),
y ~ 0 + trt
)
Expected results for d1, d2, and d3, respectively:
ok1 <- c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, FALSE)
ok2 <- c(FALSE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE)
ok3 <- ok2
Checks for lm without storing the model frame:
lm1 <- lapply(f, lm, data = d1, model = FALSE)
identical(sapply(lm1, one_factor), ok1)
## [1] TRUE
lm2 <- lapply(f, lm, data = d2, model = FALSE)
identical(sapply(lm2, one_factor), ok2)
## [1] TRUE
lm3 <- lapply(f, lm, data = d3, model = FALSE)
identical(sapply(lm3, one_factor), ok3)
## [1] TRUE
Checks for survreg (Gaussian) and coxph. (The latter throws a lot of warnings about non-convergence which is not surprising given the dummy data structure. The checks still work as intended.)
library("survival")
d1$y <- d2$y <- d3$y <- Surv(d1$y + 0.5)
sr1 <- lapply(f, survreg, data = d1)
identical(sapply(sr1, one_factor), ok1)
## [1] TRUE
sr2 <- lapply(f, survreg, data = d2)
identical(sapply(sr2, one_factor), ok2)
## [1] TRUE
sr3 <- lapply(f, survreg, data = d3)
identical(sapply(sr3, one_factor), ok3)
## [1] TRUE
cph1 <- lapply(f, coxph, data = d1)
identical(sapply(cph1, one_factor), ok1)
## [1] TRUE
cph2 <- lapply(f, coxph, data = d2)
identical(sapply(cph2, one_factor), ok2)
## [1] TRUE
cph3 <- lapply(f, coxph, data = d3)
identical(sapply(cph3, one_factor), ok3)
## [1] TRUE
Note: If you have multi-part Formula-based objects this function might fail and the underlying tests would need to be adapted. Examples for the latter could include count regression models (zeroinfl, hurdle), multinomial logit (mlogit), instrumental variables (ivreg), heteroscedastic models (vglm, betareg, crch) etc. These might have formulas like y ~ trt | 1 or y ~ trt | trt or y ~ trt | x which may or may not still be feasible in your framework.