For a dataset I want to use xgboost to optimally ensemble $n$ forecasts instead of just combining them with their arithmetic mean. I found that xgboost generates forecasts that are worse than many of the $n$ individual forecasts the model could choose from for the combination.
I do not understand why this can be the case. To illustrate my observation I created the toy dataset below. The artificial target variable is generated by $$y = \frac{x_1+x_2}{2} \quad \text{with } x_1, x_2 \sim N(0,1).$$ Given the deterministic relationship between $y$ and the two explanatory variables $x_1$ and $x_2$, xgboost could make perfect forecasts, but it does not; the linear model does so easily. Since this is the simplest multivariate linear regression model I can think of, and xgboost fails on it, I am wondering about the implications.
- Why is this the case? What are the limitations of tree-based models for regression? (See the extrapolation sketch after this list.)
- Why, then, is xgboost used for stacking and ensembling forecasts if it cannot reproduce the MSE-minimizing arithmetic mean as the optimal combination mechanism?
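To make the first question concrete, here is a minimal, self-contained sketch (my own illustration, separate from the benchmark below) of what I suspect: tree predictions are piecewise constant, so outside the range of the training data a boosted tree can only return a boundary leaf value, while a linear function keeps growing.

library(xgboost)
set.seed(1)
#train on the identity function y = x with x restricted to [-2, 2]
x_train <- runif(500, -2, 2)
dtrain <- xgb.DMatrix(matrix(x_train, ncol = 1), label = x_train)
fit <- xgb.train(nrounds = 100,
                 params = list(objective = "reg:squarederror"),
                 data = dtrain)
#predict inside and far outside the training range
x_test <- c(-10, -2, 0, 2, 10)
cbind(x = x_test,
      pred = predict(fit, xgb.DMatrix(matrix(x_test, ncol = 1))))
#inside [-2, 2] the predictions track x closely; at x = -10 and x = 10
#they saturate near the boundary leaf values instead of extrapolating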
Note that the choice of xgboost's parameters does not change this. I tried many parameter settings and the results are never perfect (a sketch of the kind of sweep I tried is at the end of the post).
Data Generation
library(tidyverse)
library(xgboost)
n <- 1000
#"reg:squarederror" is the current name of the squared-error objective
#(the older alias "reg:linear" is deprecated)
param0 <- list("objective" = "reg:squarederror", "eval_metric" = "rmse")
set.seed(1)
df <- tibble(x1 = rnorm(n), x2 = rnorm(n), y = (x1+x2)/2)
xgboost
#first 900 observations for training, last 100 held out for testing
xgtrain <- xgb.DMatrix(as.matrix(df[1:900, c("x1","x2")]), label = df$y[1:900], missing = NA)
xgtest <- xgb.DMatrix(as.matrix(df[901:1000, c("x1","x2")]), missing = NA)
#Cross-validation, just to illustrate that the algorithm
#learns something that is not correct, since the test data
#cannot be forecasted with zero error.
#xgb.cv(nrounds = 100, nfold = 10, params = param0, data = xgtrain)
#nrounds and the other parameters do not get you to the perfect forecast
model <- xgb.train(nrounds = 100, params = param0, data = xgtrain)
preds_xgb <- predict(model, xgtest)
#no perfect forecasts
sqrt(mean((preds_xgb-df$y[901:1000])^2))
0.04654448
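For comparison, the arithmetic-mean combination that xgboost was supposed to beat is already exact on this data, by construction (a check I added):

#the simple average of x1 and x2 reproduces y exactly, so the RMSE is zero
preds_mean <- rowMeans(df[901:1000, c("x1","x2")])
sqrt(mean((preds_mean - df$y[901:1000])^2))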
Linear regression
model <- lm(y ~ x1+x2, data = df[1:900,])
#coefficients are 0.5 for x1 and 0.5 for x2, as expected
model$coefficients
preds_lm <- predict(model, df[901:1000,c("x1","x2")])
#perfect forecasts (RMSE is zero up to floating-point precision)
sqrt(mean((preds_lm-df$y[901:1000])^2))
1.389314e-15
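Finally, a sketch of the kind of parameter sweep I tried (the grid values here are just examples, not an exhaustive search); in my runs no setting produced a perfect forecast:

#small example grid over eta, max_depth and nrounds;
#reuses xgtrain, xgtest and df from above
grid <- expand.grid(eta = c(0.05, 0.3), max_depth = c(2, 6, 10),
                    nrounds = c(100, 1000))
rmse <- apply(grid, 1, function(g) {
  m <- xgb.train(nrounds = g[["nrounds"]],
                 params = list(objective = "reg:squarederror",
                               eta = g[["eta"]],
                               max_depth = g[["max_depth"]]),
                 data = xgtrain)
  sqrt(mean((predict(m, xgtest) - df$y[901:1000])^2))
})
cbind(grid, rmse)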