
I implemented a custom objective and metric for an xgboost regression. To check that I am doing this correctly, I started with a quadratic loss. The implementation seems to work well, but I cannot reproduce the results of the standard "reg:squarederror" objective.

Question:

I wonder whether my current approach is correct (especially the implementation of the first- and second-order gradients). If so, what could be a possible reason for the difference?

Gradient and Hessian are defined as:

grad <- 2*(preds-labels) 
hess <- rep(2, length(labels))
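
These follow from differentiating the quadratic loss with respect to the prediction:

$$\frac{\partial}{\partial preds}(preds - labels)^2 = 2(preds - labels)$$
$$\frac{\partial^2}{\partial preds^2}(preds - labels)^2 = 2$$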

Minimal example (in R):

library(ISLR)
library(xgboost)
library(tidyverse)
library(Metrics)

Data

df = ISLR::Hitters %>%
  select(Salary, AtBat, Hits, HmRun, Runs, RBI, Walks, Years, CAtBat, CHits,
         CHmRun, CRuns, CRBI, CWalks, PutOuts, Assists, Errors)
df = df[complete.cases(df), ]
train = df[1:150, ]
test = df[151:nrow(df), ]

XGBoost Matrix

dtrain <- xgb.DMatrix(data = as.matrix(train[, -1]), label = as.matrix(train[, 1]))
dtest <- xgb.DMatrix(data = as.matrix(test[, -1]), label = as.matrix(test[, 1]))
watchlist <- list(eval = dtest)

Custom objective function (squared error)

myobjective <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  grad <- 2 * (preds - labels)
  hess <- rep(2, length(labels))
  return(list(grad = grad, hess = hess))
}
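
As a quick sanity check (my addition, not part of the original setup), the analytic gradient can be compared against a central finite difference of the loss; the preds and labels values below are arbitrary:

# Illustrative check: analytic gradient vs. finite-difference approximation
loss <- function(preds, labels) (preds - labels)^2
preds  <- c(100, 200, 300)   # arbitrary example values
labels <- c(110, 190, 310)
eps <- 1e-6
num_grad <- (loss(preds + eps, labels) - loss(preds - eps, labels)) / (2 * eps)
max(abs(num_grad - 2 * (preds - labels)))  # ~0 if the gradient is correct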

Custom Metric

evalerror <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  u <- (preds - labels)^2
  err <- sqrt(sum(u) / length(u))
  return(list(metric = "MyError", value = err))
}
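
As an aside, this metric is just RMSE; a quick way to confirm that, reusing the Metrics package loaded above (the example values are mine):

# Illustrative check: the custom metric reduces to RMSE
preds  <- c(100, 200, 300)   # arbitrary example values
labels <- c(110, 190, 310)
sqrt(sum((preds - labels)^2) / length(labels))  # custom metric
Metrics::rmse(labels, preds)                    # same value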

Model Parameter

param1 <- list(booster = 'gbtree'
               , learning_rate = 0.1
               , objective = myobjective
               , eval_metric = evalerror
               , set.seed = 2020)  # note: set.seed is not an xgboost parameter; it does not seed R here

Train Model

xgb1 <- xgb.train(params = param1
                  , data = dtrain
                  , nrounds = 500
                  , watchlist
                  , maximize = FALSE
                  , early_stopping_rounds = 5)

Predict

pred1 = predict(xgb1, dtest)
mae1 = mae(test$Salary, pred1)

XGB Model with standard loss/metric

Model Parameter

param2 <- list(booster = 'gbtree'
               , learning_rate = 0.1
               , objective = "reg:squarederror"
               , set.seed = 2020)

Train Model

xgb2 <- xgb.train(params = param2
                  , data = dtrain
                  , nrounds = 500
                  , watchlist
                  , maximize = FALSE
                  , early_stopping_rounds = 5)

Predict

pred2 = predict(xgb2, dtest)
mae2 = mae(test$Salary, pred2)

Results:

  • The custom objective yields a slightly better test result (MAE = 199.6) than the standard objective (MAE = 203.3).

  • During boosting, the RMSE tends to be lower with the custom objective.

For the custom objective the RMSE is:

[1] eval-MyError:599.490030 
[2] eval-MyError:560.677996 
[3] eval-MyError:527.867686
[4] eval-MyError:498.216760 
[5] eval-MyError:472.167415 
...

For the standard objective the RMSE is:

[1] eval-rmse:598.144775 
[2] eval-rmse:562.479431 
[3] eval-rmse:529.981079 
[4] eval-rmse:501.730103 
[5] eval-rmse:479.081329 

1 Answer


I have a suggestion.
The methodology is correct, but the problem comes from the definition of your functions: they do not match the loss that "reg:squarederror" uses internally, so they produce a different gradient and hessian. The metric is also not correct. You must use:

Objective:

$$f(preds, labels) = \frac{1}{2}(preds - labels)^2$$
$$grad = preds - labels$$
$$hess = 1$$
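
The constant factor matters even though it cancels in the ratio $grad/hess$: xgboost computes each leaf weight as
$$w^* = -\frac{\sum_i g_i}{\sum_i h_i + \lambda}$$
so with the default regularization $\lambda = 1$ (and the default min_child_weight, which is compared against the sum of hessians), scaling both grad and hess by 2 changes the leaf weights and hence the trees that are grown.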

Metric:

$$err = \frac{1}{n}\sum_{i=1}^{n}(preds_i - labels_i)^2$$

My suggestions:


# Custom objective function (squared error)
myobjective <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  grad <- (preds-labels)    
  hess <- rep(1, length(labels))                
  return(list(grad = grad, hess = hess))
}

Custom Metric

evalerror <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  err <- (preds - labels)^2
  return(list(metric = "MyError", value = mean(err)))
}
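
To verify, you can retrain with these corrected functions (reusing dtrain, dtest, watchlist, and the mae call from the question; the names param3/xgb3/pred3 are mine):

# Retrain with the corrected objective and metric, then compare
param3 <- list(booster = 'gbtree'
               , learning_rate = 0.1
               , objective = myobjective
               , eval_metric = evalerror)

xgb3 <- xgb.train(params = param3
                  , data = dtrain
                  , nrounds = 500
                  , watchlist
                  , maximize = FALSE
                  , early_stopping_rounds = 5)

pred3 <- predict(xgb3, dtest)
mae(test$Salary, pred3)  # should now track the reg:squarederror model closely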

I get the same results using these functions.
More code on loss/gradient customization is available on my GitHub: https://www.github.com/kipedene/Custom_objectif