
I wondered: are there R packages that allow the user to customise the loss function?

For example, a random forest package like ranger fits by minimising the OOB MSE. Is it somehow possible to customise this into a model which minimises the negative log-likelihood instead?

I would appreciate examples/code of doing this on a toy dataset.

EB3112

2 Answers


Using XGBoost, it is relatively easy to invoke a custom loss function, and there are also quite a lot of already-implemented options.
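For instance, the already-implemented objectives are selected with a string in the parameter list, and count:poisson already minimises a Poisson negative log-likelihood, so for some likelihoods no custom code is needed at all. A minimal sketch, assuming a dtrain matrix like the one built below (note a Poisson objective expects non-negative count-like labels):

library(xgboost)

# Built-in objective: Poisson regression, i.e. Poisson negative log-likelihood
param_builtin <- list(booster = "gbtree",
                      learning_rate = 0.1,
                      objective = "count:poisson")
# bst <- xgb.train(params = param_builtin, data = dtrain, nrounds = 100)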

A fully custom objective would look something like this:

library(ISLR)
library(xgboost)
library(tidyverse)
library(Metrics)

Data

df = ISLR::Hitters %>%
  select(Salary, AtBat, Hits, HmRun, Runs, RBI, Walks, Years, CAtBat,
         CHits, CHmRun, CRuns, CRBI, CWalks, PutOuts, Assists, Errors)
df = df[complete.cases(df), ]
train = df[1:150, ]
test = df[151:nrow(df), ]

XGBoost Matrix

dtrain <- xgb.DMatrix(data = as.matrix(train[, -1]), label = as.matrix(train[, 1]))
dtest <- xgb.DMatrix(data = as.matrix(test[, -1]), label = as.matrix(test[, 1]))
watchlist <- list(eval = dtest)

Custom objective function (pseudo-Huber)

Reference for gradients: https://stackoverflow.com/questions/45006341/xgboost-how-to-use-mae-as-objective-function

myobjective <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  d <- preds - labels
  h <- 5                          # pseudo-Huber scale parameter
  scale <- 1 + (d / h)^2
  scale_sqrt <- sqrt(scale)
  grad <- d / scale_sqrt          # first derivative of the pseudo-Huber loss
  hess <- 1 / scale / scale_sqrt  # second derivative
  return(list(grad = grad, hess = hess))
}

Custom Metric

evalerror <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  u <- (preds - labels)^2
  err <- (sum(u) / length(u))^(1 / 2)  # RMSE
  return(list(metric = "MyError", value = err))
}

Model Parameters

set.seed(2020)  # the seed is set with a function call, not inside the parameter list

param <- list(booster = 'gbtree'
              , learning_rate = 0.1
              , objective = myobjective
              , eval_metric = evalerror)

Train Model

xgb <- xgb.train(params = param
                 , data = dtrain
                 , nrounds = 500
                 , watchlist = watchlist
                 , maximize = FALSE
                 , early_stopping_rounds = 5
                 , verbose = 1)

Predict

pred <- predict(xgb, dtest)
mae_value <- mae(test$Salary, pred)  # Metrics::mae; renamed to avoid masking the function
print(mae_value)

Peter

I have done this before, though not with ranger. For packages that are open source and carry an appropriate license, I have changed the package code itself to fit my model; depending on the package, that code might be in R, C, Python, or Java.

I have also used xgboost for this. xgboost allows you to customise the objective function: you provide the gradient and the Hessian derived from your loss, and you can call xgboost from R. The example in the documentation is in Python; I am not sure if they support an R customised loss function, but writing one function in Python might not be hard for your team.
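To make that concrete for the negative log-likelihood in the question, here is a rough sketch in R of a hand-rolled Poisson negative log-likelihood objective (assuming a log link, so the per-observation loss is exp(pred) - label * pred up to a constant; xgboost's built-in count:poisson implements the same loss):

library(xgboost)

# Sketch: Poisson negative log-likelihood with a log link.
# Loss per observation: exp(pred) - label * pred (dropping log(label!))
poisson_nll <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  mu <- exp(preds)     # predictions live on the log scale
  grad <- mu - labels  # d/dpred [exp(pred) - label * pred]
  hess <- mu           # second derivative
  list(grad = grad, hess = hess)
}

# Used exactly like the custom objective in the answer above:
# bst <- xgb.train(params = list(objective = poisson_nll), data = dtrain, nrounds = 100)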

I have not done this myself, but you can also use TensorFlow from R and write a customised loss function there.
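A minimal sketch of what that might look like, assuming the keras R package: the loss is just an R function of y_true and y_pred built from backend ops. Here, a Gaussian negative log-likelihood with unit variance, which reduces to a scaled squared error plus a constant:

library(keras)

# Custom loss: Gaussian NLL with unit variance ~ 0.5 * mean squared error
gaussian_nll <- function(y_true, y_pred) {
  0.5 * k_mean(k_square(y_pred - y_true))
}

model <- keras_model_sequential() %>%
  layer_dense(units = 16, activation = "relu", input_shape = 16) %>%
  layer_dense(units = 1)

model %>% compile(optimizer = "adam", loss = gaussian_nll)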

Craig