I want to extract each tree so that I can feed it with any data, and see the output.

dump_list=xg_clas.get_booster().get_dump()
num_t=len(dump_list)
print("Number of Trees=",num_t)

I can find the number of trees like that,

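# Set the figure size first; changing rcParams after plotting does not resize the existing figure.
plt.rcParams['figure.figsize'] = [50, 10]
xgb.plot_tree(xg_clas, num_trees=0)
plt.show()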

and graph each tree like this. When I do something like:

dump_list[0]

it gives me the tree as text. But I couldn't find any way to extract a tree as an object that I can use.

I found https://github.com/dmlc/xgboost/issues/117#ref-commit-3f6ff43 but didn't really understand what it suggests.

Progress: I tried to somehow turn

dump_list[0]

string object into a sklearn DecisionTreeClassifier object. Still no luck.

I uploaded my notebook if you want to check it out: https://github.com/sciencelove11/Question

J.Smith

1 Answer

This is an open feature request (at time of writing):
https://github.com/dmlc/xgboost/issues/2175
https://github.com/dmlc/xgboost/issues/3439
There, a wasteful but working solution is mentioned: predict with ntree_limit set to each number of trees of interest, then compare the successive predictions to see what each additional tree contributes. I've put together a quick demonstration Colab notebook here.
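As a rough sketch of that workaround (not the code from the notebook): take the raw margin predicted by the first k boosting rounds and subtract the margin from the first k-1 rounds. The helper names below are hypothetical. The adapter assumes XGBoost 1.4 or later, where `Booster.predict` accepts `iteration_range`; on older versions `ntree_limit=k` plays the same role. It also assumes one tree per boosting round, which does not hold for multi-class models or when `num_parallel_tree` is greater than 1.

```python
from typing import Callable, List, Sequence


def per_round_margins(
    predict_first_k: Callable[[int], Sequence[float]], n_rounds: int
) -> List[List[float]]:
    """Split cumulative margins into one contribution per boosting round.

    predict_first_k(k) must return the raw margin of every row using only
    the first k rounds (k >= 1). The entry for round 1 still includes the
    model's base score, because there is no "zero trees" prediction to subtract.
    """
    contributions = []
    previous = None
    for k in range(1, n_rounds + 1):
        current = list(predict_first_k(k))
        if previous is None:
            contributions.append(current)
        else:
            contributions.append([c - p for c, p in zip(current, previous)])
        previous = current
    return contributions


def xgb_predict_first_k(booster, dmatrix):
    """Hypothetical adapter for an xgboost.Booster (assumes XGBoost >= 1.4)."""
    def predict_first_k(k):
        # iteration_range=(0, k) uses rounds 0..k-1; output_margin=True
        # returns the untransformed sum of leaf values instead of probabilities.
        return booster.predict(dmatrix, iteration_range=(0, k), output_margin=True)
    return predict_first_k


# Assumed usage with the question's model (X is your own feature matrix):
#   booster = xg_clas.get_booster()
#   contribs = per_round_margins(
#       xgb_predict_first_k(booster, xgb.DMatrix(X)), num_t)
```

This costs one full prediction pass per round, which is why the approach is wasteful for models with many trees.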

It also has been asked several times over at SO, see e.g.
https://stackoverflow.com/questions/51681714/extract-trees-and-weights-from-trained-xgboost-model
https://stackoverflow.com/questions/37677496/how-to-get-access-of-individual-trees-of-a-xgboost-model-in-python-r
and their Related questions.
In the first link, another workaround is mentioned: by dumping to text/PMML, you should be able to reload each individual tree (or subsets thereof) and make the predictions. It's not clear how to make this work, though: XGBoost itself doesn't have an easy way to load a model except from its own binary format. You might be able to do it by parsing the output (JSON seems most promising) into another library with tree models.
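To illustrate the parsing idea, here is a minimal sketch that walks one dumped tree in plain Python rather than loading it into another library. It assumes the per-tree JSON layout produced by `get_booster().get_dump(dump_format="json")`, with `split`, `split_condition`, `yes`, `no`, `missing`, `children` and `leaf` keys, and handles numeric splits only. The function name and the sample tree are made up for illustration. XGBoost compares in 32-bit floats, so values lying exactly on a threshold may be routed differently here.

```python
import json
import math


def eval_dumped_tree(tree_json: str, row: dict) -> float:
    """Return the leaf value one dumped tree assigns to a single row.

    row maps feature names as they appear in the dump (e.g. "f0") to values.
    """
    node = json.loads(tree_json)
    while "leaf" not in node:
        value = row.get(node["split"])
        if value is None or (isinstance(value, float) and math.isnan(value)):
            next_id = node["missing"]   # missing values follow the default branch
        elif value < node["split_condition"]:
            next_id = node["yes"]       # "yes" branch: value < threshold
        else:
            next_id = node["no"]
        node = next(c for c in node["children"] if c["nodeid"] == next_id)
    return node["leaf"]


# Hand-written stand-in for one entry of the JSON dump (not real model output).
sample_tree = json.dumps({
    "nodeid": 0, "split": "f0", "split_condition": 1.5,
    "yes": 1, "no": 2, "missing": 1,
    "children": [
        {"nodeid": 1, "leaf": 0.4},
        {"nodeid": 2, "split": "f1", "split_condition": 0.0,
         "yes": 3, "no": 4, "missing": 4,
         "children": [
             {"nodeid": 3, "leaf": -0.25},
             {"nodeid": 4, "leaf": 0.125},
         ]},
    ],
})

print(eval_dumped_tree(sample_tree, {"f0": 1.0, "f1": 3.0}))
print(eval_dumped_tree(sample_tree, {"f0": 2.0, "f1": -1.0}))
```

The returned value is a raw leaf score. For a classifier, the leaf scores of all trees still have to be summed with the base score and passed through the link function to get a probability.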

Ben Reiniger