
Based on the answers to this question, I should be able to build a random forest in which all trees are identical by setting the parameters bootstrap=False, max_features=None and random_state=42.

I wrote some quick code to test it, and it seems that different trees are created.
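A minimal sketch of such a check fits a forest with exactly those parameters, renders each fitted tree as text and counts how many distinct renderings there are. The iris dataset and the choice of five trees are illustrative assumptions, not part of the original test:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import export_text

# Illustrative data; any classification dataset would do.
X, y = load_iris(return_X_y=True)

# The parameters from the question: no bootstrap sampling, all features at every split.
rf = RandomForestClassifier(
    n_estimators=5, bootstrap=False, max_features=None, random_state=42
).fit(X, y)

# Identical trees produce identical text renderings.
renderings = [export_text(tree) for tree in rf.estimators_]
n_distinct = len(set(renderings))
print(f"{n_distinct} distinct tree(s) out of {len(renderings)}")
```

If the count is greater than 1, the forest does not consist of identical trees.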

Is it possible to create a random forest using RandomForestClassifier which will produce the same trees?

Ethan
Damian Melniczuk

1 Answer


You want to create a random forest in which all decision trees are the same. (I trust you are doing this as an educational exercise, not for anything serious. :))

You tried to do this by setting random_state and cited my answer from

Indeed, as you discovered, my answer is not fully correct.

I have edited it to add the following:

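While this removes the stochastic components of training (bootstrap sampling and per-split feature subsampling), the decision trees can still be different. Each tree still visits the features in a random order at every split, so when several candidate splits improve the criterion equally, the one that gets chosen depends on the tree's random seed. And sklearn ensembles do not pass the same random_state along to their children: they generate a new random seed for each child from the random state they are given.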
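The tie-breaking effect can be seen with a single DecisionTreeClassifier. This sketch assumes a synthetic dataset with two identical columns, so a split on one column is exactly as good as the same split on the other. With the same seed the tree is reproducible; across different seeds the root may split on either column:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic data (an illustrative assumption): column 1 is an exact copy of column 0.
rng = np.random.RandomState(0)
c = rng.rand(100)
X = np.column_stack([c, c])
y = (c > 0.5).astype(int)

def fit_tree(seed):
    # max_features=None: every feature is considered at every split.
    return DecisionTreeClassifier(max_features=None, random_state=seed).fit(X, y)

# Same seed: same feature visiting order, hence an identical tree.
same = export_text(fit_tree(7)) == export_text(fit_tree(7))

# Different seeds: the tie between the two columns can be broken either way.
root_features = {int(fit_tree(seed).tree_.feature[0]) for seed in range(20)}
print(same, root_features)  # root_features will typically contain both 0 and 1
```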

You can see this by checking the _set_random_states method in the ensemble base module, in particular this line, which propagates the random_state to the ensemble's children.

As you can see, sklearn uses the random_state you give RandomForest to create a new random state for each child:

for key in sorted(estimator.get_params(deep=True)):
    if key == 'random_state' or key.endswith('__random_state'):
        to_set[key] = random_state.randint(MAX_RAND_SEED)
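The derivation can be replayed by hand. This sketch assumes that MAX_RAND_SEED equals np.iinfo(np.int32).max and that the forest makes one randint draw per tree from a RandomState built from its own random_state; if those assumptions hold, the replayed seeds match the random_state stored on each fitted tree:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
rf = RandomForestClassifier(
    n_estimators=3, bootstrap=False, max_features=None, random_state=42
).fit(X, y)

# Replay: one parent RandomState seeded with the forest's random_state,
# one draw per tree (upper bound assumed to be the int32 maximum).
parent = np.random.RandomState(42)
expected = [parent.randint(np.iinfo(np.int32).max) for _ in range(3)]

# The seed that each child tree actually received.
actual = [tree.random_state for tree in rf.estimators_]
print(expected)
print(actual)
```

The forest's random_state=42 makes the whole ensemble reproducible, but it is not the seed any individual tree uses.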

Therefore each tree gets a different seed, and the trees can end up different. Presumably this is done because heterogeneity among the members is what makes an ensemble powerful. Note that this is not specific to RandomForest; it applies to any sklearn ensemble.
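As a quick check that other ensembles behave the same way, this sketch (iris data and four members are illustrative choices) prints the seed assigned to each member of an ExtraTreesClassifier and a BaggingClassifier, both built with random_state=42:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier, ExtraTreesClassifier

X, y = load_iris(return_X_y=True)

ensembles = {
    "ExtraTreesClassifier": ExtraTreesClassifier(n_estimators=4, random_state=42),
    # BaggingClassifier uses a DecisionTreeClassifier as its default base estimator.
    "BaggingClassifier": BaggingClassifier(n_estimators=4, random_state=42),
}

all_seeds = {}
for name, model in ensembles.items():
    model.fit(X, y)
    # Each member carries its own derived seed, not the ensemble's 42.
    all_seeds[name] = [est.random_state for est in model.estimators_]
    print(name, all_seeds[name])
```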

Sorry for my incorrect answer to the question you cite. If you want, you can use the random forest implementation. That one is completely homogeneous.
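Another option, not the implementation mentioned above, is to build the ensemble by hand so that every tree receives the same random_state. In this minimal sketch the helpers fit_identical_forest and predict_proba_forest are hypothetical names; predictions are made by averaging the trees' class probabilities, as RandomForestClassifier.predict_proba does:

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

def fit_identical_forest(X, y, n_trees=5, random_state=42):
    """Hypothetical helper: fit n_trees trees that all share one random_state."""
    template = DecisionTreeClassifier(max_features=None, random_state=random_state)
    # clone() copies the parameters unchanged, so every tree gets the same seed
    # and, trained on the same data, builds the same tree.
    return [clone(template).fit(X, y) for _ in range(n_trees)]

def predict_proba_forest(trees, X):
    """Hypothetical helper: average the class probabilities of all trees."""
    return np.mean([tree.predict_proba(X) for tree in trees], axis=0)

X, y = load_iris(return_X_y=True)
trees = fit_identical_forest(X, y)
n_distinct = len({export_text(tree) for tree in trees})
print(n_distinct)  # 1
proba = predict_proba_forest(trees, X)
```

Since all members are identical, this ensemble predicts exactly what a single tree predicts, which is why it is only useful as an exercise.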

Ricardo Cruz