
When I read the scikit-learn user guide about Decision Trees, it mentions that:

CART (Classification and Regression Trees) is very similar to C4.5, but it differs in that it supports numerical target variables (regression) and does not compute rule sets. CART constructs binary trees using the feature and threshold that yield the largest information gain at each node.

I don't understand where we compute rule sets in the C4.5 algorithm (and I don't even know what rule sets are). It seems essentially the same as CART, except that CART uses the Gini index instead of cross-entropy.

Can someone please explain what rule sets are and how they are used in C4.5 in detail?

1 Answer


Decision Tree Algorithm

No matter which decision tree algorithm you run (ID3, C4.5, CART, or CHAID), they all look for the split that offers the highest purity gain: information gain in ID3 and C4.5, Gini decrease in CART. They then add a decision rule for the chosen feature and recursively build another decision tree for each sub-dataset until they reach a decision.
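To make "highest information gain" concrete, here is a minimal Python sketch of the entropy-based gain for a categorical feature. The function names and the toy weather data are illustrative, not taken from any library:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of an array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(feature, labels):
    """Entropy reduction from splitting on each value of a categorical feature."""
    weighted_children = sum(
        (feature == v).mean() * entropy(labels[feature == v])
        for v in np.unique(feature)
    )
    return entropy(labels) - weighted_children

outlook = np.array(["sunny", "sunny", "overcast", "rain", "rain"])
play    = np.array(["no", "no", "yes", "yes", "no"])
print(information_gain(outlook, play))  # ~0.571 bits
```

The tree builder would evaluate this for every candidate feature, split on the winner, and recurse on each sub-dataset.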

C4.5 is the evolution of ID3, presented by the same author (Quinlan, 1993). The C4.5 algorithm generates a decision tree for a given dataset by recursively splitting the records.

  • In building a decision tree, we can deal with training sets whose records have unknown attribute values by evaluating the gain, or the gain ratio, for an attribute using only the records where that attribute is defined (see the sketch after this list).

  • In using a decision tree, we can classify records that have unknown attribute values by estimating the probability of the various possible results.
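The first point can be sketched in a few lines: evaluate the gain over the defined records only and scale it by the fraction of records that are defined, which is roughly how C4.5 discounts attributes with many unknowns. All names and data below are illustrative:

```python
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gain_ignoring_missing(feature, labels):
    """Gain computed only on records where `feature` is defined,
    scaled by the fraction of defined records (in the spirit of C4.5)."""
    defined = np.array([v is not None for v in feature])
    f, y = feature[defined], labels[defined]
    children = sum((f == v).mean() * entropy(y[f == v]) for v in np.unique(f))
    return defined.mean() * (entropy(y) - children)

outlook = np.array(["sunny", None, "overcast", "rain", None], dtype=object)
play    = np.array(["no", "no", "yes", "yes", "no"])
print(gain_ignoring_missing(outlook, play))  # gain discounted by 3/5
```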


Rule Sets

The decision tree algorithm, like Naive Bayes, is based on conditional probabilities. Unlike Naive Bayes, however, decision trees generate rules. A rule set (or simply a set of decision rules) consists of a number of rules; each rule contains a predicate and a predicted class value, plus some information collected at training or testing time about the performance of the rule.
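In code, such a rule can be modeled as a predicate plus a predicted class and some performance bookkeeping. A minimal sketch (the `Rule` class and its fields are my own illustration):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    predicate: Callable[[dict], bool]  # test applied to one record
    predicted_class: str               # class assigned when the predicate holds
    support: int = 0                   # training records matched by the predicate
    correct: int = 0                   # matched records classified correctly

    @property
    def confidence(self) -> float:
        return self.correct / self.support if self.support else 0.0

r = Rule(lambda rec: rec["outlook"] == "sunny" and rec["humidity"] > 75,
         predicted_class="no", support=3, correct=3)
print(r.confidence)  # 1.0
```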

It is easy to derive a rule set from a decision tree: write one rule for each path from the root to a leaf. The left-hand side of that rule is built from the labels of the nodes and the labels of the arcs along the path.
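With scikit-learn, for instance, you can walk the fitted tree's internal arrays and print one rule per root-to-leaf path. The traversal helper below is my own sketch; only the `tree_` attributes belong to scikit-learn:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)
tree = clf.tree_

def print_rules(node=0, conditions=()):
    if tree.children_left[node] == -1:  # leaf node: one rule per path
        klass = iris.target_names[tree.value[node].argmax()]
        print("IF " + " AND ".join(conditions or ("TRUE",)) + f" THEN class = {klass}")
        return
    name = iris.feature_names[tree.feature[node]]
    threshold = tree.threshold[node]
    print_rules(tree.children_left[node],  conditions + (f"{name} <= {threshold:.2f}",))
    print_rules(tree.children_right[node], conditions + (f"{name} > {threshold:.2f}",))

print_rules()
```

scikit-learn also provides `sklearn.tree.export_text`, which prints a comparable textual view of a fitted tree.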

The resulting rule set can be simplified:

Let LHS be the left-hand side of a rule, and let LHS' be obtained from LHS by eliminating some of its conditions. We can safely replace LHS with LHS' in that rule if the subsets of the training set that satisfy LHS and LHS' are equal.
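In code, checking whether the training subsets satisfying LHS and LHS' are equal is a comparison of boolean masks. The predicates and records below are hypothetical:

```python
def can_drop_condition(data, lhs, lhs_prime):
    """True if LHS' (LHS with one condition removed) selects exactly the
    same training records as LHS, i.e. the condition is redundant."""
    mask = [lhs(rec) for rec in data]
    mask_prime = [lhs_prime(rec) for rec in data]
    return mask == mask_prime

train = [{"outlook": "sunny", "humidity": 80},
         {"outlook": "sunny", "humidity": 90},
         {"outlook": "rain",  "humidity": 70}]

lhs       = lambda r: r["outlook"] == "sunny" and r["humidity"] > 75
lhs_prime = lambda r: r["outlook"] == "sunny"  # humidity condition dropped
print(can_drop_condition(train, lhs, lhs_prime))  # True: the condition was redundant
```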

A rule may be eliminated by using meta conditions such as "if no other rule applies".
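A minimal sketch of such a meta-condition: try the rules in order and fall back to a default class when none applies. The (predicate, class) pairs are hypothetical:

```python
def classify(record, rules, default="yes"):
    """Apply (predicate, class) rules in order; the default class plays the
    role of the meta-rule 'if no other rule applies'."""
    for predicate, klass in rules:
        if predicate(record):
            return klass
    return default

rules = [(lambda r: r["outlook"] == "sunny" and r["humidity"] > 75, "no")]
print(classify({"outlook": "rain", "humidity": 60}, rules))  # -> "yes" (default)
```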

Pluviophile