
Wrong computations #1

@arogozhnikov

Description


Hi Andrej,

thanks a lot for writing your various ML demos.

When I was building my own demo on gradient boosting, I initially planned to reuse your tree implementation, but first I reviewed the code...

  1. https://github.com/karpathy/forestjs/blob/master/lib/randomforest.js#L327
    The entropy computation is wrong, since p is overwritten.
    Fortunately, this does not matter at all: that summand is always omitted during the computations anyway, since it is just a global constant.

  2. https://github.com/karpathy/forestjs/blob/master/lib/randomforest.js#L299
    The computed information gain is wrong. For some reason only the impurity is computed, while the number of samples in each leaf (or at least the proportions) is ignored.
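As an illustration of the first point, here is how a correct binary entropy could look; the function name `entropy` and its shape are my own, not taken from forestjs:

```javascript
// Binary entropy of a class proportion p, using a separate variable q
// for (1 - p) so that p itself is never overwritten.
function entropy(p) {
  const q = 1 - p;
  const t1 = p > 0 ? -p * Math.log(p) : 0; // treat 0 log 0 as 0
  const t2 = q > 0 ? -q * Math.log(q) : 0;
  return t1 + t2;
}
```

For example, entropy(0.5) gives the maximum value log 2, and entropy(0) and entropy(1) are both 0.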

So, the correct formula for the leaf penalty with entropy is (it is actually just the negative log-likelihood, nothing else):
n log n - n_{+} log n_{+} - n_{-} log n_{-}
To get the improvement, subtract the sum of the children's penalties from the parent's penalty. You can check that information gain written this way satisfies various basic properties.
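The penalty and improvement above can be sketched as follows; the function names (leafPenalty, infoGain) are illustrative, not from forestjs:

```javascript
// n log n, with the convention 0 log 0 = 0.
function xlogx(n) {
  return n > 0 ? n * Math.log(n) : 0;
}

// Penalty for a leaf with nPos positive and nNeg negative samples:
// n log n - n_{+} log n_{+} - n_{-} log n_{-}
function leafPenalty(nPos, nNeg) {
  return xlogx(nPos + nNeg) - xlogx(nPos) - xlogx(nNeg);
}

// Information gain of a split: parent penalty minus the sum of the
// children's penalties.
function infoGain(leftPos, leftNeg, rightPos, rightNeg) {
  return leafPenalty(leftPos + rightPos, leftNeg + rightNeg)
       - leafPenalty(leftPos, leftNeg)
       - leafPenalty(rightPos, rightNeg);
}
```

Written this way the gain behaves as expected: it is zero for a split that leaves the class proportions unchanged (e.g. 2/2 in both children) and maximal for a split that separates the classes perfectly.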

Hardly any other algorithm could tolerate such 'peculiarities' of implementation, but random forests work smoothly even in this situation :) Cool, right?
