
Wrong computations #1

@arogozhnikov

Description


Hi Andrej,

thanks a lot for writing your various ML demos.

When I was building my own demo on gradient boosting, I initially planned to reuse your tree implementation, but first I reviewed the code...

  1. https://github.com/karpathy/forestjs/blob/master/lib/randomforest.js#L327
    The entropy computation is wrong, since p is overwritten.
    Fortunately, this does not matter at all: that summand is always omitted during the computations anyway, since it is just a global constant.

  2. https://github.com/karpathy/forestjs/blob/master/lib/randomforest.js#L299
    The computed information gain is wrong. For some reason only the impurity is computed, while the number of samples in each leaf (or at least the proportions) is ignored.
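As an illustration of the first point, here is how a correct binary entropy could look; the function name `entropy` and its shape are my own, not taken from forestjs:

```javascript
// Binary entropy of a class proportion p, using a separate variable q
// for (1 - p) so that p itself is never overwritten.
function entropy(p) {
  const q = 1 - p;
  const t1 = p > 0 ? -p * Math.log(p) : 0; // treat 0 log 0 as 0
  const t2 = q > 0 ? -q * Math.log(q) : 0;
  return t1 + t2;
}
```

For example, entropy(0.5) gives the maximum value log 2, and entropy(0) and entropy(1) are both 0.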

So, the correct formula for the leaf penalty with entropy is (it is actually just the negative log-likelihood, nothing else):
n log n - n_{+} log n_{+} - n_{-} log n_{-}
To get the improvement, subtract the sum of the children's penalties from the parent's penalty. You can check that information gain written this way satisfies various basic properties.
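The penalty and improvement above can be sketched as follows; the function names (leafPenalty, infoGain) are illustrative, not from forestjs:

```javascript
// n log n, with the convention 0 log 0 = 0.
function xlogx(n) {
  return n > 0 ? n * Math.log(n) : 0;
}

// Penalty for a leaf with nPos positive and nNeg negative samples:
// n log n - n_{+} log n_{+} - n_{-} log n_{-}
function leafPenalty(nPos, nNeg) {
  return xlogx(nPos + nNeg) - xlogx(nPos) - xlogx(nNeg);
}

// Information gain of a split: parent penalty minus the sum of the
// children's penalties.
function infoGain(leftPos, leftNeg, rightPos, rightNeg) {
  return leafPenalty(leftPos + rightPos, leftNeg + rightNeg)
       - leafPenalty(leftPos, leftNeg)
       - leafPenalty(rightPos, rightNeg);
}
```

Written this way the gain behaves as expected: it is zero for a split that leaves the class proportions unchanged (e.g. 2/2 in both children) and maximal for a split that separates the classes perfectly.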

Hardly any other algorithm could tolerate such 'peculiarities' of implementation, but random forests work smoothly even in this situation :) Cool, right?
