Negative Entropy Values and More

Hello. Thanks for this package, but I am running into a lot of trouble with it.

First of all, in mi.py you use the entropy implementation by Gael Varoquaux, which gives negative MI values. I replaced it with [sklearn's MI](https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.mutual_info_regression.html) and got rid of that problem, but the features that end up being chosen still don't make sense.

I used the iris dataset from sklearn. I duplicate one feature, and as you can see, the method ends up picking the same feature twice, which shouldn't happen. Here is the MWE:
```
import numpy as np
import pandas as pd
from sklearn import datasets

import mifs

iris = datasets.load_iris()
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)

X = iris_df[['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)']].values
print(X[:5, :])
# Appending line: prepend a copy of 'petal length (cm)' (column 2) as a new first column
X = np.hstack((X[:, 2].reshape((-1, 1)), X))
print(X[:5, :])
# Target: petal width as a 1-D array
y = iris_df['petal width (cm)'].values

# define MI_FS feature selection method
feat_selector = mifs.MutualInformationFeatureSelector(categorical=False, n_features=2)

# find all relevant features
feat_selector.fit(X, y)

# check selected features
print(feat_selector._support_mask)

# check ranking of features
print(feat_selector.ranking_)

# call transform() on X to filter it down to selected features
X_filtered = feat_selector.transform(X)
```
You can comment or uncomment the appending line (the `np.hstack` call) to compare the results with and without the duplicated feature.
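To make the "same feature picked twice" check explicit, here is a minimal sketch of a helper that lists exactly duplicated columns. The name `duplicate_column_pairs` is hypothetical and not part of mifs; the data setup mirrors the MWE above.

```python
import numpy as np
from sklearn import datasets


def duplicate_column_pairs(X):
    """Return index pairs (i, j), i < j, of columns in X that are exactly equal.

    Hypothetical helper for this issue; not part of mifs or sklearn.
    """
    X = np.asarray(X)
    n_cols = X.shape[1]
    return [
        (i, j)
        for i in range(n_cols)
        for j in range(i + 1, n_cols)
        if np.array_equal(X[:, i], X[:, j])
    ]


iris = datasets.load_iris()
base = iris.data[:, :3]  # sepal length, sepal width, petal length
# Same construction as the MWE: prepend a copy of petal length (column 2)
X = np.hstack((base[:, [2]], base))

pairs = duplicate_column_pairs(X)
print(pairs)  # [(0, 3)]: columns 0 and 3 are the duplicated feature
```

Passing `X_filtered` from the MWE to this helper would show whether the two selected columns are in fact the same feature.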

Also, there was no attribute called support for feat_selector, so I had to replace it with _support_mask in your example. The only code I changed was the function _get_first_mi, which now reads:
```
from sklearn.feature_selection import mutual_info_regression

# np and _mi_dc are already available in mi.py
def _get_first_mi(i, k, MI_FS):
    n, p = MI_FS.X.shape
    x = MI_FS.X[:, i].reshape((n, 1))

    if MI_FS.categorical:
        MI = _mi_dc(x, MI_FS.y, k)
    else:
        # Replaces the original _mi_cc((x, MI_FS.y), k) call, which gave negative MI.
        # mutual_info_regression returns one value per column of x, hence the [0].
        MI = mutual_info_regression(x, MI_FS.y, n_neighbors=k)[0]

    # MI must be non-negative
    if MI > 0:
        return MI
    else:
        return np.nan
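For reference, this is a small sketch of what the sklearn call does on its own, outside mifs. It computes the MI between each of the three iris features and the target one column at a time, the same way the modified `_get_first_mi` calls it. The value `k = 5` and `random_state=0` are my assumptions for reproducibility, not settings taken from mifs. As far as I can tell, sklearn clips its nearest-neighbor estimate at zero, which would explain why the negative values disappear.

```python
import numpy as np
from sklearn import datasets
from sklearn.feature_selection import mutual_info_regression

iris = datasets.load_iris()
X = iris.data[:, :3]  # sepal length, sepal width, petal length
y = iris.data[:, 3]   # petal width as the continuous target

k = 5  # assumed number of neighbors for the estimator

# One call per column, keeping the (n, 1) shape used in _get_first_mi
mi_per_feature = np.array([
    mutual_info_regression(X[:, [i]], y, n_neighbors=k, random_state=0)[0]
    for i in range(X.shape[1])
])

print(mi_per_feature)
```

Every entry of `mi_per_feature` should be zero or positive, so the `MI > 0` check in the function only filters out features whose estimate is exactly zero.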
