
Negative Entropy Values and More #29

@naji-s

Hello. Thanks for this package, but I am running into a lot of trouble with it.

First of all, in mi.py you use the entropy implementation by Gael Varoquaux, which gives negative MI values. I replaced that with sklearn's MI estimator, which got rid of that problem, but the features that end up being chosen still don't make sense.
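For reference, here is a minimal sketch of the sklearn call I mean, run on synthetic data rather than on anything from this package. The variable names and the data are made up for illustration. As far as I can tell, mutual_info_regression clips negative estimates to zero, so an independent feature comes out near 0 instead of below it.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

# Synthetic data (illustrative only): one feature unrelated to y, one strongly related.
rng = np.random.default_rng(0)
n = 500
x_indep = rng.normal(size=n)
x_dep = rng.normal(size=n)
y = x_dep + 0.1 * rng.normal(size=n)

X = np.column_stack([x_indep, x_dep])

# One MI estimate per column of X; n_neighbors plays the role of k.
mi = mutual_info_regression(X, y, n_neighbors=3, random_state=0)
print(mi)
```

The second entry should be clearly larger than the first, and neither should be negative.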

I used the iris dataset from sklearn and duplicated one feature (petal length, prepended as column 0). As you can see, the method ends up picking the same feature twice, which shouldn't be the case. Here is the MWE:

import numpy as np
import pandas as pd
from sklearn import datasets

import mifs

iris = datasets.load_iris()
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)

X = iris_df[['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)']].values
print(X[:5, :])
# prepend a copy of petal length (column 2) as a new column 0
X = np.hstack((X[:, 2].reshape((-1, 1)), X))
print(X[:5, :])
# target: petal width, as a 1-D array
y = iris_df['petal width (cm)'].values

# define MI_FS feature selection method
feat_selector = mifs.MutualInformationFeatureSelector(categorical=False, n_features=2)

# find all relevant features
feat_selector.fit(X, y)

# check selected features
print(feat_selector._support_mask)

# check ranking of features
print(feat_selector.ranking_)

# call transform() on X to filter it down to selected features
X_filtered = feat_selector.transform(X)

You can comment out or uncomment the line that prepends the duplicate column (the np.hstack call) to compare the results.
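To make the "same feature twice" check explicit, here is a small NumPy-only sketch. The helper names duplicate_column_pairs and redundant_selected_pairs are hypothetical, not part of mifs, and the data is random rather than iris. It finds columns of X that are exact copies of each other and reports whether a given set of selected indices contains both members of such a pair.

```python
import numpy as np

def duplicate_column_pairs(X):
    """Return (i, j) index pairs, i < j, whose columns are exactly equal."""
    p = X.shape[1]
    return [(i, j)
            for i in range(p)
            for j in range(i + 1, p)
            if np.array_equal(X[:, i], X[:, j])]

def redundant_selected_pairs(X, selected):
    """Return the duplicate pairs for which both columns were selected."""
    chosen = set(selected)
    return [(i, j) for i, j in duplicate_column_pairs(X)
            if i in chosen and j in chosen]

# Same construction as in the MWE: prepend a copy of column 2 as column 0.
rng = np.random.default_rng(0)
base = rng.normal(size=(150, 3))
X = np.hstack((base[:, 2].reshape((-1, 1)), base))

print(duplicate_column_pairs(X))            # [(0, 3)]
print(redundant_selected_pairs(X, [0, 3]))  # [(0, 3)]
print(redundant_selected_pairs(X, [0, 1]))  # []
```

Assuming _support_mask is a boolean mask over the columns, the selected indices for the MWE could be obtained with np.flatnonzero(feat_selector._support_mask) and passed in as selected.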

Also, there was no attribute called support on feat_selector, so I had to replace it with _support_mask in your example. The only code I changed was the function _get_first_mi, which now reads:

def _get_first_mi(i, k, MI_FS):
    from sklearn.feature_selection import mutual_info_regression

    n, p = MI_FS.X.shape
    x = MI_FS.X[:, i].reshape((n, 1))

    if MI_FS.categorical:
        MI = _mi_dc(x, MI_FS.y, k)
    else:
        # use sklearn's estimator in place of _mi_cc;
        # np.ravel passes y as a 1-D array whatever shape it is stored in
        MI = mutual_info_regression(x, np.ravel(MI_FS.y), n_neighbors=k)[0]

    # MI must be non-negative
    if MI > 0:
        return MI
    else:
        return np.nan
