Hello. Thanks for this package, but I am running into several problems with it.
First of all, mi.py uses the entropy implementation by Gael Varoquaux, which gives negative MI values. I replaced it with sklearn's MI estimator, which got rid of that problem, but the features that end up being chosen still don't make sense.
I used the iris dataset from sklearn and replicated one feature. As you can see, the method ends up picking the same feature twice, which shouldn't happen. Here is the MWE:
import numpy as np
import pandas as pd
import mifs
from sklearn import datasets

iris = datasets.load_iris()
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)
X = iris_df[['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)']].values
print(X[:5, :])
# prepend a copy of 'petal length (cm)', so columns 0 and 3 hold the same feature
X = np.hstack((X[:, 2].reshape((-1, 1)), X))
print(X[:5, :])
y = iris_df['petal width (cm)'].values
# define MI_FS feature selection method
feat_selector = mifs.MutualInformationFeatureSelector(categorical=False, n_features=2)
# find all relevant features
feat_selector.fit(X, y)
# check selected features
print(feat_selector._support_mask)
# check ranking of features
print(feat_selector.ranking_)
# call transform() on X to filter it down to selected features
X_filtered = feat_selector.transform(X)
You can comment out or uncomment the np.hstack line to compare the results with and without the duplicated feature.
Also, feat_selector has no attribute called support, so I had to replace it with _support_mask in your example. The only code I changed was the function _get_first_mi, which now reads:
def _get_first_mi(i, k, MI_FS):
    n, p = MI_FS.X.shape
    if MI_FS.categorical:
        x = MI_FS.X[:, i].reshape((n, 1))
        MI = _mi_dc(x, MI_FS.y, k)
    else:
        vars = (MI_FS.X[:, i].reshape((n, 1)), MI_FS.y)
        MI = _mi_cc(vars, k)
        from sklearn.feature_selection import mutual_info_regression
        MI_2 = mutual_info_regression(vars[0], vars[1], n_neighbors=k)
        MI = MI_2[0]
    # MI must be non-negative
    if MI > 0:
        return MI
    else:
        return np.nan
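To make the expectation about the duplicated column concrete, here is a minimal sketch of the behaviour I would expect. It is not mifs code and does not use its estimator: it assumes a crude histogram (plug-in) MI estimate and a simple "relevance minus mean redundancy" greedy score, and it runs on synthetic data rather than iris. The names binned, plugin_mi, score and greedy_select are made up for this illustration.

```python
import math
import random
from collections import Counter


def binned(values, n_bins=5):
    """Discretise floats into equal-width bin indices 0..n_bins-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant column
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def plugin_mi(a, b):
    """Plug-in MI estimate (nats) between two discrete sequences."""
    n = len(a)
    count_a, count_b, count_ab = Counter(a), Counter(b), Counter(zip(a, b))
    # Empirical KL divergence between the joint and the product of marginals,
    # so the result cannot be negative (up to floating-point rounding).
    return sum(
        (c / n) * math.log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in count_ab.items()
    )


def score(j, columns, target, selected):
    """Relevance to the target minus mean redundancy with chosen features."""
    relevance = plugin_mi(columns[j], target)
    if not selected:
        return relevance
    redundancy = sum(plugin_mi(columns[j], columns[s]) for s in selected)
    return relevance - redundancy / len(selected)


def greedy_select(columns, target, n_features):
    """Greedily add the best-scoring remaining feature until n_features are chosen."""
    selected = []
    remaining = list(range(len(columns)))
    while remaining and len(selected) < n_features:
        best = max(remaining, key=lambda j: score(j, columns, target, selected))
        selected.append(best)
        remaining.remove(best)
    return selected


# Synthetic stand-in for the MWE: y depends strongly on x0 and weakly on x1.
random.seed(0)
n = 500
x0 = [random.gauss(0, 1) for _ in range(n)]
x1 = [random.gauss(0, 1) for _ in range(n)]
y = [a + 0.5 * b + random.gauss(0, 0.1) for a, b in zip(x0, x1)]

# Column 1 is an exact copy of column 0, mirroring the np.hstack step in the MWE.
columns = [binned(x0), binned(x0), binned(x1)]
selected = greedy_select(columns, binned(y), n_features=2)
print(selected)
```

With this kind of score, the redundancy of a column with its exact copy equals that column's entropy, which is never smaller than its MI with the target. Once one copy has been chosen, the other copy's score is therefore at most zero, so the two copies should not be selected together.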