Add KhiopsInterpreter for Using the Interpretation Feature in the Sklearn API #459

@popescu-v

Description

Following #458, interpretation models would be buildable from fitted KhiopsClassifier estimator instances, but they would still not be usable solely through the Sklearn API.

The goal of this issue is to make the interpretation usable through the Sklearn API, taking loose inspiration from the Explainer class of the SHAP library (https://shap.readthedocs.io):

  • build a KhiopsInterpreter class, which would take as parameters:
    • khc: a fitted KhiopsClassifier instance;
    • the Khiops interpretation-specific parameters n_variable_importances and importance_ranking;
  • upon initialization of an instance, KhiopsInterpreter would:
    • check that khc is an instance of KhiopsClassifier and that it is fitted (via sklearn.utils.validation.check_is_fitted);
    • create the target Khiops interpretation file path;
    • call api.interpret_predictor with khc.model_ and khc.model_main_dictionary_name_ as the first two positional parameters, followed by the Khiops interpretation file path;
    • create, via api.read_khiops_dictionary_file, a DictionaryDomain from the interpretation model built at the previous step;
    • store the interpretation model DictionaryDomain as a private attribute of the KhiopsInterpreter instance;
  • pass a test dataset to the KhiopsInterpreter instance by calling it:
    khc = KhiopsClassifier(...)
    khc.fit(X_train, y_train)
    interpreter = KhiopsInterpreter(khc, n_variable_importances=10, importance_ranking="Local")
    interpretation = interpreter(X_test)
    shapley_values = interpretation.values
  • the .values attribute is computed through a series of processing steps similar to those of KhiopsClassifier.predict, roughly:
    • convert X_test to one or several (for multi-table datasets) CSV files;
    • call api.deploy_model using the interpretation DictionaryDomain computed upon the KhiopsInterpreter instance creation;
    • convert the resulting CSV files with the Shapley values to a Pandas DataFrame (or a NumPy array if X_test is a NumPy array).
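
The initialization steps listed above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the proposed implementation: the class name KhiopsInterpreterSketch, the injected `api` parameter (a stand-in for the Khiops core API module, so the sketch runs without Khiops installed), the output file name, and the keyword names forwarded to `interpret_predictor` are all hypothetical. The fitted check is also simplified to an attribute lookup, where the actual class would use an `isinstance` check and `check_is_fitted` as described above. The call step that produces `.values` is left out.

```python
import os
import tempfile


class KhiopsInterpreterSketch:
    """Sketch of the initialization steps only; not the actual Khiops class."""

    def __init__(self, khc, *, api, n_variable_importances, importance_ranking):
        # A real implementation would check isinstance(khc, KhiopsClassifier)
        # and call sklearn.utils.validation.check_is_fitted(khc). Here we only
        # look for the two fitted attributes named in the issue.
        for attribute in ("model_", "model_main_dictionary_name_"):
            if not hasattr(khc, attribute):
                raise ValueError(
                    f"khc is not fitted: missing attribute '{attribute}'"
                )

        self.khc = khc
        self.n_variable_importances = n_variable_importances
        self.importance_ranking = importance_ranking

        # Target path of the interpretation file (hypothetical file name).
        output_dir = tempfile.mkdtemp(prefix="khiops_interpreter_")
        interpretation_file_path = os.path.join(output_dir, "Interpretation.kdic")

        # Build the interpretation model. The first three positional arguments
        # follow the issue text; the keyword names are assumptions, not a
        # confirmed api.interpret_predictor signature.
        api.interpret_predictor(
            khc.model_,
            khc.model_main_dictionary_name_,
            interpretation_file_path,
            n_variable_importances=n_variable_importances,
            importance_ranking=importance_ranking,
        )

        # Load the interpretation model and keep it as a private attribute.
        self._interpretation_domain = api.read_khiops_dictionary_file(
            interpretation_file_path
        )
```

Injecting `api` keeps the sketch testable with a fake object; the actual class would presumably import the Khiops core API directly.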
