IForest feature contribution for a given outlier #508

lujiazho · 2023-06-19T22:10:23Z

Hi, is it possible to print out which features contribute most to a data point being flagged as an outlier? That way we could check, for each data point, which features make it deviate most from the distribution.

KulikDM · 2023-06-22T16:31:35Z

Hi @lujiazho, you can actually use the shap library for this:

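# Imports needed for this snippet
import numpy as np
import shap
from pyod.models.knn import KNN

# Fit PyOD model (data is assumed to be a 2-D NumPy array;
# the same pattern should work for other detectors such as IForest)
clf = KNN()
clf.fit(data)

# Shap is slow, so perhaps only explain the highest-scoring points, e.g. the top 100
scores = clf.decision_scores_
idx = np.argsort(scores)

# Fit shap explainer and get values for top 100
explainer = shap.Explainer(clf.decision_function, data)
shap_values = explainer(data[idx][-100:])

# Example of some plots over the top 100
shap.plots.beeswarm(shap_values)
shap.plots.bar(shap_values)

# Or for just one point of interest
point = 55
single_values = explainer(data[point].reshape(1, -1))

# The waterfall plot takes a single explanation, so index the first row
shap.plots.waterfall(single_values[0])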

Hope this is what you were looking for and all the best!

lujiazho · 2023-06-23T05:33:27Z

Thanks a lot! That's exactly what I want!

jesuinovieira · 2023-08-22T13:33:52Z

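You can also use clf.feature_importances_ for IForest (it is a property, so no parentheses). Note that it gives importances for the model as a whole rather than for a single data point.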
