IForest feature contribution for a given outlier #508

Open
lujiazho opened this issue Jun 19, 2023 · 3 comments

@lujiazho

Hi, is it possible to print out which features directly (or most likely) contribute to a data point being flagged as an outlier? That way we could check, for each data point, which features push it furthest from the distribution.

@KulikDM
Contributor

KulikDM commented Jun 22, 2023

Hi @lujiazho, you can actually use the shap library for this:

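import numpy as np
import shap
from pyod.models.knn import KNN  # any PyOD detector works, e.g. pyod.models.iforest.IForest

# Fit PyOD model (data: 2-D NumPy array of shape (n_samples, n_features))
clf = KNN()
clf.fit(data)

# SHAP is slow, so perhaps only explain the highest-scoring points, e.g. the top 100
scores = clf.decision_scores_
idx = np.argsort(scores)  # ascending, so the most anomalous points come last

# Fit the SHAP explainer and get values for the top 100
explainer = shap.Explainer(clf.decision_function, data)
shap_values = explainer(data[idx[-100:]])

# Or for just one point of interest
i = 55
shap_single = explainer(data[i].reshape(1, -1))

# Example plots
shap.plots.waterfall(shap_single[0])  # waterfall expects a single explanation
shap.plots.beeswarm(shap_values)
shap.plots.bar(shap_values)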

Hope this is what you were looking for and all the best!
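The SHAP explainers estimate Shapley values of the detector's score function. For a handful of features these can be computed exactly by brute force, which makes the idea behind the plots easier to see. The sketch below is an illustration under stated assumptions, not how shap or PyOD implement it: `knn_score` and `exact_shapley` are hypothetical helpers, the k-th-nearest-neighbour distance only roughly mimics a KNN detector's score, and "absent" features are replaced by a single baseline point (the data mean) instead of being averaged over a background sample as shap does.

```python
from itertools import combinations
from math import factorial

import numpy as np


def knn_score(X_train, x, k=5):
    """Hypothetical outlier score: distance from x to its k-th nearest neighbour in X_train."""
    dists = np.sort(np.linalg.norm(X_train - x, axis=1))
    return dists[k - 1]


def exact_shapley(f, x, baseline):
    """Exact Shapley values of f at x, with 'absent' features set to baseline.

    Needs O(2^n_features) calls to f, so only use it for a few features.
    """
    n = x.shape[0]
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                z = baseline.copy()
                z[list(subset)] = x[list(subset)]  # features in S take x's values
                without_i = f(z)
                z[i] = x[i]  # now add feature i to the coalition
                with_i = f(z)
                phi[i] += weight * (with_i - without_i)
    return phi


# Synthetic data: an outlier that is extreme only along feature 1
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
outlier = np.array([0.0, 6.0, 0.0])
baseline = X.mean(axis=0)


def score(z):
    return knn_score(X, z)


phi = exact_shapley(score, outlier, baseline)
print("per-feature contributions:", phi)
```

By the efficiency property, the contributions sum to `score(outlier) - score(baseline)`, and here nearly all of that difference should be attributed to feature 1.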

@lujiazho
Author

Thanks a lot! That's exactly what I want!

@jesuinovieira

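You can also use clf.feature_importances_ for IForest (it is an attribute, not a method, so no parentheses). Note that it gives one global importance per feature for the whole fitted model, not a per-point explanation.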
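For a quick per-point view of which features deviate most, without fitting an explainer, one simple heuristic is a per-feature robust z-score (median and MAD) of the point relative to the data. This is a minimal sketch, not a PyOD API: `feature_deviation` is a hypothetical helper and the data are synthetic. Because it looks at each feature on its own, it will miss points that are only anomalous in a combination of features, which is where SHAP on the detector's score is more informative.

```python
import numpy as np


def feature_deviation(X, x):
    """Hypothetical helper: per-feature robust z-scores of point x relative to X.

    Uses the median and the median absolute deviation (MAD), scaled by 1.4826
    so it is comparable to a standard deviation for normally distributed data.
    """
    median = np.median(X, axis=0)
    mad = np.median(np.abs(X - median), axis=0) * 1.4826
    mad = np.where(mad == 0, np.finfo(float).eps, mad)  # avoid division by zero
    return np.abs(x - median) / mad


rng = np.random.default_rng(42)
X = rng.normal(size=(500, 4))
point = np.array([0.1, -0.2, 5.0, 0.3])  # extreme only along feature 2

dev = feature_deviation(X, point)
ranking = np.argsort(dev)[::-1]  # feature indices, most deviant first
print("deviation per feature:", dev)
print("features ranked by deviation:", ranking)
```

The median and MAD are used instead of the mean and standard deviation so that the outliers themselves do not inflate the scale estimate.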
