lab.find_issues(features=features) outputs error for underperforming issue #1065

sanjanag · 2024-03-26T19:16:45Z

Output of lab.find_issues(features=features):

/Users/sanjana/cleanlab_home/fork_cleanlab/cleanlab/datalab/internal/issue_finder.py:457: UserWarning: No labels were provided. The 'label' issue type will not be run.
  warnings.warn("No labels were provided. " "The 'label' issue type will not be run.")
Finding null issues ...
Finding outlier issues ...
Fitting OOD estimator based on provided features ...
Finding near_duplicate issues ...
Finding non_iid issues ...
Finding underperforming_group issues ...
Error in underperforming_group: UnderperformingGroupIssueManager.find_issues() missing 1 required positional argument: 'pred_probs'
Failed to check for these issue types: [UnderperformingGroupIssueManager]

Audit complete. 984 issues found in the dataset.

Dataset: https://www.kaggle.com/datasets/laotse/credit-risk-dataset/data

Code

import pandas as pd
from cleanlab import Datalab
from sklearn.preprocessing import StandardScaler
import numpy as np

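df = pd.read_csv("./credit_risk_dataset.csv")
# Drop every row that contains at least one missing value
df = df[~df.isnull().any(axis=1)].copy()
# All columns except the target are used as features
feature_columns = df.columns.to_list()
feature_columns.remove("loan_status")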

X_raw = df[feature_columns]
labels = df["loan_status"]

cat_features = [
    "person_home_ownership",
    "loan_intent",
    "loan_grade",
    "cb_person_default_on_file",
]
numeric_features = [
    "person_age",
    "person_income",
    "person_emp_length",
    "loan_amnt",
    "loan_int_rate",
    "loan_percent_income",
    "cb_person_cred_hist_length",
]

X_encoded = pd.get_dummies(X_raw, columns=cat_features, drop_first=True, dtype='float')

scaler = StandardScaler()
X_processed = X_encoded.copy()
X_processed[numeric_features] = scaler.fit_transform(X_encoded[numeric_features])

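# No label_name is passed, which matches the "No labels were provided" warning above
lab = Datalab({"X": X_processed.to_numpy(), "y": labels})

# Only features are supplied here; pred_probs is not passed
lab.find_issues(features=X_processed.to_numpy())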

jwmueller · 2024-04-11T06:45:43Z

@elisno It seems the mapping that decides which issue types to run based on the supplied args is off. The underperforming_group check should only run if pred_probs is included in the supplied args.
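To make the intended behavior concrete, here is a minimal sketch of such a mapping. It is not cleanlab's actual implementation: the `REQUIRED_ARGS` table and the `select_issue_types` helper are hypothetical names, and the requirements listed are simplified to the issue types that appear in the log above.

```python
from typing import Any, Dict, List, Set

# Hypothetical, simplified table (NOT cleanlab's real one): the arguments
# each issue type needs before its check can run.
REQUIRED_ARGS: Dict[str, Set[str]] = {
    "null": {"features"},
    "outlier": {"features"},
    "near_duplicate": {"features"},
    "non_iid": {"features"},
    "underperforming_group": {"features", "pred_probs"},
}


def select_issue_types(**supplied: Any) -> List[str]:
    """Return the issue types whose required args were all supplied (not None)."""
    provided = {name for name, value in supplied.items() if value is not None}
    return [
        issue_type
        for issue_type, needed in REQUIRED_ARGS.items()
        if needed <= provided  # subset test: every required arg is present
    ]


# With features only, underperforming_group is skipped instead of failing.
features_only = select_issue_types(features=[[0.1, 0.2]])

# With pred_probs as well, it becomes eligible.
with_pred_probs = select_issue_types(features=[[0.1, 0.2]], pred_probs=[[0.7, 0.3]])
```

With a check like this, a call that supplies only features would silently skip the underperforming_group check rather than attempt it and raise the missing `pred_probs` error.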

sanjanag added the needs triage label Mar 26, 2024

jwmueller added the bug label and removed the needs triage label Apr 11, 2024

jwmueller assigned elisno Apr 11, 2024

jwmueller unassigned elisno Apr 11, 2024

jwmueller added the help-wanted label Apr 11, 2024

gogetron mentioned this issue Apr 14, 2024

Refine handling of underperforming_group issue type #1099

Merged

elisno closed this as completed in #1099 May 24, 2024
