Perf object detection #1098

Open · wants to merge 9 commits into master
Conversation

gogetron (Contributor)
Summary

This PR partially addresses #862

🎯 Purpose: Improve performance of find_label_issues in object_detection/filter.py.

A significant improvement comes from vectorizing _get_iou, _euc_dis, and _calculate_average_precision. I searched for all references to the deleted _mod_coordinates and _get_overlap; they are not used anywhere else in the code. I assumed it is fine to remove them since they are not part of the public API, but let me know if you would prefer to deprecate them or keep both versions.
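The kind of vectorization described above can be sketched with NumPy broadcasting. This is a hypothetical illustration only: the function name `iou_matrix` and the `[x1, y1, x2, y2]` box convention are my own assumptions, not the PR's exact code.

```python
import numpy as np

def iou_matrix(boxes_a, boxes_b):
    """Pairwise IoU between two sets of [x1, y1, x2, y2] boxes.

    boxes_a has shape (M, 4), boxes_b has shape (N, 4); the result is
    an (M, N) matrix computed in one shot via broadcasting instead of
    a Python loop over box pairs. Hypothetical sketch, not PR code.
    """
    a = boxes_a[:, None, :]  # (M, 1, 4)
    b = boxes_b[None, :, :]  # (1, N, 4)
    # Intersection rectangle coordinates
    ix1 = np.maximum(a[..., 0], b[..., 0])
    iy1 = np.maximum(a[..., 1], b[..., 1])
    ix2 = np.minimum(a[..., 2], b[..., 2])
    iy2 = np.minimum(a[..., 3], b[..., 3])
    # Clamp negative widths/heights (non-overlapping boxes) to zero
    iw = np.clip(ix2 - ix1, 0, None)
    ih = np.clip(iy2 - iy1, 0, None)
    inter = iw * ih
    area_a = (a[..., 2] - a[..., 0]) * (a[..., 3] - a[..., 1])
    area_b = (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    union = area_a + area_b - inter
    # Guard against division by zero for degenerate boxes
    return inter / np.maximum(union, 1e-12)
```

The key design point is replacing an O(M·N) Python-level double loop with a single (M, N)-shaped array computation, which is where this kind of speedup typically comes from.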

For memory profiling I used the memory-profiler library. The code I used for benchmarking is copied below. In addition, I sorted the imports in the modified files.

Code Setup

import random

import numpy as np

from cleanlab.object_detection.filter import find_label_issues

np.random.seed(0)
random.seed(0)

N = 5_000
SIZE = 256
MAX_L = 10
MAX_M = 100
K = 20

# Create input data: per-image ground-truth labels and per-class predictions
labels = []
predictions = []
for _ in range(N):
    # Ground truth: x boxes in [x1, y1, x2, y2] format plus class ids
    x = random.randint(1, MAX_L)
    bboxes = np.empty((x, 4))
    bboxes[:, :2] = np.random.randint(0, SIZE // 2, size=(x, 2))
    bboxes[:, 2:] = np.random.randint(SIZE // 2, SIZE, size=(x, 2))
    img_labels = np.random.randint(K, size=x)
    label = {"bboxes": bboxes, "labels": img_labels}
    labels.append(label)

    # Predictions: for each of the K classes, y boxes with a confidence score
    y = random.randint(x, MAX_M)
    prediction = np.empty((K, y, 5))
    prediction[:, :, :2] = np.random.randint(0, SIZE // 2, size=(K, y, 2))
    prediction[:, :, 2:4] = np.random.randint(SIZE // 2, SIZE, size=(K, y, 2))
    prediction[:, :, 4] = np.random.random((K, y))
    predictions.append(prediction)

Current version

%%timeit
%memit find_label_issues(labels, predictions)
# peak memory: 1317.51 MiB, increment: 820.18 MiB
# peak memory: 1317.58 MiB, increment: 0.07 MiB
# peak memory: 1317.60 MiB, increment: 0.02 MiB
# peak memory: 1317.86 MiB, increment: 0.28 MiB
# peak memory: 1319.04 MiB, increment: 1.18 MiB
# peak memory: 1319.09 MiB, increment: 0.07 MiB
# peak memory: 1319.09 MiB, increment: 0.02 MiB
# peak memory: 1319.11 MiB, increment: 0.02 MiB
# 8min 7s ± 30.2 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

This PR

%%timeit
%memit find_label_issues(labels, predictions)
# peak memory: 1291.41 MiB, increment: 792.09 MiB
# peak memory: 1291.82 MiB, increment: 1.19 MiB
# peak memory: 1291.87 MiB, increment: 0.05 MiB
# peak memory: 1292.05 MiB, increment: 0.18 MiB
# peak memory: 1292.07 MiB, increment: 0.00 MiB
# peak memory: 1292.07 MiB, increment: 0.00 MiB
# peak memory: 1292.07 MiB, increment: 0.00 MiB
# peak memory: 1292.07 MiB, increment: 0.00 MiB
# 29.2 s ± 574 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Testing

🔍 Testing Done: Existing test suite and output verification (explained below).

References

Reviewer Notes


I was not sure which input would be the most realistic. I tried different options and the improvements were similar, but let me know if you want me to test a specific combination of inputs. I used the largest input dataset I could while still being able to run the master %%timeit in about an hour.

I noticed that the num_procs argument in _calculate_ap_per_class is always set to the default value. Thus, I made a few changes there to ensure that the pool is only created when num_procs is greater than 1.
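The change described above can be sketched as a small helper. The name `map_maybe_parallel` is hypothetical; the actual PR modifies _calculate_ap_per_class directly, but the idea is the same: skip the multiprocessing pool (and its startup and pickling overhead) when only one process is requested.

```python
from multiprocessing import Pool

def map_maybe_parallel(func, items, num_procs=1):
    """Apply func to items, in parallel only when num_procs > 1.

    Hypothetical sketch of the pattern: creating a Pool has real
    overhead, so the serial path avoids it entirely.
    """
    if num_procs > 1:
        with Pool(num_procs) as pool:
            return pool.map(func, items)
    # Serial fallback: no pool, no pickling of func or items
    return [func(item) for item in items]
```

With the default num_procs of 1, this runs as a plain list comprehension, which is exactly the case the PR optimizes for.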

After each change I ran the tests, and they caught any minor mistakes I made. Once they were passing, I also verified with random inputs that both versions (master and this PR) still produce exactly the same output.
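As a minimal illustration of this kind of loop-vs-vectorized equivalence check (the functions below are hypothetical stand-ins, not the actual cleanlab internals), one can compare a reference loop implementation against its broadcast counterpart on random data:

```python
import numpy as np

def euc_dist_loop(centers_a, centers_b):
    # Reference per-pair implementation (stand-in for a pre-PR-style loop)
    out = np.empty((len(centers_a), len(centers_b)))
    for i, a in enumerate(centers_a):
        for j, b in enumerate(centers_b):
            out[i, j] = np.sqrt(((a - b) ** 2).sum())
    return out

def euc_dist_vectorized(centers_a, centers_b):
    # Broadcast to an (M, N, 2) difference tensor, then reduce over axis -1
    diff = centers_a[:, None, :] - centers_b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Check both versions agree on random inputs
rng = np.random.default_rng(0)
a = rng.random((5, 2))
b = rng.random((7, 2))
np.testing.assert_allclose(euc_dist_loop(a, b), euc_dist_vectorized(a, b))
```

Running the slow and fast versions on the same seeded random inputs and asserting near-equality is a cheap guard against subtle broadcasting mistakes.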

elisno (Member) left a comment


Nice work @gogetron!
I have a few suggestions for rank.py.

(8 review threads on cleanlab/object_detection/rank.py, all resolved)
gogetron (Contributor, Author)

Thank you for your review; your suggestions are very clear and useful. I have just committed them.

@gogetron gogetron requested a review from elisno April 27, 2024 08:02
@jwmueller jwmueller requested a review from aditya1503 May 14, 2024 19:37