Optimize internal clip_noise_rates and remove_noise_from_class functions #1105

Open
wants to merge 4 commits into
base: master

gogetron
Contributor

@gogetron gogetron commented Apr 20, 2024

Summary

This PR partially addresses #862

🎯 Purpose: Improve performance of internal clip_noise_rates and remove_noise_from_class functions.

Most of the improvement comes from using NumPy operations wherever possible.
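To illustrate the kind of change involved, here is a minimal sketch, not the actual diff. It assumes that `clip_noise_rates` clips the off-diagonal entries into `[0, 0.9999]`, leaves the diagonal untouched, and renormalizes the columns; that behaviour, the function names, and the `TINY` constant are assumptions made for this example.

```python
import numpy as np

TINY = 1e-12  # hypothetical floor that avoids division by zero


def clip_noise_rates_elementwise(noise_matrix):
    """Hypothetical baseline: one Python-level call per matrix entry."""
    clip_one = np.vectorize(lambda rate: min(max(rate, 0.0), 0.9999))
    diagonal = np.diagonal(noise_matrix).copy()  # diagonal entries are not noise rates
    clipped = clip_one(noise_matrix)
    np.fill_diagonal(clipped, diagonal)  # restore the unclipped diagonal
    return clipped / np.maximum(clipped.sum(axis=0), TINY)


def clip_noise_rates_vectorized(noise_matrix):
    """Same result using a single np.clip call and in-place operations."""
    diagonal = np.diagonal(noise_matrix).copy()
    clipped = np.clip(noise_matrix, 0.0, 0.9999)  # returns a new array
    np.fill_diagonal(clipped, diagonal)
    clipped /= np.maximum(clipped.sum(axis=0), TINY)  # renormalize columns in place
    return clipped


rng = np.random.default_rng(0)
x = rng.random((50, 50))
x[0, 1] = 1.5   # out-of-range values to exercise the clipping
x[2, 0] = -0.3
a = clip_noise_rates_elementwise(x)
b = clip_noise_rates_vectorized(x)
print(np.allclose(a, b))
```

The vectorized variant avoids the per-element Python call and updates the clipped copy in place instead of allocating further temporaries.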

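I measured memory with the memory-profiler library (its `%memit` IPython magic, available after `%load_ext memory_profiler`) and timing with `%%timeit`. The code I used for benchmarking is copied below. In addition, I sorted the imports in the modified files.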

Code Setup

import numpy as np

from cleanlab.internal.util import clip_noise_rates, remove_noise_from_class

np.random.seed(0)

K = 5_000
x = np.random.random((K, K))

Current version

%%timeit
%memit clip_noise_rates(x)
# peak memory: 1779.00 MiB, increment: 1308.72 MiB
# peak memory: 1779.38 MiB, increment: 1335.93 MiB
# peak memory: 1755.99 MiB, increment: 1311.49 MiB
# peak memory: 1779.37 MiB, increment: 1333.59 MiB
# peak memory: 1779.60 MiB, increment: 1333.10 MiB
# peak memory: 1779.62 MiB, increment: 1331.66 MiB
# peak memory: 1776.17 MiB, increment: 1327.21 MiB
# peak memory: 1779.68 MiB, increment: 1330.00 MiB
# 4.07 s ± 32.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%%timeit
%memit remove_noise_from_class(x, 1)
# peak memory: 635.38 MiB, increment: 171.43 MiB
# peak memory: 635.41 MiB, increment: 190.74 MiB
# peak memory: 635.41 MiB, increment: 190.74 MiB
# peak memory: 635.61 MiB, increment: 190.93 MiB
# peak memory: 635.61 MiB, increment: 190.74 MiB
# peak memory: 635.61 MiB, increment: 190.74 MiB
# peak memory: 635.61 MiB, increment: 190.74 MiB
# peak memory: 635.61 MiB, increment: 190.74 MiB
# 354 ms ± 3.67 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

This PR

%%timeit
%memit clip_noise_rates(x)
# peak memory: 817.28 MiB, increment: 332.71 MiB
# peak memory: 454.48 MiB, increment: 0.00 MiB
# peak memory: 454.48 MiB, increment: 0.00 MiB
# peak memory: 454.68 MiB, increment: 0.00 MiB
# peak memory: 793.55 MiB, increment: 338.88 MiB
# peak memory: 454.68 MiB, increment: 0.00 MiB
# peak memory: 454.68 MiB, increment: 0.00 MiB
# peak memory: 454.68 MiB, increment: 0.00 MiB
# 255 ms ± 11.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%%timeit
%memit remove_noise_from_class(x, 1)
# peak memory: 459.73 MiB, increment: 0.00 MiB
# peak memory: 443.90 MiB, increment: 0.06 MiB
# peak memory: 443.90 MiB, increment: 0.00 MiB
# peak memory: 444.15 MiB, increment: 0.00 MiB
# peak memory: 444.15 MiB, increment: 0.00 MiB
# peak memory: 444.15 MiB, increment: 0.00 MiB
# peak memory: 444.15 MiB, increment: 0.00 MiB
# peak memory: 444.15 MiB, increment: 0.00 MiB
# 204 ms ± 1.33 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
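
The measurements above rely on IPython magics. For anyone who wants a rough check from a plain script, the sketch below uses the standard-library `timeit` and `tracemalloc` modules instead. This is a different technique from `%memit`: `tracemalloc` reports traced allocations rather than process memory, so the numbers are not directly comparable. `measure` and `column_normalize` are hypothetical helpers, and the latter is only a stand-in for the function under test.

```python
import timeit
import tracemalloc

import numpy as np


def measure(func, *args, repeat=3):
    """Return (best wall-clock seconds, peak traced bytes) for func(*args)."""
    best = min(timeit.repeat(lambda: func(*args), number=1, repeat=repeat))
    tracemalloc.start()
    func(*args)
    _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
    tracemalloc.stop()
    return best, peak


def column_normalize(matrix):
    # Hypothetical stand-in; swap in the function you want to profile.
    return matrix / matrix.sum(axis=0)


rng = np.random.default_rng(0)
x = rng.random((500, 500))
seconds, peak_bytes = measure(column_normalize, x)
print(f"{seconds * 1e3:.2f} ms, peak {peak_bytes / 2**20:.2f} MiB")
```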

Testing

🔍 Testing Done: Ran the existing test suite and verified that both functions produce the same outputs before and after the refactoring.
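
An equivalence check of this kind can be sketched as follows. It is only an illustration: it assumes that `remove_noise_from_class` zeroes the off-diagonal entries of one row and then resets each diagonal entry so that every column sums to one. Both functions below are hypothetical stand-ins written for this example, not the library code.

```python
import numpy as np


def remove_noise_loop(noise_matrix, class_without_noise):
    """Hypothetical loop-based reference implementation."""
    K = len(noise_matrix)
    cwn = class_without_noise
    x = noise_matrix.copy()
    for j in range(K):
        if j != cwn:
            x[cwn, j] = 0.0  # no noisy labels flow into class cwn
    for i in range(K):
        # diagonal = 1 - (sum of the off-diagonal entries in column i)
        x[i, i] = 1.0 - (x[:, i].sum() - x[i, i])
    return x


def remove_noise_vectorized(noise_matrix, class_without_noise):
    """Hypothetical vectorized version of the same computation."""
    x = noise_matrix.copy()
    x[class_without_noise, :] = 0.0           # diagonal is overwritten below anyway
    np.fill_diagonal(x, 0.0)                  # column sums now cover off-diagonals only
    np.fill_diagonal(x, 1.0 - x.sum(axis=0))  # restore columns that sum to one
    return x


rng = np.random.default_rng(0)
m = rng.random((6, 6))
m /= m.sum(axis=0)  # start from a column-stochastic matrix
old = remove_noise_loop(m, 1)
new = remove_noise_vectorized(m, 1)
np.testing.assert_allclose(new, old)  # raises if the outputs differ
```

`np.testing.assert_allclose` is used rather than exact equality because the two versions sum the columns in a different order, which can change the last few floating-point digits.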
