⚡️ Speed up _github_search_discussions() by 22% in embedchain/loaders/github.py #1263
Description
📄 _github_search_discussions() in embedchain/loaders/github.py
📈 Performance went up by 22% (0.22x faster)
⏱️ Runtime went down from 3721.43μs to 3060.92μs
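The two runtimes above can be related to the headline figure with a little arithmetic. The sketch below assumes (this is not stated in the report) that the 22% is computed as old runtime divided by new runtime, minus one; the runtime reduction relative to the old value is shown alongside for comparison.

```python
# Runtimes quoted in the report, in microseconds.
old_us = 3721.43
new_us = 3060.92

# Assumption: "performance went up by 22%" means old/new - 1.
speedup = old_us / new_us - 1

# For comparison: the fraction of the original runtime that was saved.
reduction = 1 - new_us / old_us

print(f"speedup:   {speedup:.1%}")    # speedup:   21.6%
print(f"reduction: {reduction:.1%}")  # reduction: 17.7%
```

Under that reading, 21.6% rounds to the reported 22%, while the wall-clock time itself dropped by roughly 17.7%.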
Explanation and details
To improve performance, the replacement operations in the clean_string() function can be combined into a single re.sub() call by building a regex character class that matches every character to be replaced. In the GithubLoader class, we can also skip the work for discussions that won't be used because their body is empty.

In the optimized code, clean_string() uses a single regex to replace backslashes, hash symbols and newlines and to collapse consecutive non-alphanumeric characters in one step. The comments_created_at key is removed from the metadata dictionary in the _github_search_discussions() method because it was never populated, which also saves a little memory. The string concatenation now happens only when a body exists, which avoids unnecessary calls to clean_string().
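The two ideas described here can be sketched as follows. This is an illustrative sketch, not the code from the pull request: the names clean_string_single_pass and build_discussion_record are hypothetical, the record layout is invented for the example, and mapping backslashes, hash symbols and newlines all to a space is an assumption about the intended behaviour.

```python
import re
from typing import Optional

# Hypothetical pattern: the first alternative is a character class for
# backslash, '#', and newline; the second matches a run of one repeated
# non-alphanumeric, non-space character.
_CLEAN_RE = re.compile(r"[\\#\n]+|([^\w\s])\1+")


def _replace(match: re.Match) -> str:
    # A repeated punctuation run collapses to a single character;
    # everything matched by the character class becomes one space.
    return match.group(1) if match.group(1) else " "


def clean_string_single_pass(text: str) -> str:
    """Clean text with one re.sub() call instead of several chained passes."""
    return _CLEAN_RE.sub(_replace, text)


def build_discussion_record(title: str, body: str, url: str) -> Optional[dict]:
    """Build a record only when the body is non-empty (hypothetical helper)."""
    if not body:
        # No concatenation and no cleaning for discussions that are skipped.
        return None
    content = clean_string_single_pass(f"{title}\n{body}")
    return {"content": content, "meta_data": {"url": url}}


print(clean_string_single_pass("a\\b#c\nd!!!e"))  # a b c d!e
```

Compiling the pattern once at module level avoids re-parsing it on every call, and the early return means an empty body costs only a truthiness check.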
Type of change
Please delete options that are not relevant.
How Has This Been Tested?
✅ 2 Passed − 🌀 Generated Regression Tests
Checklist:
Maintainer Checklist