-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Retrieval Metrics: Updating HitRate and MRR for Evaluation@K documents retrieved. Also adding RR as a separate metric #12997
Merged
Merged
Changes from all commits
Commits
Show all changes
6 commits
Select commit
Hold shift + click to select a range
62af5fd
Updating metrics: MRR renamed to RR, HitRate updated for multi-doc ev…
AgenP f302d83
Merge branch 'main' of https://github.com/AgenP/llama_index_fork
AgenP 8ae408a
Updated MRR and HitRate with requested changes
AgenP f50c07a
Merge branch 'run-llama:main' into main
AgenP a54e2dd
Merge branch 'main' of https://github.com/AgenP/llama_index_fork into…
AgenP eff2806
Iteration w/ class attribute implementation for the calculation optio…
AgenP File filter
Filter by extension
Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,77 @@ | ||
import pytest | ||
from llama_index.core.evaluation.retrieval.metrics import HitRate, MRR | ||
|
||
|
||
# Test cases for the updated HitRate class using instance attribute | ||
@pytest.mark.parametrize( | ||
("expected_ids", "retrieved_ids", "use_granular", "expected_result"), | ||
[ | ||
(["id1", "id2", "id3"], ["id3", "id1", "id2", "id4"], False, 1.0), | ||
(["id1", "id2", "id3", "id4"], ["id1", "id5", "id2"], True, 2 / 4), | ||
(["id1", "id2"], ["id3", "id4"], False, 0.0), | ||
(["id1", "id2"], ["id2", "id1", "id7"], True, 2 / 2), | ||
], | ||
) | ||
def test_hit_rate(expected_ids, retrieved_ids, use_granular, expected_result): | ||
hr = HitRate() | ||
hr.use_granular_hit_rate = use_granular | ||
result = hr.compute(expected_ids=expected_ids, retrieved_ids=retrieved_ids) | ||
assert result.score == pytest.approx(expected_result) | ||
|
||
|
||
# Test cases for the updated MRR class using instance attribute | ||
@pytest.mark.parametrize( | ||
("expected_ids", "retrieved_ids", "use_granular", "expected_result"), | ||
[ | ||
(["id1", "id2", "id3"], ["id3", "id1", "id2", "id4"], False, 1 / 1), | ||
(["id1", "id2", "id3", "id4"], ["id5", "id1"], False, 1 / 2), | ||
(["id1", "id2"], ["id3", "id4"], False, 0.0), | ||
(["id1", "id2"], ["id2", "id1", "id7"], False, 1 / 1), | ||
( | ||
["id1", "id2", "id3"], | ||
["id3", "id1", "id2", "id4"], | ||
True, | ||
(1 / 1 + 1 / 2 + 1 / 3) / 3, | ||
), | ||
( | ||
["id1", "id2", "id3", "id4"], | ||
["id1", "id2", "id5"], | ||
True, | ||
(1 / 1 + 1 / 2) / 2, | ||
), | ||
(["id1", "id2"], ["id1", "id7", "id15", "id2"], True, (1 / 1 + 1 / 4) / 2), | ||
], | ||
) | ||
def test_mrr(expected_ids, retrieved_ids, use_granular, expected_result): | ||
mrr = MRR() | ||
mrr.use_granular_mrr = use_granular | ||
result = mrr.compute(expected_ids=expected_ids, retrieved_ids=retrieved_ids) | ||
assert result.score == pytest.approx(expected_result) | ||
|
||
|
||
# Test cases for exceptions handling for both HitRate and MRR | ||
@pytest.mark.parametrize( | ||
("expected_ids", "retrieved_ids", "use_granular"), | ||
[ | ||
( | ||
None, | ||
["id3", "id1", "id2", "id4"], | ||
False, | ||
), # None expected_ids should trigger ValueError | ||
( | ||
["id1", "id2", "id3"], | ||
None, | ||
True, | ||
), # None retrieved_ids should trigger ValueError | ||
([], [], False), # Empty IDs should trigger ValueError | ||
], | ||
) | ||
def test_exceptions(expected_ids, retrieved_ids, use_granular): | ||
with pytest.raises(ValueError): | ||
hr = HitRate() | ||
hr.use_granular_hit_rate = use_granular | ||
hr.compute(expected_ids=expected_ids, retrieved_ids=retrieved_ids) | ||
|
||
mrr = MRR() | ||
mrr.use_granular_mrr = use_granular | ||
mrr.compute(expected_ids=expected_ids, retrieved_ids=retrieved_ids) |
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not entirely sure if these can be removed. Did you check if removing these args still satisfies the
compute
method signature forBaseRetrievalMetric
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep you are correct. I did some research and it would violate the Liskov substitution principle which is not the one, particularly with an ABC and its abstract methods.
When making the changes, I will ensure the method signatures are kept the same
Thanks for highlighting this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!