[RFC] Stepwise Agent Evaluator #223

hinthornw · 2024-03-20T23:45:28Z

Teacher forcing example that computes the (macro) average step-wise success rate of an agent.

Code explicitly looks at message graph but i could look into cleaning up / generalizing.

a bit too much code tbh imo but i like the idea. Maybe would be better if the trajectory were more high level rather than full mocks. Open to lots of suggestions (e.g., simplifying the stepwise score )

https://smith.langchain.com/public/3b46ebd4-8cf9-41b8-8fbc-5df92528a148/d

Stepwise Agent Evaluator

e805a22

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[RFC] Stepwise Agent Evaluator #223

[RFC] Stepwise Agent Evaluator #223

hinthornw commented Mar 20, 2024 •

edited

[RFC] Stepwise Agent Evaluator #223

Are you sure you want to change the base?

[RFC] Stepwise Agent Evaluator #223

Conversation

hinthornw commented Mar 20, 2024 • edited

hinthornw commented Mar 20, 2024 •

edited