We want to add proper backtesting support, deprecating the beta implementation of `compute_test_metrics` and related helpers.

Something like:
```python
# Imports assumed from the current langsmith SDK layout.
import uuid
from typing import Optional, Sequence

from langsmith import Client
from langsmith import evaluation as ls_eval
from langsmith import schemas as ls_schemas
from langsmith.beta import convert_runs_to_test


def backtest_evaluate(
    target: ls_eval.TARGET_T,
    /,
    prod_runs: Sequence[ls_schemas.Run],
    *,
    evaluators: Optional[Sequence[ls_eval.EVALUATOR_T]] = None,
    summary_evaluators: Optional[Sequence[ls_eval.SUMMARY_EVALUATOR_T]] = None,
    metadata: Optional[dict] = None,
    experiment_prefix: Optional[str] = None,
    max_concurrency: Optional[int] = None,
    client: Optional[Client] = None,
    blocking: bool = True,
) -> ls_eval.ExperimentResults:
    """Backtest a target system or function against a sample of production traces.

    Args:
        target (ls_eval.TARGET_T): The target system or function to evaluate.
        prod_runs (Sequence[ls_schemas.Run]): A sequence of production runs to
            use for backtesting.
        evaluators (Optional[Sequence[ls_eval.EVALUATOR_T]]): A list of
            evaluators to run on each example. Defaults to None.
        summary_evaluators (Optional[Sequence[ls_eval.SUMMARY_EVALUATOR_T]]): A
            list of summary evaluators to run on the entire dataset.
            Defaults to None.
        metadata (Optional[dict]): Metadata to attach to the experiment.
            Defaults to None.
        experiment_prefix (Optional[str]): A prefix to provide for your
            experiment name. Defaults to None.
        max_concurrency (Optional[int]): The maximum number of concurrent
            evaluations to run. Defaults to None.
        client (Optional[Client]): The LangSmith client to use.
            Defaults to None.
        blocking (bool): Whether to block until the evaluation is complete.
            Defaults to True.

    Returns:
        ls_eval.ExperimentResults: The results of the backtesting evaluation.
    """
    if not prod_runs:
        raise ValueError(
            f"Expected a non-empty sequence of production runs. Received: {prod_runs}"
        )
    client = client or Client()
    test_dataset_name = f"backtest-{uuid.uuid4().hex[:6]}"
    # Materialize the sampled production runs as a dataset that the target can
    # be re-run against.
    convert_runs_to_test(
        prod_runs,
        dataset_name=test_dataset_name,
        client=client,
    )
    return ls_eval.evaluate(
        target,
        data=test_dataset_name,
        evaluators=evaluators,
        summary_evaluators=summary_evaluators,
        metadata=metadata,
        experiment_prefix=experiment_prefix,
        max_concurrency=max_concurrency,
        client=client,
        blocking=blocking,
    )
```
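To make the proposed flow concrete — replay sampled production inputs through the target, then score the fresh outputs with evaluators — here is a minimal, self-contained sketch. The `Run`, `backtest`, `shouts`, and `length_preserved` names are stand-ins invented for illustration, not the real LangSmith client or schemas:

```python
import uuid
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Run:
    """Stand-in for ls_schemas.Run: one recorded production trace."""
    inputs: dict
    outputs: Optional[dict] = None


def backtest(
    target: Callable[[dict], dict],
    prod_runs: Sequence[Run],
    evaluators: Sequence[Callable[[Run, dict], float]] = (),
) -> dict:
    """Replay each production input through `target` and score the new outputs."""
    if not prod_runs:
        raise ValueError(f"Expected a non-empty sequence of production runs. Received: {prod_runs}")
    # Mirrors the dataset-naming scheme in the proposal above.
    dataset_name = f"backtest-{uuid.uuid4().hex[:6]}"
    results = []
    for run in prod_runs:
        new_output = target(run.inputs)
        scores = {ev.__name__: ev(run, new_output) for ev in evaluators}
        results.append({"inputs": run.inputs, "outputs": new_output, "scores": scores})
    return {"dataset_name": dataset_name, "results": results}


# Example: backtest an uppercasing "model" against two recorded runs.
def shouts(inputs: dict) -> dict:
    return {"text": inputs["text"].upper()}


def length_preserved(run: Run, new_output: dict) -> float:
    return 1.0 if len(new_output["text"]) == len(run.inputs["text"]) else 0.0


runs = [Run(inputs={"text": "hello"}), Run(inputs={"text": "world"})]
report = backtest(shouts, runs, evaluators=[length_preserved])
```

The real helper delegates dataset creation to `convert_runs_to_test` and scoring to `ls_eval.evaluate`; this sketch only shows the shape of the loop those calls encapsulate.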
Motivation

Backtesting is important - we want to have strong APIs for this.