Create evaluate_backtest #574

Open
hinthornw opened this issue Apr 4, 2024 · 0 comments
Assignee: hinthornw
Labels: enhancement (New feature or request)

hinthornw (Collaborator) commented:
Feature request

We want to add proper backtesting support, deprecating the beta implementations of compute_test_metrics and related helpers.

Something like:

# Assumed imports, based on the langsmith SDK layout at the time of this issue:
import uuid
from typing import Optional, Sequence

from langsmith import Client
from langsmith import evaluation as ls_eval
from langsmith import schemas as ls_schemas
from langsmith.beta import convert_runs_to_test


def backtest_evaluate(
    target: ls_eval.TARGET_T,
    /,
    prod_runs: Sequence[ls_schemas.Run],
    *,
    evaluators: Optional[Sequence[ls_eval.EVALUATOR_T]] = None,
    summary_evaluators: Optional[Sequence[ls_eval.SUMMARY_EVALUATOR_T]] = None,
    metadata: Optional[dict] = None,
    experiment_prefix: Optional[str] = None,
    max_concurrency: Optional[int] = None,
    client: Optional[Client] = None,
    blocking: bool = True,
) -> ls_eval.ExperimentResults:
    """Backtest a target system or function against a sample of production traces.

    Args:
        target (ls_eval.TARGET_T): The target system or function to evaluate.
        prod_runs (Sequence[ls_schemas.Run]): A sequence of production runs to use for backtesting.
        evaluators (Optional[Sequence[ls_eval.EVALUATOR_T]]): A list of evaluators to run
            on each example. Defaults to None.
        summary_evaluators (Optional[Sequence[ls_eval.SUMMARY_EVALUATOR_T]]): A list of summary
            evaluators to run on the entire dataset. Defaults to None.
        metadata (Optional[dict]): Metadata to attach to the experiment.
            Defaults to None.
        experiment_prefix (Optional[str]): A prefix to provide for your experiment name.
            Defaults to None.
        max_concurrency (Optional[int]): The maximum number of concurrent
            evaluations to run. Defaults to None.
        client (Optional[Client]): The LangSmith client to use.
            Defaults to None.
        blocking (bool): Whether to block until the evaluation is complete.
            Defaults to True.

    Returns:
        ls_eval.ExperimentResults: The results of the backtesting evaluation.
    """
    if not prod_runs:
        raise ValueError(
            f"Expected a non-empty sequence of production runs. Received: {prod_runs}"
        )
    client = client or Client()
    test_dataset_name = f"backtest-{uuid.uuid4().hex[:6]}"
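    # Convert the sampled production runs into a test dataset of examples to backtest against.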
    test_project = convert_runs_to_test(
        prod_runs,
        dataset_name=test_dataset_name,
        client=client,
    )
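    # Run the target over the newly created dataset via the standard evaluate() flow.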
    return ls_eval.evaluate(
        target,
        data=test_dataset_name,
        evaluators=evaluators,
        summary_evaluators=summary_evaluators,
        metadata=metadata,
        experiment_prefix=experiment_prefix,
        max_concurrency=max_concurrency,
        client=client,
        blocking=blocking,
    )
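
For illustration, a rough usage sketch (assuming the proposed backtest_evaluate above; the project name, target function, and evaluator are hypothetical, and the list_runs filters may vary by SDK version):

from langsmith import Client

client = Client()

# Sample recent root runs from a production tracing project (name is hypothetical).
prod_runs = list(
    client.list_runs(
        project_name="my-prod-project",
        is_root=True,
        limit=50,
    )
)

def exact_match(run, example) -> dict:
    # Hypothetical evaluator: compare the re-run output with the recorded production output.
    return {"key": "exact_match", "score": float(run.outputs == example.outputs)}

results = backtest_evaluate(
    my_updated_chain,  # hypothetical target: the new system version to backtest
    prod_runs,
    evaluators=[exact_match],
    experiment_prefix="backtest",
)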

Motivation

Backtesting is important - we want to have strong APIs for it.

@hinthornw hinthornw added the enhancement New feature or request label Apr 10, 2024
@hinthornw hinthornw self-assigned this Apr 12, 2024