Track costs for streaming with OpenAI #214

Closed
brenkao opened this issue May 7, 2024 · 10 comments

brenkao (Collaborator) commented May 7, 2024

Is your feature request related to a problem? Please describe.
Prior versions of the openai SDK did not include usage stats when streaming.

Describe the solution you'd like
Add `stream_options={"include_usage": True}` to streaming calls, and add `total_cost` as a property of `OpenAICallResponseChunk`.

Additional context
OpenAI Cookbook Reference
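For reference, a minimal sketch of the OpenAI SDK behavior this request builds on: with `stream_options={"include_usage": True}`, every chunk carries a `usage` field that is `None` except on the final chunk, which arrives with empty `choices` and populated usage (the model name and prompt here are illustrative):

```python
from openai import OpenAI

client = OpenAI()
stream = client.chat.completions.create(
    model="gpt-3.5-turbo",  # illustrative model choice
    messages=[{"role": "user", "content": "What is 1 + 2?"}],
    stream=True,
    stream_options={"include_usage": True},
)
for chunk in stream:
    if chunk.choices:  # the final usage-only chunk has no choices
        print(chunk.choices[0].delta.content or "", end="")
    if chunk.usage:  # populated only on the final chunk
        print(f"\nprompt={chunk.usage.prompt_tokens} completion={chunk.usage.completion_tokens}")
```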

@brenkao brenkao added Feature Request New feature or request good first issue Good for newcomers labels May 7, 2024
willbakst (Contributor) commented:

We may also want to consider updating the generator to return the total cost separately from the response chunks: the generator would have type `Generator[BaseCallResponseChunkT, None, Optional[float]]` and would return the total cost at the end of the stream if available, otherwise `None`.
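For illustration, a minimal sketch (not Mirascope's actual implementation) of a generator whose return value carries the total cost; the chunk contents and cost figure are stand-ins:

```python
from typing import Generator, Optional


def stream_chunks() -> Generator[str, None, Optional[float]]:
    total_cost: Optional[float] = None
    for content, cost in [("Hello", None), (" world", 0.00042)]:  # stand-in stream
        if cost is not None:
            total_cost = cost  # usage/cost arrives with the final chunk
        yield content
    return total_cost  # surfaced to the caller via StopIteration.value


gen = stream_chunks()
try:
    while True:
        print(next(gen), end="")
except StopIteration as stop:
    print(f"\ntotal cost: {stop.value}")
```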

@willbakst willbakst changed the title Cost tracking for streaming OpenAI Feature Request: Track costs for streaming with OpenAI May 8, 2024
@willbakst willbakst changed the title Feature Request: Track costs for streaming with OpenAI [FEATURE REQUEST] Track costs for streaming with OpenAI May 8, 2024
@willbakst willbakst changed the title [FEATURE REQUEST] Track costs for streaming with OpenAI Track costs for streaming with OpenAI May 10, 2024
brenkao (Collaborator, Author) commented May 14, 2024

`stream_options={"include_usage": True}` has been implemented in #239.

brenkao (Collaborator, Author) commented May 22, 2024

I imagine the other providers will follow how OpenAI and Cohere include usage when streaming. So rather than having the return type be the cost, we should add `cost`, `usage`, `input_tokens`, and `output_tokens` properties, like we do for `BaseCallResponse`.

It would be as follows:

````python
# Imports shown for context; `ChunkT`, `BaseToolT`, `OpenAITool`, and
# `ResponseFormat` are Mirascope/OpenAI types elided from this excerpt.
from abc import ABC
from typing import Generic, Optional, Type

from openai.types import CompletionUsage
from openai.types.chat import ChatCompletionChunk
from openai.types.chat.chat_completion_chunk import (
    Choice as ChunkChoice,
    ChoiceDelta,
    ChoiceDeltaToolCall,
)
from pydantic import BaseModel, ConfigDict


class BaseCallResponseChunk(BaseModel, Generic[ChunkT, BaseToolT], ABC):
    """A base abstract interface for LLM streaming response chunks.

    Attributes:
        chunk: The original response chunk from whichever model response this wraps.
    """

    chunk: ChunkT
    tool_types: Optional[list[Type[BaseToolT]]] = None
    cost: Optional[float] = None  # The cost of the completion in dollars

    model_config = ConfigDict(extra="allow", arbitrary_types_allowed=True)

    ...


class OpenAICallResponseChunk(BaseCallResponseChunk[ChatCompletionChunk, OpenAITool]):
    """Convenience wrapper around chat completion streaming chunks.

    When using Mirascope's convenience wrappers to interact with OpenAI models via
    `OpenAICall.stream`, responses will return an `OpenAICallResponseChunk`, whereby
    the implemented properties allow for simpler syntax and a convenient developer
    experience.

    Example:

    ```python
    from mirascope.openai import OpenAICall


    class Math(OpenAICall):
        prompt_template = "What is 1 + 2?"


    for chunk in Math().stream():
        print(chunk.content)

    #> 1
    #  +
    #  2
    #   equals
    #
    #  3
    #  .
    ```
    """

    response_format: Optional[ResponseFormat] = None

    @property
    def choices(self) -> list[ChunkChoice]:
        """Returns the array of chat completion choices."""
        return self.chunk.choices

    @property
    def choice(self) -> ChunkChoice:
        """Returns the 0th choice."""
        return self.chunk.choices[0]

    @property
    def delta(self) -> Optional[ChoiceDelta]:
        """Returns the delta for the 0th choice."""
        if self.chunk.choices:
            return self.chunk.choices[0].delta
        return None

    @property
    def content(self) -> str:
        """Returns the content for the 0th choice delta."""
        return (
            self.delta.content if self.delta is not None and self.delta.content else ""
        )

    @property
    def tool_calls(self) -> Optional[list[ChoiceDeltaToolCall]]:
        """Returns the partial tool calls for the 0th choice message.

        The first `list[ChoiceDeltaToolCall]` will contain the name of the tool and
        its index; subsequent `list[ChoiceDeltaToolCall]`s will contain argument
        strings that need to be concatenated with future `list[ChoiceDeltaToolCall]`s
        to form complete JSON tool calls. The last `list[ChoiceDeltaToolCall]` will
        be `None`, indicating the end of the stream.
        """
        if self.delta:
            return self.delta.tool_calls
        return None

    @property
    def usage(self) -> Optional[CompletionUsage]:
        """Returns the usage of the chat completion."""
        if self.chunk.usage:  # the wrapped chunk is `self.chunk`, not `self.response`
            return self.chunk.usage
        return None

    @property
    def input_tokens(self) -> Optional[int]:
        """Returns the number of input tokens."""
        if self.usage:
            return self.usage.prompt_tokens
        return None

    @property
    def output_tokens(self) -> Optional[int]:
        """Returns the number of output tokens."""
        if self.usage:
            return self.usage.completion_tokens
        return None
````
Our `stream` and `stream_async` functions will also need to be updated: we can check whether `usage` exists on a chunk and call `openai_api_calculate_cost` when we detect it.
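As a rough sketch of that check (hedged: `_client_stream` is a hypothetical stand-in for however we produce raw `ChatCompletionChunk`s, and the exact `openai_api_calculate_cost` signature may differ):

```python
from typing import Generator


class OpenAICall:  # excerpt; only the relevant method shown
    def stream(self) -> Generator[OpenAICallResponseChunk, None, None]:
        """Streams response chunks, attaching cost once usage arrives."""
        for raw_chunk in self._client_stream():  # hypothetical: yields ChatCompletionChunk
            chunk = OpenAICallResponseChunk(chunk=raw_chunk)
            if chunk.usage is not None:  # usage is only on the final chunk
                chunk.cost = openai_api_calculate_cost(chunk.usage, raw_chunk.model)
            yield chunk
```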

Finally, when iterating through the stream, the user can check whether `cost` exists:

```python
from mirascope.openai import OpenAICall


class BookRecommender(OpenAICall):
    prompt_template = "Please recommend a {genre} book."

    genre: str


stream = BookRecommender(genre="fantasy").stream()
for chunk in stream:
    print(chunk.content, end="")
    if chunk.cost is not None:
        print(chunk.cost)
```

willbakst (Contributor) commented:

I wonder if we could take advantage of the generator return value to push the cost check inside the generator, if desired.

For instance:

```python
stream = BookRecommender(genre="fantasy").stream()
for chunk in stream:
    print(chunk.content, end="", flush=True)
cost = stream.value  # Optional[float]
```

Internally we would check for the chunk cost (i.e. do everything the same as above) but return it, so the user doesn't have to manually check `if cost is not None`.
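A minimal sketch of what such a wrapper could look like (the `Stream` class here is hypothetical, not Mirascope's actual implementation):

```python
from typing import Generator, Iterator, Optional


class Stream:
    """Wraps the chunk generator and exposes the final cost as `.value`."""

    def __init__(self, generator: Generator["OpenAICallResponseChunk", None, None]):
        self._generator = generator
        self.value: Optional[float] = None  # populated once the stream is exhausted

    def __iter__(self) -> Iterator["OpenAICallResponseChunk"]:
        for chunk in self._generator:
            if chunk.cost is not None:
                self.value = chunk.cost
            yield chunk
```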

tvj15 (Contributor) commented May 25, 2024

Hi @willbakst, I am working on this issue and was able to add the feature for OpenAI. Now, while working on the Cohere API, the `usage` property for `CohereCallResponse` returns type `Optional[ApiMetaBilledUnits]`.
But for the `usage` property on `CohereCallResponseChunk`, Cohere's Stream API does not give the response in the same type (it gives a `token_count` property instead). Any ideas on how I should tackle that? I thought of creating a value of type `Optional[ApiMetaBilledUnits]` based on the data available from `token_count`.

Example:

 "token_count": {
            "prompt_tokens": 2821,
            "response_tokens": 29,
            "total_tokens": 2850,
            "billed_tokens": 37
        }

can be converted to:

 "billed_units": {
            "input_tokens": 8,
            "output_tokens": 29
        }
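A minimal sketch of that massaging (hedged: the import path assumes the cohere v5 SDK, and the field mapping is an assumption, since billed token counts need not equal the raw counts, as the numbers above show):

```python
from cohere.types import ApiMetaBilledUnits


def billed_units_from_token_count(token_count: dict) -> ApiMetaBilledUnits:
    # Assumed mapping; Cohere's billed counts may be derived differently.
    return ApiMetaBilledUnits(
        input_tokens=token_count["prompt_tokens"],
        output_tokens=token_count["response_tokens"],
    )
```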

willbakst (Contributor) commented:

Please submit the PR for OpenAI first without the cohere stuff so we can review in smaller chunks.

Please also move the Cohere discussion to its own issue so we can continue tracking it even if we close this one. I'll need to take a deeper look into the Cohere API to give the best answer. My quick answer is that massaging the data into the desired format could work, but if you think there's a better option we can always review it in the PR, where we can better see how it all works together.

Thanks!

tvj15 (Contributor) commented May 25, 2024

I was going to raise the PR for OpenAI, but because I made changes to the `BaseCallResponseChunk` abstract class, changes were also required in every class implementing it, and some test cases were failing. Should I raise the PR anyway?

willbakst (Contributor) commented:

I would only add the abstract methods if we're going to require them on all response chunk types; given that not all of them currently support streaming cost tracking, we should make the methods specific (for now) to the providers that support it.

@willbakst willbakst added this to the v0.16 milestone Jun 1, 2024
@willbakst willbakst removed this from the v0.17 milestone Jun 4, 2024
willbakst (Contributor) commented:

This is released in v0.16 🎉
