Track costs for streaming with OpenAI #214
We may also want to consider updating the generator to return the total cost separately from the response chunks, so the generator's return type would carry the cost instead of yielding it.
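Concretely, a sketch of what that signature could look like (the chunk type name is taken from later in this thread, and the alias is hypothetical):

```python
from typing import Generator, Optional

# Yields response chunks and *returns* the total cost once exhausted.
StreamWithCost = Generator["OpenAICallResponseChunk", None, Optional[float]]
```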
I imagine the other providers will follow how OpenAI and Cohere include usage when streaming. So rather than have the return type be the cost, we should add properties. It would look as follows:

````python
class BaseCallResponseChunk(BaseModel, Generic[ChunkT, BaseToolT], ABC):
    """A base abstract interface for LLM streaming response chunks.

    Attributes:
        chunk: The original response chunk from whichever model response this wraps.
    """

    chunk: ChunkT
    tool_types: Optional[list[Type[BaseToolT]]] = None
    cost: Optional[float] = None  # the cost of the completion in dollars

    model_config = ConfigDict(extra="allow", arbitrary_types_allowed=True)

    ...


class OpenAICallResponseChunk(BaseCallResponseChunk[ChatCompletionChunk, OpenAITool]):
    """Convenience wrapper around chat completion streaming chunks.

    When using Mirascope's convenience wrappers to interact with OpenAI models via
    `OpenAICall.stream`, responses will return an `OpenAICallResponseChunk`, whereby
    the implemented properties allow for simpler syntax and a convenient developer
    experience.

    Example:

    ```python
    from mirascope.openai import OpenAICall


    class Math(OpenAICall):
        prompt_template = "What is 1 + 2?"


    for chunk in Math().stream():
        print(chunk.content)
    #> 1
    # +
    # 2
    # equals
    #
    # 3
    # .
    ```
    """

    response_format: Optional[ResponseFormat] = None

    @property
    def choices(self) -> list[ChunkChoice]:
        """Returns the array of chat completion choices."""
        return self.chunk.choices

    @property
    def choice(self) -> ChunkChoice:
        """Returns the 0th choice."""
        return self.chunk.choices[0]

    @property
    def delta(self) -> Optional[ChoiceDelta]:
        """Returns the delta for the 0th choice."""
        if self.chunk.choices:
            return self.chunk.choices[0].delta
        return None

    @property
    def content(self) -> str:
        """Returns the content for the 0th choice delta."""
        return (
            self.delta.content if self.delta is not None and self.delta.content else ""
        )

    @property
    def tool_calls(self) -> Optional[list[ChoiceDeltaToolCall]]:
        """Returns the partial tool calls for the 0th choice message.

        The first `list[ChoiceDeltaToolCall]` will contain the name of the tool and
        its index; subsequent `list[ChoiceDeltaToolCall]`s will contain the arguments,
        which are strings that need to be concatenated with future
        `list[ChoiceDeltaToolCall]`s to form complete JSON tool calls. The last
        `list[ChoiceDeltaToolCall]` will be None, indicating the end of the stream.
        """
        if self.delta:
            return self.delta.tool_calls
        return None

    @property
    def usage(self) -> Optional[CompletionUsage]:
        """Returns the usage of the chat completion."""
        if self.chunk.usage:
            return self.chunk.usage
        return None

    @property
    def input_tokens(self) -> Optional[int]:
        """Returns the number of input tokens."""
        if self.usage:
            return self.usage.prompt_tokens
        return None

    @property
    def output_tokens(self) -> Optional[int]:
        """Returns the number of output tokens."""
        if self.usage:
            return self.usage.completion_tokens
        return None
````

Our cost calculation will also need to be updated to handle streaming usage. Finally, when iterating through the stream, the user can check whether the cost exists:

```python
from mirascope.openai import OpenAICall


class BookRecommender(OpenAICall):
    prompt_template = "Please recommend a {genre} book."

    genre: str


stream = BookRecommender(genre="fantasy").stream()
for chunk in stream:
    print(chunk.content, end="")
    if chunk.cost is not None:
        print(chunk.cost)
```
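For reference, the `cost` above would be derived from the streamed usage; a minimal sketch, assuming illustrative per-token prices rather than any real pricing table:

```python
from openai.types import CompletionUsage

# Illustrative (input, output) prices in dollars per million tokens.
PRICES_PER_MILLION_TOKENS = {"gpt-3.5-turbo": (0.50, 1.50)}


def calculate_cost(usage: CompletionUsage, model: str = "gpt-3.5-turbo") -> float:
    """Computes the dollar cost of a completion from its token usage."""
    input_price, output_price = PRICES_PER_MILLION_TOKENS[model]
    return (
        usage.prompt_tokens * input_price + usage.completion_tokens * output_price
    ) / 1_000_000
```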
I wonder if we could take advantage of the generator return value to push the cost check inside of the generator if desired. For instance:

```python
stream = BookRecommender(genre="fantasy").stream()
for chunk in stream:
    print(chunk.content, end="", flush=True)
cost = stream.value  # Optional[float]
```

Internally we would check for the chunk cost (i.e. do everything the same as above) but return it so the user doesn't have to manually check if the cost is not `None`.
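One way this could work is with a thin wrapper around the generator; a minimal sketch (the class name and types are assumptions, not Mirascope's actual implementation):

```python
from typing import Generator, Iterator, Optional


class CostTrackingStream:
    """Wraps a chunk generator and captures its return value as `.value`."""

    def __init__(
        self, generator: Generator["OpenAICallResponseChunk", None, Optional[float]]
    ) -> None:
        self._generator = generator
        self.value: Optional[float] = None  # set once the stream is exhausted

    def __iter__(self) -> Iterator["OpenAICallResponseChunk"]:
        # `yield from` evaluates to the inner generator's return value,
        # which lets us capture the final cost after iteration completes.
        self.value = yield from self._generator
```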
Hi @willbakst, I am working on this issue and I was able to add the feature for OpenAI. Now, while working on the Cohere API, the usage information comes back in a different shape. For example:

```json
"token_count": {
    "prompt_tokens": 2821,
    "response_tokens": 29,
    "total_tokens": 2850,
    "billed_tokens": 37
}
```

can be converted to:

```json
"billed_units": {
    "input_tokens": 8,
    "output_tokens": 29
}
```
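If the data is massaged into the desired format, a hypothetical sketch of the mapping (field names come from the snippets above; note that billed token counts can differ from raw counts, so a direct mapping is illustrative only):

```python
def to_billed_units(token_count: dict) -> dict:
    # Illustrative only: Cohere's billed tokens may not equal the raw
    # prompt/response counts, as the snippets above show.
    return {
        "input_tokens": token_count["prompt_tokens"],
        "output_tokens": token_count["response_tokens"],
    }
```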
Please submit the PR for OpenAI first without the Cohere stuff so we can review in smaller chunks. Please also move the discussion on Cohere to its own issue so we can continue tracking it even if we close this issue. I'll need to take a deeper look into the Cohere API to give the best answer. My quick answer would be that massaging the data into the desired format could work, but if you think there's a better option we can always review it in the PR, where we can better see how it all works together. Thanks!
I was going to raise the PR for OpenAI, but the issue with that was that as I made changes to the base `BaseCallResponseChunk`, the other providers were affected as well.
I would only add the abstract methods if we're going to require these methods on all response chunk types, but given that not all of them currently support streaming cost tracking, we should just make the methods specific (for now) to the providers that support it.
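To illustrate the trade-off with simplified, hypothetical classes: an abstract property forces every provider's chunk type to implement it, whereas defining it only where supported leaves the rest untouched.

```python
from abc import ABC, abstractmethod
from typing import Optional


class AbstractBase(ABC):
    @property
    @abstractmethod
    def cost(self) -> Optional[float]:
        """Every subclass must implement this, even providers without usage data."""


class SpecificBase:
    """No abstract `cost`: providers that support it simply add the property."""


class ProviderChunk(SpecificBase):
    @property
    def cost(self) -> Optional[float]:
        return 0.000123  # would be computed from provider-specific usage data
```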
This is released in v0.16 🎉
Is your feature request related to a problem? Please describe.
Prior versions of openai did not have usage stats when streaming.

Describe the solution you'd like
Add `stream_options: {"include_usage": true}`. Add `total_cost` as a property of `OpenAICallResponseChunk`.

Additional context
OpenAI Cookbook Reference
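For reference, a minimal sketch of requesting usage on a raw OpenAI stream (standard `openai` v1 client; the model name and prompt are illustrative):

```python
from openai import OpenAI

client = OpenAI()

stream = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Please recommend a fantasy book."}],
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in stream:
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="")
    # With include_usage, the final chunk has empty `choices` and carries `usage`.
    if chunk.usage:
        print(f"\n{chunk.usage.prompt_tokens} in / {chunk.usage.completion_tokens} out")
```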