# Estimate token usage, cost #870

Open · pengelbrecht opened this issue Mar 14, 2024 · Discussed in #546 · 5 comments

@pengelbrecht

Discussed in #546

Originally posted by ww-jermaine August 25, 2023
Hello, is there a way to estimate the token usage and cost per call of `ai_fn`, `ai_model`, etc.?

Something like the callback from langchain:

```
Tokens Used: 0
        Prompt Tokens: 0
        Completion Tokens: 0
Successful Requests: 0
Total Cost (USD): $0.0
```
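(For reference, that summary is what langchain's `get_openai_callback` context manager prints; a rough sketch, not Marvin code:)

```python
# Sketch of the langchain pattern that produces the summary above;
# shown for reference only — this is not Marvin code.
from langchain.callbacks import get_openai_callback
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-3.5-turbo")
with get_openai_callback() as cb:
    llm.invoke("Tell me a joke")
print(cb)  # Tokens Used / Prompt Tokens / Completion Tokens / Total Cost (USD)
```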
@zzstoatzz
Collaborator

hi @pengelbrecht - thanks for the issue! let me know if something like this is what you're looking for and/or feel free to make a specific enhancement request
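roughly, the idea is to wrap (or subclass) the underlying OpenAI client so every call records its token usage. a minimal sketch — `UsageTracker` and `track_usage` are hypothetical names, not Marvin API:

```python
# Rough sketch only: wrap an AsyncOpenAI client so every chat completion
# records its token usage. Names here are illustrative, not Marvin API.
import functools
from dataclasses import dataclass

from openai import AsyncOpenAI


@dataclass
class UsageTracker:
    prompt_tokens: int = 0
    completion_tokens: int = 0

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens


def track_usage(client: AsyncOpenAI, tracker: UsageTracker) -> AsyncOpenAI:
    """Patch chat.completions.create so each response adds to `tracker`."""
    original_create = client.chat.completions.create

    @functools.wraps(original_create)
    async def create_and_record(*args, **kwargs):
        response = await original_create(*args, **kwargs)
        if getattr(response, "usage", None) is not None:  # absent on streams
            tracker.prompt_tokens += response.usage.prompt_tokens
            tracker.completion_tokens += response.usage.completion_tokens
        return response

    client.chat.completions.create = create_and_record
    return client
```

Marvin (or your own code) could then read totals off the tracker after any number of calls.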

@pengelbrecht
Author

It seems like that would work. However, I'm primarily a hobby programmer who appreciates the simplicity of Marvin, so the subclassing approach might be a bit beyond my expertise. Ideally, I'd prefer something simpler and more in line with Marvin's design philosophy. Unfortunately, I'm not really qualified to suggest a specific alternative. Sorry.

@zzstoatzz
Collaborator

thanks for the response @pengelbrecht - no worries.

if you don't mind, what would your ideal experience look like? people often have drastically different ideas about what they want token tracking to look like, so your perspective would help build a sense of what a common-sense, middle-of-the-road offering might be

@pengelbrecht
Author

Here's how I do it today with direct OpenAI API use.

But returning a tuple doesn't feel very Marvinesque :)

```python
from typing import Optional, Tuple

from openai import AsyncOpenAI

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

# Module-level defaults (values here are just stand-ins for mine)
_default_system_prompt = "You are a helpful assistant."
_default_model = "gpt-3.5-turbo"
_default_temperature = 0.7


def openai_cost_usd(
    model_name: str, prompt_tokens: int, completion_tokens: int
) -> Optional[float]:
    """USD cost of a call, from OpenAI's per-million-token prices."""
    if model_name == "gpt-4-turbo-preview":
        return prompt_tokens * 10.0 / 1e6 + completion_tokens * 30.0 / 1e6
    elif model_name == "gpt-3.5-turbo":
        return prompt_tokens * 0.5 / 1e6 + completion_tokens * 1.5 / 1e6
    else:
        return None  # unknown model


async def fetch_chat_completion(
    user_message: str,
    system_prompt: str = _default_system_prompt,
    model_name: str = _default_model,
    temperature: float = _default_temperature,
) -> Tuple[str, int, Optional[float]]:
    """Fetch a single chat completion for a user message."""
    chat_completion = await client.chat.completions.create(
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ],
        model=model_name,
        temperature=temperature,
    )
    response_message = chat_completion.choices[0].message.content
    prompt_tokens = chat_completion.usage.prompt_tokens
    completion_tokens = chat_completion.usage.completion_tokens
    total_tokens = prompt_tokens + completion_tokens
    cost = openai_cost_usd(model_name, prompt_tokens, completion_tokens)
    return response_message, total_tokens, cost
```
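Called like this (illustrative, inside an async function):

```python
reply, total_tokens, cost = await fetch_chat_completion("Tell me a joke")
print(f"{total_tokens} tokens, ${cost:.4f}")
```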

@pengelbrecht
Author

litellm's approach is wonderful: https://litellm.vercel.app/docs/completion/token_usage – but I guess there's no parallel to the completion object in Marvin's approach?
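For reference, the pattern from those docs is roughly:

```python
# Roughly the litellm pattern from the linked docs: the response object
# carries its own usage, and completion_cost() prices it from litellm's
# model map.
from litellm import completion, completion_cost

response = completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hey, how's it going?"}],
)
print(response.usage)                                 # prompt/completion/total tokens
print(completion_cost(completion_response=response))  # cost in USD
```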
