bug: AzureChatOpenAI model name not recorded correctly #2029

Open
arthurGrigo opened this issue May 9, 2024 · 8 comments
@arthurGrigo

arthurGrigo commented May 9, 2024

Describe the bug

The 'Total Cost' in Langfuse shows a lower value than langchain's get_openai_callback() returns.

Tested with Azure's GPT API. Not sure if it's the same with OpenAI's API.

langchain's get_openai_callback(): Total Cost (USD): $0.0010149999999999998
langfuse UI: Total Cost: $0.0008

This discrepancy becomes more obvious with complex chains.
I have already seen differences of up to 3x.

To reproduce

Requirements:

langfuse version: 2.27.0

Python Version: 3.11.9 | packaged by Anaconda, Inc. | (main, Apr 19 2024, 16:40:41) [MSC v.1916 64 bit (AMD64)]

Package Information

langchain_core: 0.1.46
langchain: 0.1.16
langchain_community: 0.0.34
langsmith: 0.1.51
langchain_experimental: 0.0.57
langchain_openai: 0.1.4
langchain_text_splitters: 0.0.1
langgraph: 0.0.39


Code to reproduce:

import os

AZURE_API_BASE_GPT = os.getenv('AZURE_OPENAI_ENDPOINT_GPT_3_5')
AZURE_API_KEY_GPT = os.getenv('AZURE_OPENAI_KEY_GPT_3_5')
DEPLOYMENT_NAME_GPT = os.getenv('GPT_DEPLOYMENT_NAME_GPT_3_5')

gpt_parameter_temperature = 0.0

openai_api_type = "azure"
openai_api_version = "2023-05-15"

from langchain_openai import AzureChatOpenAI

llm = AzureChatOpenAI(
    azure_endpoint=AZURE_API_BASE_GPT,
    openai_api_key=AZURE_API_KEY_GPT,
    azure_deployment=DEPLOYMENT_NAME_GPT,
    temperature=gpt_parameter_temperature,
    openai_api_version=openai_api_version,
    openai_api_type=openai_api_type,
    callbacks=[],
    verbose=True,
)

# e.g. "gpt-35-turbo-0613"
deployment_name = DEPLOYMENT_NAME_GPT

enable_tracing_langfuse = True

LANGFUSE_HOST = os.getenv('LANGFUSE_HOST', "http://localhost:3000")
LANGFUSE_PUBLIC_KEY = os.getenv('LANGFUSE_PUBLIC_KEY', "pk-...")
LANGFUSE_SECRET_KEY = os.getenv('LANGFUSE_SECRET_KEY', "sk-...")


callbacks = []

if enable_tracing_langfuse:
    print("enable_tracing_langfuse")
    from langfuse.callback import CallbackHandler
    callback_langfuse = CallbackHandler(
        public_key=LANGFUSE_PUBLIC_KEY,
        secret_key=LANGFUSE_SECRET_KEY,
        host=LANGFUSE_HOST,
    )

    callbacks.append(callback_langfuse)

from langchain_core.runnables.config import RunnableConfig

MAX_CONCURRENCY = 2

runnable_conf = RunnableConfig(
    max_concurrency=MAX_CONCURRENCY,
    run_name="my_langfuse_experiment",
    callbacks=callbacks,
    tags=[
        deployment_name,
        f"temp={gpt_parameter_temperature}",
    ],
)

from langchain_core.runnables import RunnableParallel
from langchain_core.output_parsers.string import StrOutputParser
from langchain_community.callbacks import get_openai_callback
from langchain.prompts import ChatPromptTemplate

str_prsr = StrOutputParser()

prompt = ChatPromptTemplate.from_template("Write a sentence with 200 short words about {thing}.")

# Makes no difference if parallel or not ...
# chain = RunnableParallel(
#     run_1=prompt | llm | str_prsr,
#     run_2=prompt | llm | str_prsr,
#     run_3=prompt | llm | str_prsr,
#     run_4=prompt | llm | str_prsr,
# )

chain = prompt | llm | str_prsr

user_input = {"thing": "apples"}

with get_openai_callback() as cb:
    sentences = chain.invoke(user_input, config=runnable_conf)

print(cb)

# Tokens Used: 512
#	Prompt Tokens: 18
#	Completion Tokens: 494
# Successful Requests: 1
# Total Cost (USD): $0.0010149999999999998



# The langfuse UI shows the same token usage information but the 'Total Cost' is different 
# In langfuse 'Total Cost' = $0.0008

# This discrepancy becomes more obvious with complex chains.
# I have already seen differences of up to 3x.
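
For context on where the gap could come from: the same token counts priced against two different model entries reproduce both numbers. A minimal worked example follows, assuming the published gpt-3.5-turbo-0613 prices that get_openai_callback applies and a hypothetical cheaper entry that a bare 'gpt-35-turbo' name might match (the actual Langfuse price table is an assumption here):

# Hedged illustration: price the observed usage (18 prompt / 494 completion tokens)
# under two different price entries.
prompt_tokens, completion_tokens = 18, 494

# gpt-3.5-turbo-0613: $0.0015 / 1K prompt tokens, $0.0020 / 1K completion tokens
cost_0613 = prompt_tokens * 0.0015 / 1000 + completion_tokens * 0.0020 / 1000
print(cost_0613)  # ~0.001015 -> matches get_openai_callback()

# hypothetical cheaper entry, e.g. $0.0005 / 1K prompt, $0.0015 / 1K completion
cost_other = prompt_tokens * 0.0005 / 1000 + completion_tokens * 0.0015 / 1000
print(cost_other)  # ~0.00075 -> in the ballpark of the $0.0008 shown in the Langfuse UI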

SDK and container versions

No response

Additional information

No response

Are you interested in contributing a fix for this bug?

Yes

@marcklingen
Member

Thanks for reporting this. Are the token counts correct in Langfuse? What model name gets recorded?

@arthurGrigo
Author

Thanks for reporting this. Are the token counts correct in Langfuse? What model name gets recorded?

I think you are on the right track!
In the returned completion object, the model_name is 'gpt-35-turbo'.
Shouldn't this be 'gpt-35-turbo-0613'?

@marcklingen
Member

There are usually two model names available:

  1. the one used to create the request
  2. the one included in the response

Usually the response name is more specific, as the request does not need to include the model version.

Here the opposite seems to be the case. We'll need to have a look and add some tests to CI.

Are the token counts correct?
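
A minimal sketch, reusing the llm instance from the repro above, of how to surface both names; the exact response_metadata fields depend on the langchain-core / langchain-openai versions in use, so treat this as illustrative rather than canonical:

# Hedged sketch: compare the name used for the request (the Azure deployment)
# with the model name reported back in the response metadata.
result = llm.invoke("Say hi.")

print("request/deployment name:", llm.deployment_name)  # e.g. "gpt-35-turbo-0613"
print("response model name:", result.response_metadata.get("model_name"))  # e.g. "gpt-35-turbo"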

@arthurGrigo
Author

There are usually two model names available:

  1. the one used to create the request
  2. the one included in the response

Usually the response name is more specific, as the request does not need to include the model version.

Here the opposite seems to be the case. We'll need to have a look and add some tests to CI.

Are the token counts correct?

Yes, token counts are the same.
Thanks for your quick reply!

@marcklingen
Member

Perfect, thanks for confirming.

@marcklingen marcklingen changed the title bug: Total Cost different from langchains get_openai_callback() - LCEL with AzureChatOpenAI bug: AzureChatOpenAI model name not recorded correctly May 9, 2024
@arthurGrigo
Author

arthurGrigo commented May 20, 2024

Note how the request body says "model": "gpt-3.5-turbo" even though I use GPT-4 in this example. Azure knows which model to use because I created a deployment in Azure and associated it with "gpt4-1106-preview".

When using Azure, you have to create a named deployment and assign a model to it. When calling the API, the URL holds the deployment name. You could try to parse the model name from the deployment name in the URL, which would make it easy for the user. If a deployment has an arbitrary name that doesn't contain the model name, the user would need to provide a dict that maps deployment names to model names (see the sketch below).

Ideally, check how langchain does it in get_openai_callback().
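
As a rough illustration of that idea (not the Langfuse implementation; the mapping dict and helper name are hypothetical), here is a sketch that extracts the deployment name from the Azure request URL and maps it to a billable model name:

# Hypothetical helper: resolve a model name for pricing from the Azure
# deployment name embedded in the request URL
# (/openai/deployments/<deployment>/chat/completions).
DEPLOYMENT_TO_MODEL = {
    # maintained by the user whenever the deployment name doesn't contain the model name
    "gpt4-1106-preview": "gpt-4-1106-preview",
    "gpt-35-turbo-0613": "gpt-3.5-turbo-0613",
}

def resolve_model_name(request_url: str, fallback: str) -> str:
    parts = request_url.split("/")
    if "deployments" in parts:
        deployment = parts[parts.index("deployments") + 1]
        return DEPLOYMENT_TO_MODEL.get(deployment, fallback)
    return fallback

url = "https://xyz.openai.azure.com/openai/deployments/gpt4-1106-preview/chat/completions"
print(resolve_model_name(url, fallback="gpt-3.5-turbo"))  # -> "gpt-4-1106-preview"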

Azure API call example:

{
    "body": null,
    "code": null,
    "type": null,
    "param": null,
    "message": "Connection error.",
    "request": {
        "url": {
            "_uri_reference": [
                "https",
                "",
                "xyz.openai.azure.com",
                null,
                "/openai/deployments/gpt4-1106-preview/chat/completions",
                "api-version=2023-05-15",
                null
            ]
        },
        "method": "POST",
        "stream": {
            "_stream": {
                "messages": [
                    {
                        "content": "...",
                        "role": "user"
                    }
                ],
                "model": "gpt-3.5-turbo",
                "n": 1,
                "stream": false,
                "temperature": 1
            }
        },
        "headers": {},
        "_content": {
            "messages": [
                {
                    "content": "...",
                    "role": "user"
                }
            ],
            "model": "gpt-3.5-turbo",
            "n": 1,
            "stream": false,
            "temperature": 1
        },
        "extensions": {
            "timeout": {
                "pool": null,
                "read": null,
                "write": null,
                "connect": null
            }
        }
    }
}

@sbhadana

sbhadana commented Jun 3, 2024

Same for me: using gpt-4-32k-0613 from Azure OpenAI, but Langfuse reports gpt-3.5-turbo on the dashboard.

@marcklingen
Member

Same for me: using gpt-4-32k-0613 from Azure OpenAI, but Langfuse reports gpt-3.5-turbo on the dashboard.

Are you on the latest SDK version?
