
Is there a template configuration that supports Llama3-ChatQA-1.5-70B? #2287

Open
WuQic opened this issue May 11, 2024 · 2 comments

WuQic commented May 11, 2024

With this config, the model can't answer the question:

name: llama3-70b-chatQA
mmap: true
context_size: 8192
#threads: 11
#gpu_layers: 90
f16: true
parameters:
  model: Llama3-ChatQA-1.5-70B-Q4_K_M.gguf
function:
  # set to true to allow the model to call multiple functions in parallel
  parallel_calls: true
template:
  chat_message: |
    <|start_header_id|>{{if eq .RoleName "assistant"}}assistant{{else if eq .RoleName "system"}}system{{else if eq .RoleName "tool"}}tool{{else if eq .RoleName "user"}}user{{end}}<|end_header_id|>

    {{ if .FunctionCall -}}
    Function call:
    {{ else if eq .RoleName "tool" -}}
    Function response:
    {{ end -}}
    {{ if .Content -}}
    {{.Content -}}
    {{ else if .FunctionCall -}}
    {{ toJson .FunctionCall -}}
    {{ end -}}
    <|eot_id|>
  function: |
    <|start_header_id|>system<|end_header_id|>

    You are a function calling AI model. You are provided with function signatures within <tools></tools> XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. Here are the available tools:
    <tools>
    {{range .Functions}}
    {'type': 'function', 'function': {'name': '{{.Name}}', 'description': '{{.Description}}', 'parameters': {{toJson .Parameters}} }}
    {{end}}
    </tools>
    Use the following pydantic model json schema for each tool call you will make:
    {'title': 'FunctionCall', 'type': 'object', 'properties': {'arguments': {'title': 'Arguments', 'type': 'object'}, 'name': {'title': 'Name', 'type': 'string'}}, 'required': ['arguments', 'name']}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
    Function call:
  chat: |
    <|begin_of_text|>{{.Input }}
    <|start_header_id|>assistant<|end_header_id|>
  completion: |
    {{.Input}}
stopwords:
- <|im_end|>
- <dummy32000>
- <|eot_id|>
- <|end_of_text|>
usage: |
      curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
          "model": "llama3-70b-chatQA",
          "messages": [{"role": "user", "content": "How are you doing?"}],
          "temperature": 0.1
      }'
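
For what it's worth, NVIDIA's model card for ChatQA-1.5 documents a plain-text turn format (System:, User:, Assistant:, with turns separated by blank lines) rather than the Llama 3 <|start_header_id|> instruct tokens used above, which may be why this template produces no answer. A minimal, untested template sketch along those lines (the role mapping is an assumption based on the model card, not a verified LocalAI config):

template:
  # Assumption: map each role onto ChatQA's plain-text turn prefixes
  chat_message: |
    {{if eq .RoleName "system"}}System: {{.Content}}{{else if eq .RoleName "user"}}User: {{.Content}}{{else}}Assistant: {{.Content}}{{end}}
  chat: |
    {{.Input}}

    Assistant: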
WuQic added the bug (Something isn't working) and unconfirmed labels on May 11, 2024
fakezeta (Collaborator) commented May 11, 2024

Hi @WuQic, I tested this model with the transformers backend, using the OpenVINO version.
I was not particularly impressed, but if you want to give it a try, this is the model definition:

name: ChatQA
backend: transformers
parameters:
  model: fakezeta/Llama3-ChatQA-1.5-8B-ov-int8
context_size: 8192
type: OVModelForCausalLM
template:
  use_tokenizer_template: true
stopwords:
- "<|eot_id|>"
- "<|end_of_text|>"

The template is in the tokenizer_config.json file coming from Nvidia.
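
Usage should mirror the earlier curl example; only the model name changes (ChatQA, per the definition above):

curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
    "model": "ChatQA",
    "messages": [{"role": "user", "content": "How are you doing?"}],
    "temperature": 0.1
}'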

thiner (Contributor) commented May 24, 2024

I tested the 8B model; I wouldn't recommend it either.
