
Is there a template configuration that supports Llama3-ChatQA-1.5-70B? #2287

Open
WuQic opened this issue May 11, 2024 · 2 comments

WuQic commented May 11, 2024

With this config, the model can't answer the question:

name: llama3-70b-chatQA
mmap: true
context_size: 8192
#threads: 11
#gpu_layers: 90
f16: true
parameters:
  model: Llama3-ChatQA-1.5-70B-Q4_K_M.gguf
function:
  # set to true to allow the model to call multiple functions in parallel
  parallel_calls: true
template:
  chat_message: |
    <|start_header_id|>{{if eq .RoleName "assistant"}}assistant{{else if eq .RoleName "system"}}system{{else if eq .RoleName "tool"}}tool{{else if eq .RoleName "user"}}user{{end}}<|end_header_id|>

    {{ if .FunctionCall -}}
    Function call:
    {{ else if eq .RoleName "tool" -}}
    Function response:
    {{ end -}}
    {{ if .Content -}}
    {{.Content -}}
    {{ else if .FunctionCall -}}
    {{ toJson .FunctionCall -}}
    {{ end -}}
    <|eot_id|>
  function: |
    <|start_header_id|>system<|end_header_id|>

    You are a function calling AI model. You are provided with function signatures within <tools></tools> XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. Here are the available tools:
    <tools>
    {{range .Functions}}
    {'type': 'function', 'function': {'name': '{{.Name}}', 'description': '{{.Description}}', 'parameters': {{toJson .Parameters}} }}
    {{end}}
    </tools>
    Use the following pydantic model json schema for each tool call you will make:
    {'title': 'FunctionCall', 'type': 'object', 'properties': {'arguments': {'title': 'Arguments', 'type': 'object'}, 'name': {'title': 'Name', 'type': 'string'}}, 'required': ['arguments', 'name']}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
    Function call:
  chat: |
    <|begin_of_text|>{{.Input }}
    <|start_header_id|>assistant<|end_header_id|>
  completion: |
    {{.Input}}
stopwords:
- <|im_end|>
- <dummy32000>
- <|eot_id|>
- <|end_of_text|>
usage: |
      curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
          "model": "llama3-70b-chatQA",
          "messages": [{"role": "user", "content": "How are you doing?"}],
          "temperature": 0.1
      }'
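
For what it's worth, NVIDIA's model card for ChatQA-1.5 documents a plain-text turn format (System:, User:, Assistant:, with turns separated by blank lines) rather than the Llama 3 <|start_header_id|> instruct tokens used above, which may be why this template produces no answer. A minimal, untested template sketch along those lines (the role mapping is an assumption based on the model card, not a verified LocalAI config):

template:
  # Assumption: map each role onto ChatQA's plain-text turn prefixes
  chat_message: |
    {{if eq .RoleName "system"}}System: {{.Content}}{{else if eq .RoleName "user"}}User: {{.Content}}{{else}}Assistant: {{.Content}}{{end}}
  chat: |
    {{.Input}}

    Assistant: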
WuQic added the bug (Something isn't working) and unconfirmed labels on May 11, 2024
fakezeta (Collaborator) commented May 11, 2024

Hi @WuQic, I tested this model with the transformers backend, using the OpenVINO version.
I was not particularly impressed, but if you want to give it a try, this is the model definition:

name: ChatQA
backend: transformers
parameters:
  model: fakezeta/Llama3-ChatQA-1.5-8B-ov-int8
context_size: 8192
type: OVModelForCausalLM
template:
  use_tokenizer_template: true
stopwords:
- "<|eot_id|>"
- "<|end_of_text|>"

The template is in the tokenizer_config.json file coming from Nvidia.
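
Usage should mirror the earlier curl example; only the model name changes (ChatQA, per the definition above):

curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
    "model": "ChatQA",
    "messages": [{"role": "user", "content": "How are you doing?"}],
    "temperature": 0.1
}'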

thiner (Contributor) commented May 24, 2024

I tested the 8B model; I wouldn't recommend it either.
