
StableLM #8

Open
vmanot opened this issue Mar 5, 2024 · 5 comments
Labels: help wanted (Extra attention is needed)

Comments

vmanot (Contributor) commented Mar 5, 2024

[screenshot attachment: gN5JmWWk.jpg]

vmanot added the help wanted (Extra attention is needed) label Mar 5, 2024
awni (Collaborator) commented Mar 5, 2024

CC @davidkoski this is the issue I was referring to! Any help here much appreciated 👍

davidkoski (Collaborator) commented

I see a slightly different issue, but likely related:

libc++abi: terminating due to uncaught exception of type std::invalid_argument: [addmm] Last dimension of first input with shape (1,0,-1) must match second to last dimension of second input with shape (2048,2048).

This is coming from Attention -> Linear. The problem stems from the shape of scores being [1, 0, 2048]. This is used to compute:

        let valuesHat = (scores.matmul(values)).transposed(0, 2, 1, 3).reshaped(B, L, -1)

which produces a shape of [1, 0, -1] (which is an issue, ml-explore/mlx#789, but not the cause).
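
In isolation the same failure can be reproduced with a few lines (a sketch only -- it assumes MLXNN's Linear and the MLXArray.zeros factory, and skips the matmul/transpose that produce the real array):

import MLX
import MLXNN

// An empty prompt makes the sequence length 0, so the attention output has a
// zero-size axis; reshaping with -1 then keeps the literal -1 (ml-explore/mlx#789)
// and the following Linear fails inside addmm.
let scores = MLXArray.zeros([1, 0, 2048])
let valuesHat = scores.reshaped(1, 0, -1)   // shape stays [1, 0, -1]
let outProjection = Linear(2048, 2048)
// outProjection(valuesHat)                 // terminates with the [addmm] error above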

At the top level:

        (logits, cache) = model(expandedDimensions(y, axis: 0), cache: cache.isEmpty ? nil : cache)

y is an empty array:

(lldb) po y
array([], dtype=int32)

and it looks like that is what we are getting out of the tokenizer (I am using the default "Why did the chicken cross the road?" prompt).

I will continue looking at this later.

davidkoski (Collaborator) commented

Here is the pretokenizer config:

(lldb) po config.pretokenizers?.arrayValue
▿ Optional<Array<Config>>
  ▿ some : 2 elements
    ▿ 0 : Config
      ▿ dictionary : 4 elements
        ▿ 0 : 2 elements
          - key : "type"
          - value : Split
        ▿ 1 : 2 elements
          - key : "behavior"
          - value : Removed
        ▿ 2 : 2 elements
          - key : "pattern"
          ▿ value : 1 element
            ▿ 0 : 2 elements
              - key : Regex
              - value : (?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\r\n\p{L}\p{N}]?\p{L}+|\p{N}| ?[^\s\p{L}\p{N}]+[\r\n]*|\s*[\r\n]+|\s+(?!\S)|\s+
        ▿ 3 : 2 elements
          - key : "invert"
          - value : 1
    ▿ 1 : Config
      ▿ dictionary : 4 elements
        ▿ 0 : 2 elements
          - key : "add_prefix_space"
          - value : 0
        ▿ 1 : 2 elements
          - key : "trim_offsets"
          - value : 1
        ▿ 2 : 2 elements
          - key : "use_regex"
          - value : 0
        ▿ 3 : 2 elements
          - key : "type"
          - value : ByteLevel

which is handled here:

class SplitPreTokenizer: PreTokenizer {
...
    func preTokenize(text: String) -> [String] {
        guard let pattern = pattern else { return [text] }
        return pattern.split(text, invert: invert)
    }

Invert is true:

(lldb) p invert
(Bool) true

giving us this:

(lldb) p pattern.split(text, invert: true)
([String]) 1 value {
  [0] = ""
}

However if invert was false:

(lldb) p pattern.split(text, invert: false)
([String]) 10 values {
  [0] = "Why"
  [1] = " did"
  [2] = " the"
  [3] = " chicken"
  [4] = " cross"
  [5] = " the"
  [6] = " road"
  [7] = "?"
  [8] = " "
  [9] = ""
}

That looks reasonable. I don't know if:

  • invert should be false (the config seems to set it to true)
  • the StringSplitPattern isn't handling invert correctly
  • or there is something unhandled in the regular expression

If I edit the tokenizer.json and replace the value:

  "pre_tokenizer": {
    "type": "Sequence",
    "pretokenizers": [
      {
        "type": "Split",
        "pattern": { 
          "Regex": "(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\r\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}| ?[^\\s\\p{L}\\p{N}]+[\r\n]*|\\s*[\r\n]+|\\s+(?!\\S)|\\s+"
        },
        "behavior": "Removed",
        "invert": false
      },

it does ... something :-)

[screenshot of the generated output]

Anyway, that is the cause of this assertion failure.

davidkoski (Collaborator) commented

Here are some ideas on how to debug the model behavior:

python -m mlx_lm.generate --model mlx-community/stablelm-2-zephyr-1_6b-4bit --prompt 'why did the chicken cross the road?'

==========
Prompt: <|user|>
why did the chicken cross the road?<|endoftext|>
<|assistant|>

The origin of the popular question, "Why did the chicken cross the road?" is a cultural phenomenon that dates back to ancient times. While it is often used as a humorous play on words, the question likely stems from a desire to understand the behavior of chickens as a group. The answer "because it's Friday" or "it's the chicken's way" does not accurately answer the question, as it doesn't provide a clear reason for the chickens' action. The question is often used for
==========

Notice the augmentation of the prompt -- this is done using Python code in the tokenizer configuration. We can't run that, so you may need some configuration to help with this, along the lines of what the example repo does. Simple, but probably helpful.
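
Taking the prompt format printed above literally, a hand-rolled stand-in on the swift side could look like this (a sketch only -- the helper name is made up, and the template text is copied from the mlx_lm.generate output rather than read from the tokenizer configuration):

// Hypothetical helper that mimics the prompt augmentation the python chat
// template performs, using the exact format printed above.
func formatStableLMChatPrompt(_ userPrompt: String) -> String {
    """
    <|user|>
    \(userPrompt)<|endoftext|>
    <|assistant|>

    """
}

let prompt = formatStableLMChatPrompt("why did the chicken cross the road?")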

Given the working python version you can do a few things:

  • the tokenizer produces an array of integers

    • print out the tokens the python code generates, see utils.py: prompt_tokens = mx.array(tokenizer.encode(prompt))
    • hard-code the swift code to take this same array (see the sketch after this list)
    • if this array works then you can suspect something in the tokenizer
  • the tokenizer can decode the tokens it prepares

    • make sure it can decode both the tokens the swift tokenizer makes
    • and the tokens the python code makes
  • set the random seed

    • --seed in the command line tool and MLXRandom.seed() in swift
    • maybe set the temperature to 0
    • generate a small number of tokens
    • are they the same? the code to produce tokens from the logits might be slightly different between the two but I found the first token is usually the same with the same seed
  • assuming the tokens are different compare the execution of the models

    • I found that something like print("\(name) \(array.shape) \(array.sum())") in swift (similar code in python) can help spot differences without looking at the whole tensor -- also in the sketch after this list
    • I had typos in the Attention layer a couple times -- incorrectly placed parentheses, etc.
  • make sure your weights are loaded correctly

    • I noticed that you turned off verification of the arrays: try model.update(parameters: parameters, verify: [.none])
    • turn that on -- at the very least your Attention layer has incorrect keys
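
Putting a few of those together, a rough sketch (the token values are placeholders for whatever the python tokenizer actually prints, and debugSummary is a made-up helper):

import MLX

// 1) Bypass the swift tokenizer and feed the model the tokens the python side
//    produced (placeholder values -- paste in the real output of
//    mx.array(tokenizer.encode(prompt)) from utils.py).
let promptTokens = MLXArray([100, 200, 300].map { Int32($0) })

// 2) Summarize intermediate tensors instead of dumping whole arrays; call this
//    at matching points in the swift and python models and compare the output.
func debugSummary(_ name: String, _ array: MLXArray) {
    print("\(name) \(array.shape) \(array.sum())")
}

// 3) Re-enable weight verification when loading parameters so mismatched keys
//    (e.g. in the Attention layer) are reported instead of silently ignored
//    (assuming the same update(parameters:verify:) call quoted above).
// try model.update(parameters: parameters, verify: [.all])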

Good luck and ask if you have questions!

vmanot (Contributor, Author) commented Mar 7, 2024

@davidkoski thank you so much for the detailed breakdown of what went on here, this made a ton of sense!

The other takeaway for me here is that we need to improve debuggability + implement sanity checks, and also probably expose verification as a visible parameter that can be toggled on/off. I'm going to think on it a bit and add some UI for it -- let me know if anything comes to mind here!
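
For what it's worth, one possible shape for such a sanity check -- a sketch only, with a made-up error type -- is to reject an empty tokenization before it ever reaches the model:

// Hypothetical pre-flight check: an empty token array is exactly what produced
// the [addmm] failure above, so surface it as a readable error instead.
enum GenerationError: Error {
    case emptyTokenization(prompt: String)
}

func validatedTokens(_ tokens: [Int], for prompt: String) throws -> [Int] {
    guard !tokens.isEmpty else {
        throw GenerationError.emptyTokenization(prompt: prompt)
    }
    return tokens
}

// Exposing verification as a user-visible toggle could then map a simple Bool
// onto verify: [.all] vs. verify: [.none] in the model loading path.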

vmanot self-assigned this Mar 8, 2024