
StableLM #8

Open
vmanot opened this issue Mar 5, 2024 · 5 comments
Labels: help wanted (Extra attention is needed)

Comments

vmanot (Contributor) commented Mar 5, 2024

[screenshot attachment: gN5JmWWk.jpg]

vmanot added the help wanted (Extra attention is needed) label Mar 5, 2024
awni (Collaborator) commented Mar 5, 2024

CC @davidkoski this is the issue I was referring to! Any help here much appreciated 👍

davidkoski (Collaborator) commented

I see a slightly different issue, but likely related:

libc++abi: terminating due to uncaught exception of type std::invalid_argument: [addmm] Last dimension of first input with shape (1,0,-1) must match second to last dimension of second input with shape (2048,2048).

This is coming from Attention -> Linear. The problem stems from the shape of scores being [1, 0, 2048]. This is used to compute:

        let valuesHat = (scores.matmul(values)).transposed(0, 2, 1, 3).reshaped(B, L, -1)

which produces a shape of [1, 0, -1] (which is an issue, ml-explore/mlx#789, but not the cause).
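
In isolation the same failure can be reproduced with a few lines (a sketch only -- it assumes MLXNN's Linear and the MLXArray.zeros factory, and skips the matmul/transpose that produce the real array):

import MLX
import MLXNN

// An empty prompt makes the sequence length 0, so the attention output has a
// zero-size axis; reshaping with -1 then keeps the literal -1 (ml-explore/mlx#789)
// and the following Linear fails inside addmm.
let scores = MLXArray.zeros([1, 0, 2048])
let valuesHat = scores.reshaped(1, 0, -1)   // shape stays [1, 0, -1]
let outProjection = Linear(2048, 2048)
// outProjection(valuesHat)                 // terminates with the [addmm] error above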

At the top level:

        (logits, cache) = model(expandedDimensions(y, axis: 0), cache: cache.isEmpty ? nil : cache)

y is an empty array:

(lldb) po y
array([], dtype=int32)

and it looks like that is what we are getting out of the tokenizer (I am using the default "Why did the chicken cross the road?" prompt).

I will continue looking at this later.

davidkoski (Collaborator) commented

Here is the pretokenizer config:

(lldb) po config.pretokenizers?.arrayValue
▿ Optional<Array<Config>>
  ▿ some : 2 elements
    ▿ 0 : Config
      ▿ dictionary : 4 elements
        ▿ 0 : 2 elements
          - key : "type"
          - value : Split
        ▿ 1 : 2 elements
          - key : "behavior"
          - value : Removed
        ▿ 2 : 2 elements
          - key : "pattern"
          ▿ value : 1 element
            ▿ 0 : 2 elements
              - key : Regex
              - value : (?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\r\n\p{L}\p{N}]?\p{L}+|\p{N}| ?[^\s\p{L}\p{N}]+[\r\n]*|\s*[\r\n]+|\s+(?!\S)|\s+
        ▿ 3 : 2 elements
          - key : "invert"
          - value : 1
    ▿ 1 : Config
      ▿ dictionary : 4 elements
        ▿ 0 : 2 elements
          - key : "add_prefix_space"
          - value : 0
        ▿ 1 : 2 elements
          - key : "trim_offsets"
          - value : 1
        ▿ 2 : 2 elements
          - key : "use_regex"
          - value : 0
        ▿ 3 : 2 elements
          - key : "type"
          - value : ByteLevel

which is handled here:

class SplitPreTokenizer: PreTokenizer {
...
    func preTokenize(text: String) -> [String] {
        guard let pattern = pattern else { return [text] }
        return pattern.split(text, invert: invert)
    }

Invert is true:

(lldb) p invert
(Bool) true

giving us this:

(lldb) p pattern.split(text, invert: true)
([String]) 1 value {
  [0] = ""
}

However if invert was false:

(lldb) p pattern.split(text, invert: false)
([String]) 10 values {
  [0] = "Why"
  [1] = " did"
  [2] = " the"
  [3] = " chicken"
  [4] = " cross"
  [5] = " the"
  [6] = " road"
  [7] = "?"
  [8] = " "
  [9] = ""
}

That looks reasonable. I don't know if:

  • invert should be false (the config seems to set it to true)
  • the StringSplitPattern isn't handling invert correctly
  • or there is something unhandled in the regular expression

If I edit the tokenizer.json and replace the value:

  "pre_tokenizer": {
    "type": "Sequence",
    "pretokenizers": [
      {
        "type": "Split",
        "pattern": { 
          "Regex": "(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\r\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}| ?[^\\s\\p{L}\\p{N}]+[\r\n]*|\\s*[\r\n]+|\\s+(?!\\S)|\\s+"
        },
        "behavior": "Removed",
        "invert": false
      },

it does ... something :-)

[screenshot of the generated output]

Anyway, that is the cause of this assertion failure.

davidkoski (Collaborator) commented

Here are some ideas on how to debug the model behavior:

python -m mlx_lm.generate --model mlx-community/stablelm-2-zephyr-1_6b-4bit --prompt 'why did the chicken cross the road?'

==========
Prompt: <|user|>
why did the chicken cross the road?<|endoftext|>
<|assistant|>

The origin of the popular question, "Why did the chicken cross the road?" is a cultural phenomenon that dates back to ancient times. While it is often used as a humorous play on words, the question likely stems from a desire to understand the behavior of chickens as a group. The answer "because it's Friday" or "it's the chicken's way" does not accurately answer the question, as it doesn't provide a clear reason for the chickens' action. The question is often used for
==========

Notice the augmentation of the prompt -- this is done using Python code in the tokenizer configuration. We can't run that, so you may need some configuration to help with this, along the lines of what the example repo does. Simple, but probably helpful.
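
Taking the prompt format printed above literally, a hand-rolled stand-in on the swift side could look like this (a sketch only -- the helper name is made up, and the template text is copied from the mlx_lm.generate output rather than read from the tokenizer configuration):

// Hypothetical helper that mimics the prompt augmentation the python chat
// template performs, using the exact format printed above.
func formatStableLMChatPrompt(_ userPrompt: String) -> String {
    """
    <|user|>
    \(userPrompt)<|endoftext|>
    <|assistant|>

    """
}

let prompt = formatStableLMChatPrompt("why did the chicken cross the road?")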

Given the working python version you can do a few things:

  • the tokenizer produces an array of integers

    • print out the tokens the python code generates, see utils.py: prompt_tokens = mx.array(tokenizer.encode(prompt))
    • hard-code the swift code to take this same array (see the sketch after this list)
    • if this array works then you can suspect something in the tokenizer
  • the tokenizer can decode the tokens it prepares

    • make sure it can decode both the tokens the swift tokenizer makes
    • and the tokens the python code makes
  • set the random seed

    • --seed in the command line tool and MLXRandom.seed() in swift
    • maybe set the temperature to 0
    • generate a small number of tokens
    • are they the same? the code to produce tokens from the logits might be slightly different between the two but I found the first token is usually the same with the same seed
  • assuming the tokens are different compare the execution of the models

    • I found that something like print("\(name) \(array.shape) \(array.sum())") in swift (similar code in python) can help spot differences without looking at the whole tensor -- also in the sketch after this list
    • I had typos in the Attention layer a couple times -- incorrectly placed parentheses, etc.
  • make sure your weights are loaded correctly

    • I noticed that you turned off verification of the arrays: try model.update(parameters: parameters, verify: [.none])
    • turn that on -- at the very least your Attention layer has incorrect keys
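
Putting a few of those together, a rough sketch (the token values are placeholders for whatever the python tokenizer actually prints, and debugSummary is a made-up helper):

import MLX

// 1) Bypass the swift tokenizer and feed the model the tokens the python side
//    produced (placeholder values -- paste in the real output of
//    mx.array(tokenizer.encode(prompt)) from utils.py).
let promptTokens = MLXArray([100, 200, 300].map { Int32($0) })

// 2) Summarize intermediate tensors instead of dumping whole arrays; call this
//    at matching points in the swift and python models and compare the output.
func debugSummary(_ name: String, _ array: MLXArray) {
    print("\(name) \(array.shape) \(array.sum())")
}

// 3) Re-enable weight verification when loading parameters so mismatched keys
//    (e.g. in the Attention layer) are reported instead of silently ignored
//    (assuming the same update(parameters:verify:) call quoted above).
// try model.update(parameters: parameters, verify: [.all])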

Good luck and ask if you have questions!

vmanot (Contributor, Author) commented Mar 7, 2024

@davidkoski thank you so much for the detailed breakdown of what went on here, this made a ton of sense!

The other takeaway for me here is that we need to improve debuggability + implement sanity checks, and also probably expose verification as a visible parameter that can be toggled on/off. I'm going to think on it a bit and add some UI for it -- let me know if anything comes to mind here!
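
For what it's worth, one possible shape for such a sanity check -- a sketch only, with a made-up error type -- is to reject an empty tokenization before it ever reaches the model:

// Hypothetical pre-flight check: an empty token array is exactly what produced
// the [addmm] failure above, so surface it as a readable error instead.
enum GenerationError: Error {
    case emptyTokenization(prompt: String)
}

func validatedTokens(_ tokens: [Int], for prompt: String) throws -> [Int] {
    guard !tokens.isEmpty else {
        throw GenerationError.emptyTokenization(prompt: prompt)
    }
    return tokens
}

// Exposing verification as a user-visible toggle could then map a simple Bool
// onto verify: [.all] vs. verify: [.none] in the model loading path.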

vmanot self-assigned this Mar 8, 2024