
llm-eval: not responding to 'what is your name?' or 'what is the difference between star wars and star trek?' #9

Open
CharlieTLe opened this issue Mar 2, 2024 · 2 comments


CharlieTLe commented Mar 2, 2024

On my Mac, I see the error:

CLIENT ERROR: TUINSRemoteViewController does not override -viewServiceDidTerminateWithError: and thus cannot react to catastrophic errors beyond logging them

It does respond fine to 'compare python and swift', though.

@davidkoski (Collaborator)

That actually looks "right":

python -m mlx_lm.generate --model ~/Documents/huggingface/models/mlx-community/phi-2-hf-4bit-mlx --prompt 'Instruct: what is your name?. Output: '
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
==========
Prompt: Instruct: what is your name?. Output: 


==========
Prompt: 32.359 tokens-per-sec
Generation: 0.000 tokens-per-sec

The problem seems to be in the prompt template:

        "Instruct: \(prompt). Output: "

it should be:

        "Instruct: \(prompt)\nOutput: "

That gives a much better response, though (perhaps) in Chinese?
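
A minimal sketch of the fix in plain Swift, with a hypothetical phiPrompt helper name (the actual template lives in the app's model configuration):

    // Phi-2 expects the instruction and the output marker on separate lines.
    func phiPrompt(_ prompt: String) -> String {
        // before: "Instruct: \(prompt). Output: "  (same line, stray period)
        // after:  newline before "Output:", no period
        "Instruct: \(prompt)\nOutput: "
    }

    print(phiPrompt("what is your name?"))
    // Instruct: what is your name?
    // Output: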

Nothing from 'what is the difference between star wars and star trek?', but the python version doesn't respond to that prompt either.

@davidkoski (Collaborator)

It looks like phi-2 can't answer that prompt -- maybe its training data doesn't cover that info, or maybe it is too small? mistral7B4bit (aka mlx-community/Mistral-7B-v0.1-hf-4bit-mlx) seems to do an OK job, though sometimes a bit silly.

Three changes were made, and I think they fix or greatly improve the responses here:

  • the prompt for Phi was adjusted to better fit the expected format -- the model is sensitive to the exact wording
  • the temperature was set to 0.6 to match the python code
  • a new random seed is generated each time you generate, so you can explore a little (the sketch below shows what these last two changes do)
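
A minimal sketch of the temperature and seed changes, in plain Swift with a hypothetical sampleToken helper (the real app samples MLX arrays, not Swift arrays): the logits are divided by the temperature before the softmax, and reseeding per generation makes repeated runs differ.

    import Foundation

    // Scale logits by temperature, softmax, then draw one token index.
    // temperature < 1 sharpens the distribution; 0.6 matches the python code.
    func sampleToken<G: RandomNumberGenerator>(
        logits: [Double], temperature: Double, using rng: inout G
    ) -> Int {
        let scaled = logits.map { $0 / temperature }
        let maxLogit = scaled.max() ?? 0
        let exps = scaled.map { exp($0 - maxLogit) }   // numerically stable softmax
        var r = Double.random(in: 0..<1, using: &rng) * exps.reduce(0, +)
        for (i, e) in exps.enumerated() {
            r -= e
            if r <= 0 { return i }
        }
        return exps.count - 1
    }

    var rng = SystemRandomNumberGenerator()  // fresh seed each run -> varied output
    print(sampleToken(logits: [2.0, 1.0, 0.5], temperature: 0.6, using: &rng))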

You may need to switch to a larger model like Mistral 7B to see more interesting responses for a wider range of inputs.
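
For example, reusing the same CLI shown above (only the --model and --prompt flags demonstrated there; Mistral-7B-v0.1 is a base model, so no instruct template is needed):

    python -m mlx_lm.generate --model mlx-community/Mistral-7B-v0.1-hf-4bit-mlx --prompt 'what is the difference between star wars and star trek?'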
