Degraded response quality on v0.1.33 #4227
Comments
Can you provide examples of each version's prompts and generated responses? Something we can use to try to recreate what you observe. There has been a change in how Ollama compiles llama.cpp; we could investigate how different build configurations affect the model's performance.
@MarkWard0110, while I cannot share the actual documents and generated responses, I can provide some fictional examples, generated with Ollama 0.1.32 and 0.1.33 using the same code (and therefore the same prompts, system prompts, and inference options). All examples used:
System Prompt:
Prompt 1 (content taken from Wikipedia)
Bad Response (v0.1.33)
Notes: The quality of the summary is very poor, and the model did not obey the instruction to respond in Portuguese.
Good Response (v0.1.32)
Notes: The summary is very good, succinctly describing the facts provided in the text, in the requested language.
Prompt 2 (LLM-generated memo requesting resources for a fictional new research division)
Bad Response (v0.1.33)
Notes: The response is quite verbose, was not translated, and did not follow what was asked in the prompt.
Good Response (v0.1.32)
Notes: Again, the summary is very good, succinctly describing the facts provided in the text, in the requested language.
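For reference, here is a minimal sketch of the comparison setup, using the standard Ollama /api/generate endpoint. The real system prompt, documents, and option values are elided above, so the model tag, ports, prompts, and options below are placeholders:

```python
# Sketch: send the identical request to two Ollama servers (e.g. v0.1.32 and
# v0.1.33 listening on different ports) and compare the outputs side by side.
# The model tag, ports, prompts, and option values are placeholders.
import requests

REQUEST = {
    "model": "dolphin-mistral:v2.6",
    "system": "Summarize the provided text in Portuguese.",  # placeholder system prompt
    "prompt": "<document text here>",                        # placeholder document
    "stream": False,
    "options": {"temperature": 0, "seed": 42},               # pin sampling for comparability
}

for label, port in [("v0.1.32", 11434), ("v0.1.33", 11435)]:
    resp = requests.post(f"http://localhost:{port}/api/generate", json=REQUEST, timeout=600)
    resp.raise_for_status()
    print(f"--- {label} ---\n{resp.json()['response']}\n")
```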
Thank you for looking into this, and let me know if I can offer any further assistance.
Minor update: routing the same prompts/configs to Ollama v0.1.33 on a Windows 10 machine produced similar results.
Same on macOS: I updated and it is heavily hallucinating...
@DiegoGonzalezCruz, what model(s) are you having issues with, if you don't mind me asking?
I have also encountered similar behavior on Ubuntu 22.04 with an A100 GPU (through the Ollama Docker image). I wanted to upgrade from 0.1.32 to 0.1.33 to use the new features, and then noticed the same kind of regressions as the ones mentioned above on the following LLMs:
Update: I quickly retested the application with v0.1.34 on all three models, and this issue seems to be gone.
Same issue with model
Facing the same issue. The responses in v0.1.34 are an improvement over those in v0.1.33, yet the responses in v0.1.32 are much better. I'm using mixtral:8x7b and llama3:8b.
I just compared 0.1.32 and the latest (0.1.34) and I see no difference in quality. I didn't use 0.1.33, so I can't comment on that. But I wonder how much of this issue report is real and how much is imaginary/placebo. Remember that every generation/regeneration uses a different seed, and temperature affects every response randomly, so you can't just compare one or two outputs from each model. You need a lot of data to confirm whether a model has degraded. In fact, you should be using LLM evaluation benchmarks.
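For example, a minimal sketch of such a paired comparison, assuming two server versions listening on ports 11434 and 11435; the model tag, prompt, and number of seeds are placeholders:

```python
# Sketch: collect many generations per version with identical options so the
# comparison is statistical rather than anecdotal. Uses the standard Ollama
# /api/generate endpoint; the model tag and prompt are placeholders.
import requests

def sample(port: int, seed: int) -> str:
    body = {
        "model": "llama3:8b",
        "prompt": "Summarize: <fixed test document>",
        "stream": False,
        "options": {"temperature": 0.7, "seed": seed},  # same seed on both servers
    }
    r = requests.post(f"http://localhost:{port}/api/generate", json=body, timeout=600)
    r.raise_for_status()
    return r.json()["response"]

# Same seeds against both versions yield paired outputs to score or inspect.
pairs = [(sample(11434, s), sample(11435, s)) for s in range(50)]
```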
I just tested 0.1.34 on a Linux machine and a Windows 10 machine and, apparently, the quality of responses has returned to normal and instructions are being observed again. Thank you @MarkWard0110 for the quick release! Leaving this open so the other users who reported issues can comment on their experience.
I also noticed issues, but only when OLLAMA_NUM_PARALLEL is set to a number higher than 1; the LLM doesn't follow instructions properly.
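A minimal sketch of how one might reproduce this, assuming the server was started with OLLAMA_NUM_PARALLEL set above 1 (e.g. OLLAMA_NUM_PARALLEL=4); the model tag and prompt are placeholders:

```python
# Sketch: fire concurrent requests at a server started with, for example,
# OLLAMA_NUM_PARALLEL=4, and check whether instructions are still followed.
# The model tag and prompt are placeholders.
import concurrent.futures
import requests

def generate(_: int) -> str:
    body = {
        "model": "llama3:8b",
        "prompt": "Answer in one word: what is the capital of France?",
        "stream": False,
    }
    r = requests.post("http://localhost:11434/api/generate", json=body, timeout=600)
    r.raise_for_status()
    return r.json()["response"]

with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
    for out in pool.map(generate, range(8)):
        print(out.strip())  # inspect whether the one-word instruction held
```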
What is the issue?
I have an application that creates summaries of text and noticed that, after upgrading to v0.1.33, the quality of the generated content was MUCH worse, without any change in the code or model.
I rolled back to 0.1.32, and the responses immediately went back to normal.
This is running in Docker, in an Ubuntu 22.04 VM with 128 GB of RAM (no GPU), and the model I have been using is dolphin-mistral:v2.6.
Anyone else experienced something similar?
It "looks" like Ollama was using a completely different model (that machine has a few installed) or that it wasn't allocating enough resources for the model to perform (these are just impressions)
OS
Linux
GPU
Intel
CPU
Intel
Ollama version
0.1.33