Degraded response quality on v0.1.33 #4227
Comments
Can you provide examples of each version's prompts and generated responses? Something we can use to try to recreate what you observe. There has been a change in how Ollama compiles llama.cpp; we could investigate how different build configurations affect the model's performance.
@MarkWard0110, while I cannot share the actual documents and generated responses, I can provide some fictional examples, generated with Ollama 0.1.32 and 0.1.33 using the same code (and therefore the same prompts, system prompts, and inference options). All examples used:
System Prompt:
Prompt 1 (content taken from Wikipedia)
Bad Response (v0.1.33)
Notes: The quality of the summary is very poor, and the model did not obey the instruction to respond in Portuguese.
Good Response (v0.1.32)
Notes: The summary is very good, succinctly describing the facts provided in the text, in the requested language.
Prompt 2 (LLM-generated memo requesting resources for a fictional new research division)
Bad Response (v0.1.33)
Notes: The response is quite verbose, was not translated, and did not follow what was asked in the prompt.
Good Response (v0.1.32)
Notes: Again, the summary is very good, succinctly describing the facts provided in the text, in the requested language.
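For reference, here is a minimal sketch of the comparison setup, using the standard Ollama /api/generate endpoint. The real system prompt, documents, and option values are elided above, so the model tag, ports, prompts, and options below are placeholders:

```python
# Sketch: send the identical request to two Ollama servers (e.g. v0.1.32 and
# v0.1.33 listening on different ports) and compare the outputs side by side.
# The model tag, ports, prompts, and option values are placeholders.
import requests

REQUEST = {
    "model": "dolphin-mistral:v2.6",
    "system": "Summarize the provided text in Portuguese.",  # placeholder system prompt
    "prompt": "<document text here>",                        # placeholder document
    "stream": False,
    "options": {"temperature": 0, "seed": 42},               # pin sampling for comparability
}

for label, port in [("v0.1.32", 11434), ("v0.1.33", 11435)]:
    resp = requests.post(f"http://localhost:{port}/api/generate", json=REQUEST, timeout=600)
    resp.raise_for_status()
    print(f"--- {label} ---\n{resp.json()['response']}\n")
```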
Thank you for looking into this, and let me know if I can offer any further assistance.
Minor update: routing the same prompts/configs to Ollama v0.1.33 on a Windows 10 machine produced similar results.
Same on macOS: I updated and it is heavily hallucinating...
@DiegoGonzalezCruz, what model(s) are you having issues with, if you don't mind me asking?
I have also encountered similar behavior on Ubuntu 22.04 with an A100 GPU (through the Ollama Docker image). I wanted to upgrade from 0.1.32 to 0.1.33 to use the new features, and then noticed the same kind of regressions as the ones mentioned above on the following LLMs:
Update: I quickly retested the application with v0.1.34 on all three models, and this issue seems to be gone.
Same issue with model
Facing the same issue. The responses in v0.1.34 are an improvement over those in v0.1.33, yet the responses in v0.1.32 are much better. I'm using mixtral:8x7b and llama3:8b.
I just compared 0.1.32 and the latest (0.1.34) and I see no difference in quality. I didn't use 0.1.33, so I can't comment on that. But I wonder how much of this issue report is real and how much is imaginary/placebo. Remember that every generation/regeneration uses a different seed, and temperature affects every response randomly, so you can't just compare one or two outputs from each model. You need a lot of data to confirm whether a model has degraded. In fact, you should be using LLM evaluation benchmarks.
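For example, a minimal sketch of such a paired comparison, assuming two server versions listening on ports 11434 and 11435; the model tag, prompt, and number of seeds are placeholders:

```python
# Sketch: collect many generations per version with identical options so the
# comparison is statistical rather than anecdotal. Uses the standard Ollama
# /api/generate endpoint; the model tag and prompt are placeholders.
import requests

def sample(port: int, seed: int) -> str:
    body = {
        "model": "llama3:8b",
        "prompt": "Summarize: <fixed test document>",
        "stream": False,
        "options": {"temperature": 0.7, "seed": seed},  # same seed on both servers
    }
    r = requests.post(f"http://localhost:{port}/api/generate", json=body, timeout=600)
    r.raise_for_status()
    return r.json()["response"]

# Same seeds against both versions yield paired outputs to score or inspect.
pairs = [(sample(11434, s), sample(11435, s)) for s in range(50)]
```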
I just tested 0.1.34 on a Linux machine and a Windows 10 machine and, apparently, the quality of responses has returned to normal and instructions are being observed again. Thank you @MarkWard0110 for the quick release! Leaving this open so the other users who reported issues can comment on their experience.
I also noticed issues, but only when OLLAMA_NUM_PARALLEL is set to a number higher than 1; the LLM doesn't follow instructions properly.
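A minimal sketch of how one might reproduce this, assuming the server was started with OLLAMA_NUM_PARALLEL set above 1 (e.g. OLLAMA_NUM_PARALLEL=4); the model tag and prompt are placeholders:

```python
# Sketch: fire concurrent requests at a server started with, for example,
# OLLAMA_NUM_PARALLEL=4, and check whether instructions are still followed.
# The model tag and prompt are placeholders.
import concurrent.futures
import requests

def generate(_: int) -> str:
    body = {
        "model": "llama3:8b",
        "prompt": "Answer in one word: what is the capital of France?",
        "stream": False,
    }
    r = requests.post("http://localhost:11434/api/generate", json=body, timeout=600)
    r.raise_for_status()
    return r.json()["response"]

with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
    for out in pool.map(generate, range(8)):
        print(out.strip())  # inspect whether the one-word instruction held
```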
What is the issue?
I have an application that creates summaries of text and noticed that, after upgrading to v0.1.33, the quality of the generated content was MUCH worse, without any change in the code or model.
I rolled back to 0.1.32, and the responses immediately went back to normal.
This is running in Docker, in an Ubuntu 22.04 VM with 128 GB of RAM (no GPU), and the model I have been using is dolphin-mistral:v2.6.
Anyone else experienced something similar?
It "looks" like Ollama was using a completely different model (that machine has a few installed) or that it wasn't allocating enough resources for the model to perform (these are just impressions)
OS
Linux
GPU
Intel
CPU
Intel
Ollama version
0.1.33