Releases: ollama/ollama
v0.1.38
New Models
- Falcon 2: A new 11B-parameter causal decoder-only model built by TII and trained on 5T tokens.
- Yi 1.5: A new high-performing version of Yi, now licensed as Apache 2.0. Available in 6B, 9B and 34B sizes.
What's Changed
ollama ps
A new command is now available: `ollama ps`. This command displays currently loaded models, their memory footprint, and the processors used (GPU or CPU):

```
% ollama ps
NAME               ID              SIZE      PROCESSOR          UNTIL
mixtral:latest     7708c059a8bb    28 GB     47%/53% CPU/GPU    Forever
llama3:latest      a6990ed6be41    5.5 GB    100% GPU           4 minutes from now
all-minilm:latest  1b226e2802db    585 MB    100% GPU           4 minutes from now
```
/clear
To clear the chat history for a session when running `ollama run`, use `/clear`:

```
>>> /clear
Cleared session context
```
- Fixed issue where switching loaded models on Windows would take several seconds
- Running `/save` will no longer abort the chat session if an incorrect name is provided
- The `/api/tags` API endpoint will now correctly return an empty list `[]` instead of `null` if no models are found
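As a quick check (assuming a local server on the default port 11434), listing models when none are installed should now return the empty list:

```
# Ask the local Ollama server for its installed models
curl http://localhost:11434/api/tags
# With no models installed, the response is now {"models":[]} rather than {"models":null}
```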
New Contributors
- @fangtaosong made their first contribution in #4387
- @machimachida made their first contribution in #4424
Full Changelog: v0.1.37...v0.1.38
v0.1.37
What's Changed
- Fixed issue where models with uppercase characters in the name would not show with `ollama list`
- Fixed usage string for `ollama create`
- Fixed `finish_reason` being `""` instead of `null` in the OpenAI-compatible chat API (see the example below)
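A sketch of the corrected behavior (assuming the llama3 model is installed locally):

```
# Call the OpenAI-compatible chat endpoint on a local server
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3",
    "messages": [{"role": "user", "content": "Say hello"}]
  }'
# finish_reason in the completed response is a value such as "stop",
# and null (not "") while a streamed response is still in progress
```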
New Contributors
- @todashuta made their first contribution in #4362
Full Changelog: v0.1.36...v0.1.37
v0.1.36
What's Changed
- Fixed `exit status 0xc0000005` error with AMD graphics cards on Windows
- Fixed rare out-of-memory errors when loading a model to run with CPU
Full Changelog: v0.1.35...v0.1.36
v0.1.35
New models
- Llama 3 ChatQA: A model from NVIDIA based on Llama 3 that excels at conversational question answering (QA) and retrieval-augmented generation (RAG).
What's Changed
- Quantization: `ollama create` can now quantize models when importing them using the `--quantize` or `-q` flag:

```
ollama create -f Modelfile --quantize q4_0 mymodel
```

Note: `--quantize` works when importing `float16` or `float32` models:
  - From a binary GGUF file (e.g. `FROM ./model.gguf`)
  - From a library model (e.g. `FROM llama3:8b-instruct-fp16`)
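A minimal end-to-end sketch, assuming a float16 GGUF file on disk (the file path and model name here are hypothetical):

```
# Write a one-line Modelfile that imports a float16 GGUF file
echo 'FROM ./llama3-8b-f16.gguf' > Modelfile

# Import the weights and quantize them to 4-bit in one step
ollama create llama3-8b-q4 -f Modelfile --quantize q4_0
```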
- Fixed issue where inference subprocesses wouldn't be cleaned up on shutdown.
- Fixed a series of out-of-memory errors when loading models on multi-GPU systems
- Ctrl+J characters will now properly add newlines in `ollama run`
- Fixed issues when running `ollama show` for vision models
- `OPTIONS` requests to the Ollama API will no longer result in errors
- Fixed issue where partially downloaded files wouldn't be cleaned up
- Added a new `done_reason` field in responses describing why generation stopped (see the example after this list)
- Ollama will now more accurately estimate how much memory is available on multi-GPU systems, especially when running different models one after another
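A short illustration of `done_reason` (assuming the llama3 model is installed on a local server):

```
# Request a non-streaming completion and inspect the final fields
curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
# The JSON response now includes "done": true together with a
# "done_reason" value such as "stop" explaining why generation ended
```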
New Contributors
- @fmaclen made their first contribution in #3884
- @Renset made their first contribution in #3881
- @glumia made their first contribution in #3043
- @boessu made their first contribution in #4236
- @gaardhus made their first contribution in #2307
- @svilupp made their first contribution in #2192
- @WolfTheDeveloper made their first contribution in #4300
Full Changelog: v0.1.34...v0.1.35
v0.1.34
New models
- Llava Llama 3: A new high-performing LLaVA model fine-tuned from Llama 3 Instruct.
- Llava Phi 3: A new small LLaVA model fine-tuned from Phi 3.
- StarCoder2 15B Instruct: A new instruct fine-tune of the StarCoder2 model
- CodeGemma 1.1: A new release of the CodeGemma model.
- StableLM2 12B: A new 12B version of the StableLM 2 model from Stability AI
- Moondream 2: Moondream 2's runtime parameters have been improved for better responses
What's Changed
- Fixed issues with LLaVa models where they would respond incorrectly after the first request
- Fixed out of memory errors when running large models such as Llama 3 70B
- Fixed various issues with Nvidia GPU discovery on Linux and Windows
- Fixed a series of Modelfile errors when running
ollama create
- Fixed
no slots available
error that occurred when cancelling a request and then sending follow up requests - Improved AMD GPU detection on Fedora
- Improved reliability when using the experimental
OLLAMA_NUM_PARALLEL
andOLLAMA_MAX_LOADED
flags ollama serve
will now shut down quickly, even if a model is loading
New Contributors
- @drnic made their first contribution in #4116
- @bernardo-bruning made their first contribution in #4111
- @Drlordbasil made their first contribution in #4174
- @Saif-Shines made their first contribution in #4119
- @HydenLiu made their first contribution in #4194
- @jl-codes made their first contribution in #3621
- @Nurgo made their first contribution in #3473
- @adrienbrault made their first contribution in #3129
- @Darinochka made their first contribution in #3945
Full Changelog: v0.1.33...v0.1.34
v0.1.33
New models
- Llama 3: a new model by Meta, and the most capable openly available LLM to date
- Phi 3 Mini: a new 3.8B-parameter, lightweight, state-of-the-art open model by Microsoft.
- Moondream: a small vision language model designed to run efficiently on edge devices.
- Llama 3 Gradient 1048K: A Llama 3 fine-tune by Gradient to support up to a 1M token context window.
- Dolphin Llama 3: The uncensored Dolphin model, trained by Eric Hartford and based on Llama 3 with a variety of instruction, conversational, and coding skills.
- Qwen 110B: The first Qwen model over 100B parameters in size with outstanding performance in evaluations
What's Changed
- Fixed issues where the model would not terminate, causing the API to hang.
- Fixed a series of out of memory errors on Apple Silicon Macs
- Fixed out of memory errors when running Mixtral architecture models
Experimental concurrency features
New concurrency features are coming soon to Ollama. They are available as experimental settings:
- `OLLAMA_NUM_PARALLEL`: Handle multiple requests simultaneously for a single model
- `OLLAMA_MAX_LOADED_MODELS`: Load multiple models simultaneously

To enable these features, set the environment variables for `ollama serve`. For more info, see this guide:

```
OLLAMA_NUM_PARALLEL=4 OLLAMA_MAX_LOADED_MODELS=4 ollama serve
```
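A rough sketch of the effect (assuming the llama3 model is installed): with `OLLAMA_NUM_PARALLEL` greater than 1, two requests against the same model are served concurrently instead of queueing:

```
# Fire two prompts at the same model at once; with OLLAMA_NUM_PARALLEL > 1
# the second request no longer waits for the first to finish
curl http://localhost:11434/api/generate -d '{"model":"llama3","prompt":"Hi","stream":false}' &
curl http://localhost:11434/api/generate -d '{"model":"llama3","prompt":"Hello","stream":false}' &
wait
```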
New Contributors
- @hmartinez82 made their first contribution in #3972
- @Cephra made their first contribution in #4037
- @arpitjain099 made their first contribution in #4007
- @MarkWard0110 made their first contribution in #4031
- @alwqx made their first contribution in #4073
- @sidxt made their first contribution in #3705
- @ChengenH made their first contribution in #3789
- @secondtruth made their first contribution in #3503
- @reid41 made their first contribution in #3612
- @ericcurtin made their first contribution in #3626
- @JT2M0L3Y made their first contribution in #3633
- @datvodinh made their first contribution in #3655
- @MapleEve made their first contribution in #3817
- @swuecho made their first contribution in #3810
- @brycereitano made their first contribution in #3895
- @bsdnet made their first contribution in #3889
- @fyxtro made their first contribution in #3855
- @natalyjazzviolin made their first contribution in #3962
Full Changelog: v0.1.32...v0.1.33
v0.1.32
New models
- WizardLM 2: State-of-the-art large language model from Microsoft AI with improved performance on complex chat, multilingual, reasoning and agent use cases.
  - `wizardlm2:8x22b`: large 8x22B model based on Mixtral 8x22B
  - `wizardlm2:7b`: fast, high-performing model based on Mistral 7B
- Snowflake Arctic Embed: A suite of text embedding models by Snowflake, optimized for performance.
- Command R+: a powerful, scalable large language model purpose-built for RAG use cases
- DBRX: A large, open, general-purpose 132B LLM created by Databricks.
- Mixtral 8x22B: the new leading Mixture of Experts (MoE) base model by Mistral AI.
What's Changed
- Ollama will now better utilize available VRAM, leading to fewer out-of-memory errors and better GPU utilization
- When running larger models that don't fit into VRAM on macOS, Ollama will now split the model between GPU and CPU to maximize performance.
- Fixed several issues where Ollama would hang upon encountering an error
- Fixed issue where using quotes in `OLLAMA_ORIGINS` would cause an error
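For example, a quoted value now parses as expected when starting the server (the origin URL here is hypothetical):

```
# Allow browser requests from an additional origin; the quoted value
# no longer causes an error on startup
OLLAMA_ORIGINS="http://localhost:3000" ollama serve
```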
New Contributors
- @sugarforever made their first contribution in #3400
- @yaroslavyaroslav made their first contribution in #3378
- @Nagi-ovo made their first contribution in #3423
- @ParisNeo made their first contribution in #3436
- @philippgille made their first contribution in #3437
- @cesto93 made their first contribution in #3461
- @ThomasVitale made their first contribution in #3515
- @writinwaters made their first contribution in #3539
- @alexmavr made their first contribution in #3555
Full Changelog: v0.1.31...v0.1.32
v0.1.31
Ollama supports embedding models. Bring your existing documents or other data and combine them with text prompts to build RAG (retrieval-augmented generation) apps using the Ollama REST API or the Python and JavaScript libraries.
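A small sketch of the REST API (assuming an embedding model such as `nomic-embed-text` has been pulled):

```
# Generate an embedding vector for a snippet of text
curl http://localhost:11434/api/embeddings -d '{
  "model": "nomic-embed-text",
  "prompt": "Llamas are members of the camelid family"
}'
# Response shape: {"embedding": [0.567, -0.123, ...]}
```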
New models
- Qwen 1.5 32B: A new 32B multilingual model competitive with larger models such as Mixtral
- StarlingLM Beta: A high-ranking 7B model on popular benchmarks with a permissive Apache 2.0 license.
- DolphinCoder StarCoder 7B: A 7B uncensored variant of the Dolphin model family that excels at coding, based on StarCoder2 7B.
- StableLM 1.6 Chat: A new version of StableLM 1.6 tuned for instruction
What's new
- Fixed issue where Ollama would hang when using certain Unicode characters in the prompt, such as emojis
Full Changelog: v0.1.30...v0.1.31
v0.1.30
New models
- Command R: a Large Language Model optimized for conversational interaction and long context tasks.
- mxbai-embed-large: A new state-of-the-art large embedding model
What's Changed
- Fixed various issues with `ollama run` on Windows:
  - History now works when pressing up and down arrow keys
  - Right and left arrow keys will now move the cursor appropriately
  - Pasting multi-line strings will now work on Windows
- Fixed issue where mounting or sharing files between Linux and Windows (e.g. via WSL or Docker) would cause errors due to having `:` in the filename
- Improved support for AMD MI300 and MI300X Accelerators
- Improved cleanup of temporary files resulting in better space utilization
Important change
For filesystem compatibility, Ollama has changed model data filenames to use `-` instead of `:`. This change will be applied automatically. If downgrading to 0.1.29 or lower from 0.1.30 (on Linux or macOS only), run:

```
find ~/.ollama/models/blobs -type f -exec bash -c 'mv "$0" "${0//-/:}"' {} \;
```
New Contributors
- @alitrack made their first contribution in #3111
- @drazdra made their first contribution in #3338
- @rapidarchitect made their first contribution in #3288
- @yusufcanb made their first contribution in #3274
- @jikkuatwork made their first contribution in #3178
- @timothycarambat made their first contribution in #3145
- @fly2tomato made their first contribution in #2946
- @enoch1118 made their first contribution in #2927
- @danny-avila made their first contribution in #2918
- @mmo80 made their first contribution in #2881
- @anaisbetts made their first contribution in #2428
- @marco-souza made their first contribution in #1905
- @guchenhe made their first contribution in #1944
- @herval made their first contribution in #1873
- @Npahlfer made their first contribution in #1623
- @remy415 made their first contribution in #2279
Full Changelog: v0.1.29...v0.1.30
v0.1.29
AMD Preview
Ollama now supports AMD graphics cards in preview on Windows and Linux. All of Ollama's features can now be accelerated by AMD graphics cards, and support is included by default in Ollama for Linux, Windows, and Docker.
Supported cards and accelerators
| Family | Supported cards and accelerators |
| --- | --- |
| AMD Radeon RX | 7900 XTX, 7900 XT, 7900 GRE, 7800 XT, 7700 XT, 7600 XT, 7600, 6950 XT, 6900 XTX, 6900XT, 6800 XT, 6800, Vega 64, Vega 56 |
| AMD Radeon PRO | W7900, W7800, W7700, W7600, W7500, W6900X, W6800X Duo, W6800X, W6800, V620, V420, V340, V320, Vega II Duo, Vega II, VII, SSG |
| AMD Instinct | MI300X, MI300A, MI300, MI250X, MI250, MI210, MI200, MI100, MI60, MI50 |
What's Changed
- `ollama <command> -h` will now show documentation for supported environment variables
- Fixed issue where generating embeddings with `nomic-embed-text`, `all-minilm`, or other embedding models would hang on Linux
- Experimental support for importing Safetensors models using the `FROM <directory with safetensors model>` command in the Modelfile (see the sketch after this list)
- Fixed issues where Ollama would hang when using JSON mode
- Fixed issue where `ollama run` would error when piping output to `tee` and other tools
- Fixed an issue where memory would not be released when running vision models
- Ollama will no longer show an error message when piping to stdin on Windows
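A minimal sketch of the experimental Safetensors import (the directory path and model name here are hypothetical):

```
# Point a Modelfile at a directory containing safetensors weights
echo 'FROM ./mistral-7b-safetensors' > Modelfile

# Build an Ollama model from the imported weights
ollama create my-mistral -f Modelfile
```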
New Contributors
- @tgraupmann made their first contribution in #2582
- @andersrex made their first contribution in #2909
- @leonid20000 made their first contribution in #2440
- @hishope made their first contribution in #2973
- @mrdjohnson made their first contribution in #2759
- @mofanke made their first contribution in #3077
- @racerole made their first contribution in #3073
- @Chris-AS1 made their first contribution in #3094
Full Changelog: v0.1.28...v0.1.29