Releases · ollama/ollama
v0.1.11
New Models
- Orca 2: A fine-tuned version of Meta's Llama 2 model, designed to excel particularly in reasoning.
- DeepSeek Coder: A capable coding model trained from scratch. Available in 1.3B, 6.7B and 33B parameter counts.
- Alfred: A robust conversational model designed to be used for both chat and instruct use cases.
What's Changed
- Improved progress bar design
- Fixed issue where `ollama create` would error with `invalid cross-device link`
- Fixed issue where `ollama run` would exit with an error on macOS Big Sur and Monterey
- `q5_0` and `q5_1` models will now use GPU
- Fixed several `max retries exceeded` errors when running `ollama pull` or `ollama push`
- Fixed issue where `ollama create` would result in a "file not found" error when `FROM` referred to a local file
- Fixed issue where resizing the terminal while running `ollama pull` would cause repeated progress bar messages
- Minor performance improvements on Intel Macs
- Improved error messages on Linux when using Nvidia GPUs
Full Changelog: v0.1.10...v0.1.11
v0.1.10
New models
- OpenChat: An open-source chat model trained on a wide variety of data, surpassing ChatGPT on various benchmarks.
- Neural-chat: New chat model by Intel
- Goliath: A large chat model created by combining two fine-tuned versions of Llama 2 70B
What's Changed
- JSON mode can now be used with `ollama run` (see the example after this list):
  - Pass the `--format json` flag, or
  - Use `/set format json` to change the current chat session to use JSON mode
- Prompts can now be passed in via standard input to `ollama run`. For example: `head -30 README.md | ollama run codellama "how do I install Ollama on Linux?"`
- `ollama create` now works with `OLLAMA_HOST` to build models using Ollama running on a remote machine (see the example after this list)
- Fixed crashes on Intel Macs
- Fixed issue where `ollama pull` progress would reverse when re-trying a failed connection
- Fixed issue where `ollama show --modelfile` would show an incorrect `FROM` command
- Fixed issue where word wrap wouldn't work when piping data into `ollama run` via standard input
- Fixed permission denied issues when running `ollama create` on Linux
- Added FAQ entry for proxy support on Linux
- Fixed installer error on Debian 12
- Fixed issue where `ollama push` would result in a 405 error
- `ollama push` will now return a better error when trying to push to a namespace the current user does not have access to
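For example, JSON mode can be turned on either for a single run or inside an interactive session. A minimal sketch, where `llama2` and the prompts are placeholder choices:

```shell
# One-off generation in JSON mode (model name and prompt are examples)
ollama run llama2 --format json "List three primary colors as a JSON array"

# Inside an interactive session, switch the current chat to JSON mode
ollama run llama2
>>> /set format json
```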
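And building a model on a remote Ollama instance only requires pointing `OLLAMA_HOST` at it; a sketch, assuming the usual `-f` Modelfile flag and a placeholder host and model name:

```shell
# Build "mymodel" from a local Modelfile against a remote Ollama server
# (remote.example.com:11434 and "mymodel" are placeholders)
OLLAMA_HOST=remote.example.com:11434 ollama create mymodel -f ./Modelfile
```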
New Contributors
- @dhiltgen made their first contribution in #1075
- @dansreis made their first contribution in #1055
- @breitburg made their first contribution in #1106
- @enricoros made their first contribution in #1078
- @huynle made their first contribution in #1115
- @bnodnarb made their first contribution in #1098
- @danemadsen made their first contribution in #1120
- @pieroit made their first contribution in #1124
- @yanndegat made their first contribution in #1151
Full Changelog: v0.1.9...v0.1.10
v0.1.9
New models
- Yi: a high-performing, bilingual model supporting both English and Chinese.
What's Changed
- JSON mode: instruct models to always return valid JSON when calling `/api/generate` by setting the `format` parameter to `json` (example below)
- Raw mode: bypass any templating done by Ollama by passing `{"raw": true}` to `/api/generate` (example below)
- Better error descriptions when downloading and uploading models with `ollama pull` and `ollama push`
- Fixed issue where the Linux installer would encounter an error when running as the `root` user
- Improved progress bar design when running `ollama pull` and `ollama push`
- Fixed issue where running on a machine with less than 2GB of VRAM would be slow
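For example, JSON mode is requested per call by adding the `format` field to the request body; a minimal sketch, where the model and prompt are placeholders:

```shell
curl -X POST http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "List three fruits as a JSON array",
  "format": "json"
}'
```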
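Raw mode works the same way: pass `"raw": true` and Ollama sends the prompt through without templating, which is useful when the full prompt is constructed client-side. A sketch, where the Mistral-style `[INST]` wrapping is only an illustration:

```shell
curl -X POST http://localhost:11434/api/generate -d '{
  "model": "mistral",
  "prompt": "[INST] Why is the sky blue? [/INST]",
  "raw": true
}'
```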
New Contributors
- @pepperoni21 made their first contribution in #995
- @lgrammel made their first contribution in #1020
- @ej52 made their first contribution in #999
- @David-Kunz made their first contribution in #996
- @tjbck made their first contribution in #943
- @omagdy7 made their first contribution in #1029
- @upchui made their first contribution in #1034
- @kevinhermawan made their first contribution in #1043
- @amithkoujalgi made their first contribution in #1044
- @mpldr made their first contribution in #1042
- @aashish2057 made their first contribution in #992
- @nickanderson made their first contribution in #1062
Full Changelog: v0.1.8...v0.1.9
v0.1.8
New Models
- CodeBooga: A high-performing code instruct model created by merging two existing code models.
- Dolphin 2.2 Mistral: An instruct-tuned model based on Mistral. Version 2.2 is fine-tuned for improved conversation and empathy.
- MistralLite: a fine-tuned model based on Mistral with enhanced capabilities for processing long contexts.
- Yarn Mistral: an extension of Mistral to support a context window of up to 128K tokens
- Yarn Llama 2: an extension of Llama 2 to support a context window of up to 128K tokens
What's Changed
- Ollama will now honour large context sizes on models such as `codellama` and `mistrallite` (example below)
- Fixed issue where repeated characters would be output on long contexts
- `ollama push` is now much faster. 7B models will push at up to ~100MB/s and large models (70B+) at up to 1GB/s if network speeds permit
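As a rough sketch of using the larger context sizes, the window can be raised per request through the `num_ctx` option; the model, prompt, and value below are illustrative, and what actually fits depends on the model and available memory:

```shell
curl -X POST http://localhost:11434/api/generate -d '{
  "model": "mistrallite",
  "prompt": "Summarize the following document: ...",
  "options": {"num_ctx": 16384}
}'
```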
New Contributors
- @dloss made their first contribution in #948
- @noahgitsham made their first contribution in #983
Full Changelog: v0.1.7...v0.1.8
v0.1.7
What's Changed
- Fixed an issue when running `ollama run` where certain key combinations such as Ctrl+Space would lead to an unresponsive prompt
- Fixed issue in `ollama run` where retrieving the previous prompt from history would require two presses of the up arrow key instead of one
- Exiting `ollama run` with Ctrl+D will now put the cursor on the next line
Full Changelog: v0.1.6...v0.1.7
v0.1.6
New models
- Dolphin 2.1 Mistral: an instruct-tuned model based on Mistral and trained on a dataset filtered to remove alignment and bias.
- Zephyr Beta: the second model in the series, based on Mistral, with strong performance that matches and even exceeds Llama 2 70B in several categories. It's trained on a distilled dataset, improving grammar and yielding even better chat results.
What's Changed
- Pasting multi-line strings in `ollama run` is now possible
- Fixed various issues when writing prompts in `ollama run`
- The library models have been refreshed and revamped, including `llama2`, `codellama`, and more:
  - All `chat` or `instruct` models now support setting the `system` parameter, or the `SYSTEM` command in the `Modelfile` (example below)
  - Parameters (`num_ctx`, etc.) have been updated for library models
  - Slight performance improvements for all models
- Model storage can now be configured with `OLLAMA_MODELS` (example below). See the FAQ for more info on how to configure this.
- `OLLAMA_HOST` will now default to port `443` when `https://` is specified, and port `80` when `http://` is specified
- Fixed trailing slashes causing an error when using `OLLAMA_HOST`
- Fixed issue where `ollama pull` would retry multiple times when out of space
- Fixed various `out of memory` issues when using Nvidia GPUs
- Fixed a performance issue previously introduced on AMD CPUs
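For example, a system prompt can now be baked into a model via the `SYSTEM` command in a `Modelfile`; a minimal sketch, where the base model, new model name, and prompt are placeholders:

```shell
# Write a Modelfile that layers a system prompt on top of llama2
cat > Modelfile <<'EOF'
FROM llama2
SYSTEM You are a concise assistant that answers in one sentence.
EOF

# Build and run the customized model
ollama create concise-llama -f ./Modelfile
ollama run concise-llama "What is Ollama?"
```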
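Likewise, the storage location and server address are plain environment variables; a sketch, assuming a placeholder directory and host:

```shell
# Store models under a custom directory (path is a placeholder)
OLLAMA_MODELS=/data/ollama/models ollama serve

# Point the CLI at a remote server; https:// now implies port 443
OLLAMA_HOST=https://ollama.example.com ollama list
```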
New Contributors
Full Changelog: v0.1.5...v0.1.6
v0.1.5
What's Changed
- Fixed an issue where an error would occur when running `falcon` or `starcoder` models
Full Changelog: v0.1.4...v0.1.5
v0.1.4
New models
- OpenHermes 2 Mistral: a new fine-tuned model based on Mistral, trained on open datasets totalling over 900,000 instructions. This model has strong multi-turn chat skills, surpassing previous Hermes 13B models and even matching 70B models on some benchmarks.
What's Changed
- Faster model switching: models will now stay loaded between requests when using different parameters (e.g. `temperature`) or system prompts
- `starcoder`, `sqlcoder` and `falcon` models now have unicode support. Note: they will need to be re-pulled (e.g. `ollama pull starcoder`)
- New documentation guide on importing existing models to Ollama (GGUF, PyTorch, etc.)
- `ollama serve` will now print the current version of Ollama on start
- `ollama run` will now show more descriptive errors when encountering runtime issues (such as insufficient memory)
- Fixed an issue where Ollama on Linux would use the CPU alone instead of both the CPU and GPU on GPUs with less memory
- Fixed architecture check in the Linux install script
- Fixed issue where leading whitespace would be returned in responses
- Fixed issue where `ollama show` would show an empty `SYSTEM` prompt (instead of omitting it)
- Fixed issue where the `/api/tags` endpoint would return `null` instead of `[]` if no models were found (example below)
- Fixed an issue where `ollama show` wouldn't work when connecting remotely using `OLLAMA_HOST`
- Fixed issue where GPU/Metal would be used on macOS even with `num_gpu` set to `0` (example below)
- Fixed issue where certain characters would be escaped in responses
- Fixed `ollama serve` logs to report the proper amount of GPU memory (VRAM) being used
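For reference, the `/api/tags` behaviour can be checked with a plain GET, which now reports an empty list rather than `null` when no models are installed:

```shell
# List models known to the local Ollama server
curl http://localhost:11434/api/tags
```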
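And GPU/Metal offload can be disabled for a single request by setting `num_gpu` to `0` in the request options; a minimal sketch with a placeholder model and prompt:

```shell
curl -X POST http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?",
  "options": {"num_gpu": 0}
}'
```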
Note: the `EMBED` keyword in the `Modelfile` is being revisited until a future version of Ollama. Join the discussion on how we can make it better.
New Contributors
- @vieux made their first contribution in #810
- @s-kostyaev made their first contribution in #801
- @ggozad made their first contribution in #794
- @awaescher made their first contribution in #811
- @deichbewohner made their first contribution in #799
Full Changelog: v0.1.3...v0.1.4
v0.1.3
What's Changed
- Improved various API error messages to be easier to read
- Improved GPU allocation for older GPUs to fix "out of memory" errors
- Fixed issue where setting `num_gpu` to `0` would result in an error
- Ollama for macOS will now always update to the latest version, even if earlier updates had also been downloaded beforehand
Full Changelog: v0.1.2...v0.1.3
v0.1.2
New Models
- Zephyr: a fine-tuned 7B version of Mistral that was trained on a mix of publicly available, synthetic datasets and performs as well as Llama 2 70B in many benchmarks
- Mistral OpenOrca: a 7 billion parameter model fine-tuned on top of the Mistral 7B model using the OpenOrca dataset
Examples
Ollama's examples have been updated with some new examples:
- Ask the mentors: a TypeScript, multi-user conversation app
- TypeScript LangChain: a simple example of using Ollama with LangChainJS and TypeScript.
What's Changed
- Download speeds for `ollama pull` have been significantly improved, from 60MB/s to over 1.5GB/s (25x faster) on fast network connections
- The API now supports non-streaming responses. Set the `stream` parameter to `false` and endpoints will return data in one single response: `curl -X POST http://localhost:11434/api/generate -d '{ "model": "llama2", "prompt": "Why is the sky blue?", "stream": false }'`
- Ollama can now be used with HTTP proxies (using `HTTP_PROXY=http://<proxy>`) and HTTPS proxies (using `HTTPS_PROXY=https://<proxy>`) (example below)
- Fixed `token too long` error when generating a response
- `q8_0`, `q5_0`, `q5_1`, and `f32` models will now use GPU on Linux
- Revised help text in `ollama run` to be easier to read
- Renamed the runner subprocess to `ollama-runner`
- `ollama create` will now show feedback when reading model metadata
- Fixed `not found error` showing when running `ollama pull`
- Improved video memory allocation on Linux to fix errors when using Nvidia GPUs
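For example, proxy support uses the standard environment variables; a sketch, assuming a placeholder proxy address and that the variable is set for the server process, which performs the downloads:

```shell
# Start the server behind an HTTPS proxy (address is a placeholder);
# model downloads via `ollama pull` will then go through the proxy
HTTPS_PROXY=https://proxy.example.com:3128 ollama serve
```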
New Contributors
Full Changelog: v0.1.1...v0.1.2