
llamafile v0.8.1

@jart released this 26 Apr 20:33 · 45 commits to main since this release · 2095d50
  • Support for Phi-3 Mini 4k has been introduced
  • A bug causing GPU module crashes on some systems has been resolved
  • Support for Command-R Plus has now been vetted with proper 64-bit indexing
  • We now support more AMD GPU architectures thanks to better detection of offload archs (#368)
  • We now ship prebuilt NVIDIA and ROCm modules for both Windows and Linux users. They link tinyBLAS, a libre math library that depends only on the graphics driver being installed. Since tinyBLAS is slower, llamafile will automatically build a native module for your system if the CUDA or ROCm SDK is installed. You can control this behavior using --nocompile or --recompile (see the sketch after this list). Even so, our LLaVA llamafile still manages to squeak under the Windows 4GB file size limit!
  • An assertion error that occurred when using llamafile-quantize to create K quants from an F32 GGUF file has been fixed (a usage sketch follows this list)
  • A new llamafile-tokenize command-line tool has been introduced. For example, if you want to count how many tokens are in a text file, you can say cat file.txt | llamafile-tokenize -m model.llamafile | wc -l, since it prints each token on its own line. A fuller example appears below.
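
Here is a minimal sketch of the GPU module controls mentioned above. The --nocompile and --recompile flags come from this release; the model filename and the -ngl value are placeholders:

    # Use the prebuilt tinyBLAS module and never compile a native one,
    # even if the CUDA or ROCm SDK is installed.
    ./llava-v1.5-7b-q4.llamafile --nocompile -ngl 999

    # Force llamafile to rebuild the native GPU module against the
    # locally installed SDK.
    ./llava-v1.5-7b-q4.llamafile --recompile -ngl 999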
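
And a hedged sketch of the quantize fix, assuming llamafile-quantize follows the usual llama.cpp quantize conventions (input GGUF, output GGUF, quant type); the filenames are placeholders:

    # Create a K quant from an F32 GGUF file; this path previously
    # tripped an assertion error.
    llamafile-quantize model-f32.gguf model-Q4_K_M.gguf Q4_K_M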
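
Finally, the token-counting one-liner from the last bullet, spelled out. The command itself is taken verbatim from this release; file.txt and model.llamafile are placeholders:

    # llamafile-tokenize prints one token per line, so wc -l yields
    # the total token count of file.txt.
    cat file.txt | llamafile-tokenize -m model.llamafile | wc -l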