A Python script designed to streamline the process of quantizing models to exllamav2 format
Faster Whisper transcription with CTranslate2
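For illustration, a minimal sketch of how the faster-whisper package (built on CTranslate2) is typically used; the model size, device, and audio path below are placeholder assumptions:

```python
from faster_whisper import WhisperModel

# Load a Whisper model through CTranslate2 with 8-bit weight quantization.
# "base" and "audio.wav" are placeholder values for this sketch.
model = WhisperModel("base", device="cpu", compute_type="int8")

segments, info = model.transcribe("audio.wav")
print(f"Detected language: {info.language}")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```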
Your go-to tool for easily creating quantized versions of Hugging Face models in the GGUF format.
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
Unify Efficient Fine-Tuning of 100+ LLMs
Implementation of MedQ: Lossless ultra-low-bit neural network quantization for medical image segmentation
🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy to use hardware optimization tools
🤗 Optimum Intel: Accelerate inference with Intel optimization tools
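A hedged sketch of typical Optimum Intel usage with the OpenVINO backend follows; the model ID below is only an example checkpoint, and the optimum-intel and transformers packages are assumed to be installed:

```python
from optimum.intel import OVModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

# Example model ID; any compatible Hugging Face checkpoint could be used here.
model_id = "distilbert-base-uncased-finetuned-sst-2-english"

# export=True converts the PyTorch checkpoint to OpenVINO IR at load time.
model = OVModelForSequenceClassification.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(classifier("Quantized inference can be noticeably faster on CPU."))
```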
Self-Created Tools to convert ONNX files (NCHW) to TensorFlow/TFLite/Keras format (NHWC). The purpose of this tool is to solve the massive Transpose extrapolation problem in onnx-tensorflow (onnx-tf). I don't need a Star, but give me a pull request.
SOTA weight-only quantization algorithm for LLMs. This is the official implementation of "Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs".
Fast inference engine for Transformer models
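A minimal usage sketch for CTranslate2, assuming a translation model has already been converted to the CTranslate2 format (for example with ct2-transformers-converter); the model directory and tokens below are placeholders:

```python
import ctranslate2

# Placeholder path to a previously converted model directory;
# compute_type="int8" enables 8-bit quantized inference.
translator = ctranslate2.Translator("ende_ctranslate2/", device="cpu", compute_type="int8")

# translate_batch expects pre-tokenized input: one list of tokens per sentence.
results = translator.translate_batch([["▁Hello", "▁world", "!"]])
print(results[0].hypotheses[0])
```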
Train, Evaluate, Optimize, Deploy Computer Vision Models via OpenVINO™
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
A Python API that facilitates training, creating, and transferring attacks with quantized DNNs
A Python package that extends official PyTorch to easily obtain additional performance on Intel platforms
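A minimal sketch of the usual optimization entry point in Intel Extension for PyTorch; the toy model below is hypothetical and exists only to illustrate the call:

```python
import torch
import intel_extension_for_pytorch as ipex

# Hypothetical toy model used only to illustrate the optimization call.
model = torch.nn.Sequential(
    torch.nn.Linear(128, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 10),
).eval()

# ipex.optimize applies Intel-specific graph and operator optimizations;
# a lower-precision dtype (e.g. torch.bfloat16) can also be requested.
optimized = ipex.optimize(model)

with torch.no_grad():
    out = optimized(torch.randn(1, 128))
print(out.shape)
```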
AIMET GitHub pages documentation
Brevitas: neural network quantization in PyTorch
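A hedged example of Brevitas quantization-aware layers; the layer sizes and bit widths below are arbitrary choices for illustration:

```python
import torch
from brevitas.nn import QuantLinear, QuantReLU

# Tiny MLP with 4-bit weights and 4-bit activations (sizes are arbitrary).
model = torch.nn.Sequential(
    QuantLinear(784, 256, bias=True, weight_bit_width=4),
    QuantReLU(bit_width=4),
    QuantLinear(256, 10, bias=True, weight_bit_width=4),
)

x = torch.randn(1, 784)
print(model(x).shape)  # expected: torch.Size([1, 10])
```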
Dataflow compiler for QNN inference on FPGAs