A Python script designed to streamline the process of quantizing models to exllamav2 format
Faster Whisper transcription with CTranslate2
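For illustration, a minimal sketch of how the faster-whisper package (built on CTranslate2) is typically used; the model size, device, and audio path below are placeholder assumptions:

```python
from faster_whisper import WhisperModel

# Load a Whisper model through CTranslate2 with 8-bit weight quantization.
# "base" and "audio.wav" are placeholder values for this sketch.
model = WhisperModel("base", device="cpu", compute_type="int8")

segments, info = model.transcribe("audio.wav")
print(f"Detected language: {info.language}")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```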
Your go-to tool for easily creating quantized versions of Hugging Face models in the GGUF format.
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
Unify Efficient Fine-Tuning of 100+ LLMs
Implementation of MedQ: Lossless ultra-low-bit neural network quantization for medical image segmentation
🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy to use hardware optimization tools
🤗 Optimum Intel: Accelerate inference with Intel optimization tools
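A hedged sketch of typical Optimum Intel usage with the OpenVINO backend follows; the model ID below is only an example checkpoint, and the optimum-intel and transformers packages are assumed to be installed:

```python
from optimum.intel import OVModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

# Example model ID; any compatible Hugging Face checkpoint could be used here.
model_id = "distilbert-base-uncased-finetuned-sst-2-english"

# export=True converts the PyTorch checkpoint to OpenVINO IR at load time.
model = OVModelForSequenceClassification.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(classifier("Quantized inference can be noticeably faster on CPU."))
```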
Self-Created Tools to convert ONNX files (NCHW) to TensorFlow/TFLite/Keras format (NHWC). The purpose of this tool is to solve the massive Transpose extrapolation problem in onnx-tensorflow (onnx-tf). I don't need a Star, but give me a pull request.
SOTA weight-only quantization algorithm for LLMs. This is the official implementation of "Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs".
Fast inference engine for Transformer models
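A minimal usage sketch for CTranslate2, assuming a translation model has already been converted to the CTranslate2 format (for example with ct2-transformers-converter); the model directory and tokens below are placeholders:

```python
import ctranslate2

# Placeholder path to a previously converted model directory;
# compute_type="int8" enables 8-bit quantized inference.
translator = ctranslate2.Translator("ende_ctranslate2/", device="cpu", compute_type="int8")

# translate_batch expects pre-tokenized input: one list of tokens per sentence.
results = translator.translate_batch([["▁Hello", "▁world", "!"]])
print(results[0].hypotheses[0])
```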
Train, Evaluate, Optimize, Deploy Computer Vision Models via OpenVINO™
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
A Python API that facilitates training, creating, and transferring attacks with quantized DNNs
A Python package that extends official PyTorch to easily obtain additional performance on Intel platforms
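A minimal sketch of the usual optimization entry point in Intel Extension for PyTorch; the toy model below is hypothetical and exists only to illustrate the call:

```python
import torch
import intel_extension_for_pytorch as ipex

# Hypothetical toy model used only to illustrate the optimization call.
model = torch.nn.Sequential(
    torch.nn.Linear(128, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 10),
).eval()

# ipex.optimize applies Intel-specific graph and operator optimizations;
# a lower-precision dtype (e.g. torch.bfloat16) can also be requested.
optimized = ipex.optimize(model)

with torch.no_grad():
    out = optimized(torch.randn(1, 128))
print(out.shape)
```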
AIMET GitHub pages documentation
Brevitas: neural network quantization in PyTorch
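A hedged example of Brevitas quantization-aware layers; the layer sizes and bit widths below are arbitrary choices for illustration:

```python
import torch
from brevitas.nn import QuantLinear, QuantReLU

# Tiny MLP with 4-bit weights and 4-bit activations (sizes are arbitrary).
model = torch.nn.Sequential(
    QuantLinear(784, 256, bias=True, weight_bit_width=4),
    QuantReLU(bit_width=4),
    QuantLinear(256, 10, bias=True, weight_bit_width=4),
)

x = torch.randn(1, 784)
print(model(x).shape)  # expected: torch.Size([1, 10])
```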
Dataflow compiler for QNN inference on FPGAs