GPU Info

| Model | VRAM (GB) | CUDA Cores |
|---|---|---|
| GeForce RTX 3050 Mobile/Laptop | 4 | 2048 |
| GeForce RTX 3050 | 8 | 2304 |
| GeForce RTX 4050 Mobile/Laptop | 6 | 2560 |
| GeForce RTX 3050 Ti Mobile/Laptop | 4 | 2560 |
| GeForce RTX 4060 | 8 | 3072 |
| GeForce RTX 3060 | 12 | 3584 |
| GeForce RTX 3060 Mobile/Laptop | 6 | 3840 |
| GeForce RTX 4060 Ti | 16 | 4352 |
| GeForce RTX 4070 Mobile/Laptop | 8 | 4608 |
| GeForce RTX 3060 Ti | 8 | 4864 |
| GeForce RTX 3070 Mobile/Laptop | 8 | 5120 |
| GeForce RTX 3070 | 8 | 5888 |
| GeForce RTX 4070 | 12 | 5888 |
| GeForce RTX 3070 Ti | 8 | 6144 |
| GeForce RTX 3070 Ti Mobile/Laptop | 8-16 | 6144 |
| GeForce RTX 4070 Super | 12 | 7168 |
| GeForce RTX 4080 Mobile/Laptop | 12 | 7424 |
| GeForce RTX 3080 Ti Mobile/Laptop | 16 | 7424 |
| GeForce RTX 4070 Ti | 12 | 7680 |
| GeForce RTX 4080 | 12 | 7680 |
| GeForce RTX 4070 Ti Super | 16 | 8448 |
| GeForce RTX 3080 | 10 | 8704 |
| GeForce RTX 3080 Ti | 12 | 8960 |
| GeForce RTX 4080 | 16 | 9728 |
| GeForce RTX 4090 Mobile/Laptop | 16 | 9728 |
| GeForce RTX 4080 Super | 16 | 10240 |
| GeForce RTX 3090 | 24 | 10496 |
| GeForce RTX 3090 Ti | 24 | 10752 |
| GeForce RTX 4090 D | 24 | 14592 |
| GeForce RTX 4090 | 24 | 16384 |

CUDA Compute Capability

| Compute Capability | Architecture | GeForce Models |
|---|---|---|
| 6.1 | Pascal | NVIDIA TITAN Xp, TITAN X, GeForce GTX 1080 Ti, GTX 1080, GTX 1070 Ti, GTX 1070, GTX 1060, GTX 1050 Ti, GTX 1050, GT 1030, GT 1010, MX350, MX330, MX250, MX230, MX150, MX130, MX110 |
| 7.0 | Volta | NVIDIA TITAN V |
| 7.5 | Turing | NVIDIA TITAN RTX, GeForce RTX 2080 Ti, RTX 2080 Super, RTX 2080, RTX 2070 Super, RTX 2070, RTX 2060 Super, RTX 2060 12GB, RTX 2060, GeForce GTX 1660 Ti, GTX 1660 Super, GTX 1660, GTX 1650 Super, GTX 1650, MX550, MX450 |
| 8.6 | Ampere | GeForce RTX 3090 Ti, RTX 3090, RTX 3080 Ti, RTX 3080 12GB, RTX 3080, RTX 3070 Ti, RTX 3070, RTX 3060 Ti, RTX 3060, RTX 3050, RTX 3050 Ti (mobile), RTX 3050 (mobile), RTX 2050 (mobile), MX570 |
| 8.9 | Ada Lovelace | GeForce RTX 4090, RTX 4080 Super, RTX 4080, RTX 4070 Ti Super, RTX 4070 Ti, RTX 4070 Super, RTX 4070, RTX 4060 Ti, RTX 4060 |

CTranslate2 Quantization Compatibility

The tables below map the compute type requested at model load (column) to the compute type CTranslate2 actually uses on each platform.

  • NOTE: Only Ampere and later Nvidia GPUs support CTranslate2's flash_attention parameter.

CPU

| Architecture | int8_float32 | int8_float16 | int8_bfloat16 | int16 | float16 | bfloat16 |
|---|---|---|---|---|---|---|
| x86-64 (Intel) | int8_float32 | int8_float32 | int8_float32 | int16 | float32 | float32 |
| x86-64 (other) | int8_float32 | int8_float32 | int8_float32 | int8_float32 | float32 | float32 |
| AArch64/ARM64 (Apple) | int8_float32 | int8_float32 | int8_float32 | int8_float32 | float32 | float32 |
| AArch64/ARM64 (other) | int8_float32 | int8_float32 | int8_float32 | int8_float32 | float32 | float32 |

Nvidia GPU

| Compute Capability | int8_float32 | int8_float16 | int8_bfloat16 | int16 | float16 | bfloat16 |
|---|---|---|---|---|---|---|
| >= 8.0 | int8_float32 | int8_float16 | int8_bfloat16 | float16 | float16 | bfloat16 |
| >= 7.0, < 8.0 | int8_float32 | int8_float16 | int8_float32 | float16 | float16 | float32 |
| 6.2 | float32 | float32 | float32 | float32 | float32 | float32 |
| 6.1 | int8_float32 | int8_float32 | int8_float32 | float32 | float32 | float32 |
| <= 6.0 | float32 | float32 | float32 | float32 | float32 | float32 |
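The Nvidia GPU table above can be captured as a small lookup, which is handy for picking a compute type before loading a model. This is a sketch of the table only — the function names and structure are illustrative and not part of CTranslate2's API:

```python
# Effective compute type used on Nvidia GPUs per the table above,
# keyed by compute-capability band. Names here are illustrative.
_TYPES = ["int8_float32", "int8_float16", "int8_bfloat16",
          "int16", "float16", "bfloat16"]

GPU_EFFECTIVE_TYPE = {
    ">=8.0": {"int8_float32": "int8_float32", "int8_float16": "int8_float16",
              "int8_bfloat16": "int8_bfloat16", "int16": "float16",
              "float16": "float16", "bfloat16": "bfloat16"},
    "7.x":   {"int8_float32": "int8_float32", "int8_float16": "int8_float16",
              "int8_bfloat16": "int8_float32", "int16": "float16",
              "float16": "float16", "bfloat16": "float32"},
    "6.2":   dict.fromkeys(_TYPES, "float32"),
    "6.1":   {"int8_float32": "int8_float32", "int8_float16": "int8_float32",
              "int8_bfloat16": "int8_float32", "int16": "float32",
              "float16": "float32", "bfloat16": "float32"},
    "<=6.0": dict.fromkeys(_TYPES, "float32"),
}

def band(major: int, minor: int) -> str:
    """Map a compute capability (e.g. 8, 6 for an RTX 3090) to a table row."""
    if major >= 8:
        return ">=8.0"
    if major == 7:
        return "7.x"
    if (major, minor) == (6, 2):
        return "6.2"
    if (major, minor) == (6, 1):
        return "6.1"
    return "<=6.0"

def effective_type(major: int, minor: int, requested: str) -> str:
    """Compute type actually used when `requested` is asked for."""
    return GPU_EFFECTIVE_TYPE[band(major, minor)][requested]
```

For example, `effective_type(8, 6, "int8_float16")` returns `"int8_float16"` (Ampere supports it natively), while `effective_type(6, 1, "float16")` falls back to `"float32"`.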

Chat Model Benchmarks

  • Tested with CTranslate2 running in int8 on an RTX 4090.

| Model | Tokens per Second | VRAM Usage (GB) |
|---|---|---|
| gemma-1.1-2b-it | 63.69 | 3.0 |
| Phi-3-mini-4k-instruct | 36.46 | 4.5 |
| dolphin-llama2-7b | 37.43 | 7.5 |
| Orca-2-7b | 30.47 | 7.5 |
| Llama-2-7b-chat-hf | 37.78 | 7.6 |
| neural-chat-7b-v3-3 | 28.38 | 8.1 |
| Meta-Llama-3-8B-Instruct | 30.12 | 8.8 |
| dolphin-2.9-llama3-8b | 34.16 | 8.8 |
| Mistral-7B-Instruct-v0.3 | 32.24 | 7.9 |
| SOLAR-10.7B-Instruct-v1.0 | 23.32 | 11.7 |
| Llama-2-13b-chat-hf | 25.12 | 14.0 |
| Orca-2-13b | 20.01 | 14.1 |

Concurrency

| Library/Tool | Type | Best Use Case | Pros | Cons |
|---|---|---|---|---|
| Python threading | Threading | I/O-bound tasks | Simple API, good for I/O-bound tasks | GIL limits effectiveness for CPU-bound tasks |
| Python multiprocessing | Multiprocessing | CPU-bound tasks | True parallelism, bypasses GIL | Higher memory overhead, complex IPC |
| Python subprocess | Process control | Running external commands | Simple process control, captures I/O | Limited to external process management |
| concurrent.futures | High-level API (threading & multiprocessing) | Unified task management | Simplifies task execution; one interface for threads and processes | Less flexibility, higher abstraction |
| asyncio | Async/coroutines | High-concurrency I/O-bound tasks | Non-blocking I/O, single-threaded concurrency | Steeper learning curve (coroutines and the event loop) |
| QThread | Threading (Qt) | Threads integrated with the Qt event loop, signal-slot communication | Seamless Qt integration, easy inter-thread communication | More boilerplate, requires subclassing |
| QRunnable/QThreadPool | Threading (Qt) | Many short-lived tasks within Qt applications | Efficient task management, less boilerplate | Requires understanding Qt's threading architecture |
| QtConcurrent | Threading (Qt) | High-level parallel tasks in Qt | High-level functions for parallel execution, automatic thread pooling | Less control over individual threads |
| QProcess | Process control (Qt) | Running external commands in Qt applications | Integrates with Qt, handles process I/O | Limited to process control |
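The trade-offs in the table are easiest to see with concurrent.futures, whose unified Executor interface lets you swap threads for processes without changing the task-submission code. A minimal sketch, where `fetch` is a made-up stand-in for real I/O-bound work:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(name: str) -> str:
    """Stand-in for an I/O-bound task (e.g. a network call)."""
    time.sleep(0.1)  # while sleeping/waiting on I/O, the GIL is released
    return f"{name}: done"

start = time.perf_counter()
# ThreadPoolExecutor suits I/O-bound work; swap in ProcessPoolExecutor
# (same Executor API) for CPU-bound work that must bypass the GIL.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fetch, ["a", "b", "c", "d"]))
elapsed = time.perf_counter() - start

print(results)             # map() preserves submission order
print(f"{elapsed:.2f}s")   # ~0.1 s concurrent, not 0.4 s serial
```

Because `Executor.map` preserves input order and the four sleeps overlap, the wall time is close to one task's duration rather than the sum of all four.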

Summary

  • Python threading: Best for simple I/O-bound tasks.
  • Python multiprocessing: Best for CPU-bound tasks requiring true parallelism.
  • Python subprocess: Simple external process management. Use when you need straightforward process control and portability across different environments.
  • concurrent.futures: Unified API for high-level task management.
  • asyncio: Best for high-concurrency I/O-bound tasks; provides single-threaded concurrency via the event loop.
  • QThread: Ideal for complex threading in Qt applications with signal-slot communication.
  • QRunnable/QThreadPool: Efficient for managing multiple short-lived tasks in Qt.
  • QtConcurrent: Simplifies parallel task execution in Qt applications.
  • QProcess: Handles running and managing external processes within Qt applications. Use when you need tight integration with the Qt event loop and signal-slot mechanism.
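As a minimal illustration of the asyncio entry above — many I/O-bound tasks interleaved on a single thread (the `fetch` coroutine is an invented stand-in for non-blocking I/O):

```python
import asyncio

async def fetch(name: str) -> str:
    """Stand-in for a non-blocking I/O operation."""
    await asyncio.sleep(0.1)  # yields control back to the event loop
    return f"{name}: done"

async def main() -> list:
    # gather() runs all three coroutines concurrently on one thread
    return await asyncio.gather(*(fetch(n) for n in "abc"))

results = asyncio.run(main())
print(results)
```

Unlike the threading example, no extra OS threads are created: concurrency comes from coroutines cooperatively yielding at each `await`.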