Qwen3.5 — quality measured on GSM8K (200 samples)
| Model | Original size | OptiQ size | Compression | Accuracy (OptiQ vs uniform 4-bit) |
|---|---|---|---|---|
| Qwen3.5-0.8B-OptiQ-4bit | 1,666 MB | 570 MB | 2.9× | 27.0% vs 11.5% (+15.5pp) |
| Qwen3.5-2B-OptiQ-4bit | 4,338 MB | 1,365 MB | 3.2× | 48.0% vs 48.5% |
| Qwen3.5-4B-OptiQ-4bit | 8,888 MB | 2,811 MB | 3.2× | 81.5% vs 79.5% (+2.0pp) |
| Qwen3.5-9B-OptiQ-4bit | 18,412 MB | 5,763 MB | 3.2× | 90.0% vs 90.0% |
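These checkpoints are intended to be drop-in for the standard mlx-lm loading path. A minimal sketch of trying one locally, assuming the mlx-community/ repo prefix (mirroring the YOLO26 example below) and that the 4-bit checkpoints load like any other mlx-lm model:

```python
from mlx_lm import load, generate

# Assumption: OptiQ checkpoints load through the standard mlx-lm API and the
# repo lives under the mlx-community org (repo id not confirmed by this page).
model, tokenizer = load("mlx-community/Qwen3.5-4B-OptiQ-4bit")

# A GSM8K-style word problem as a quick smoke test.
prompt = (
    "A bakery sells 48 muffins in the morning and half as many in the "
    "afternoon. How many muffins does it sell in total?"
)
print(generate(model, tokenizer, prompt=prompt, max_tokens=256))
```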
Gemma-4 — quality measured on GSM8K (200 samples)
| Model | Original size | OptiQ size | Compression | Accuracy (OptiQ vs uniform 4-bit) |
|---|---|---|---|---|
| gemma-4-e2b-it-OptiQ-4bit | 9,772 MB | 3,978 MB | 2.5× | 13.0% vs 5.5% (+7.5pp) |
| gemma-4-e4b-it-OptiQ-4bit | 15,252 MB | 6,028 MB | 2.5× | 55.5% vs 23.5% (+32.0pp) |
Note on Gemma-4 serving: Gemma-4 inference works fine with fp16 KV (stock mlx_lm.server or optiq serve without --kv-config). The mixed-precision KV path currently fails on Gemma-4's shared-KV attention layers — an upstream mlx-lm limitation we're tracking for a future release.
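The fp16-KV path needs nothing special on the client side. A minimal sketch of querying a Gemma-4 OptiQ model served by stock mlx_lm.server, assuming the default host/port and the OpenAI-style /v1/chat/completions endpoint; the same request should also work against optiq serve without --kv-config, if it exposes the same API:

```python
import requests

# Assumes mlx_lm.server was launched with a Gemma-4 OptiQ checkpoint and
# default settings (127.0.0.1:8080), which keeps the KV cache in fp16.
resp = requests.post(
    "http://127.0.0.1:8080/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "What is 17 * 23?"}],
        "max_tokens": 64,
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```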
YOLO26 object detection — quality measured on COCO128
| Model | Original size | OptiQ size | Compression | Detection delta |
|---|---|---|---|---|
| YOLO26n-OptiQ-6bit | 9.9 MB | 2.5 MB | 3.9× | -1.6% |
| YOLO26s-OptiQ-6bit | 38.4 MB | 8.9 MB | 4.3× | -7.0% |
| YOLO26m-OptiQ-6bit | 83.8 MB | 18.9 MB | 4.4× | +0.1% |
| YOLO26l-OptiQ-6bit | 100.7 MB | 22.9 MB | 4.4× | 0.0% |
| YOLO26x-OptiQ-6bit | 225.5 MB | 50.6 MB | 4.5× | -1.1% |
YOLO26 usage — requires mlx-optiq[yolo]
```python
from optiq.models.yolo import load_quantized_yolo

# Load the 6-bit OptiQ checkpoint.
model = load_quantized_yolo("mlx-community/YOLO26n-OptiQ-6bit")

# Run detection on a single image.
results = model.predict("image.jpg")
```