mlx-optiq
Engineering · Research · Releases

Blog

Release notes, methodology write-ups, and benchmarking deep dives. New posts land alongside major releases and research findings.

2026·04·28
The mlx-optiq eval framework: five benchmarks, one Capability Score
GSM8K-50 alone misses tool-calling regressions. The two-stage eval (KL + GSM8K-50 for triage; MMLU + GSM8K + IFEval + BFCL + HumanEval for the headline numbers) drives every quant we ship. Plus the auto-resolved KL reference, sandboxed code execution, and a single Capability Score: the unweighted mean of the five headline benchmarks (sketched below).
methodology
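A minimal sketch of that aggregation, assuming per-benchmark accuracies on a 0–1 scale. The benchmark keys, the `capability_score` name, and the dict layout are illustrative assumptions, not the framework's actual API.

```python
# Capability Score as described above: the unweighted mean of the five
# headline benchmarks. Keys and layout are illustrative assumptions.
HEADLINE_BENCHMARKS = ("mmlu", "gsm8k", "ifeval", "bfcl", "humaneval")

def capability_score(results: dict[str, float]) -> float:
    """Unweighted mean of the five headline benchmark accuracies (0-1)."""
    return sum(results[b] for b in HEADLINE_BENCHMARKS) / len(HEADLINE_BENCHMARKS)

# Example: a quant that holds up on reasoning but regresses on
# tool-calling (BFCL) still pays for it in the single headline number.
print(capability_score({
    "mmlu": 0.71, "gsm8k": 0.84, "ifeval": 0.78,
    "bfcl": 0.52, "humaneval": 0.66,
}))  # 0.702
```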
2026·04·28
optiq.jsonl: a six-domain calibration mix for mixed-precision quantization
WikiText-2 measures prose; modern LLMs do prose, reasoning, code, agent loops, tool-calling, and constrained instruction-following. We replaced the calibration set with 40 hand-curated samples spanning all six domains, bundled inside the package and fully reproducible (coverage check sketched below). What you calibrate on is what you protect.
engineering
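A hedged coverage check for the bundled file. The `domain` field and the one-JSON-object-per-line schema are assumptions for illustration, not the documented format of optiq.jsonl.

```python
# Sanity-check that a calibration file covers all six domains with the
# expected 40 samples. Schema (a "domain" key per record) is assumed.
import json
from collections import Counter

def domain_counts(path: str) -> Counter:
    with open(path) as f:
        return Counter(json.loads(line)["domain"] for line in f if line.strip())

counts = domain_counts("optiq.jsonl")
assert sum(counts.values()) == 40   # 40 hand-curated samples
assert len(counts) == 6             # prose, reasoning, code, agent loops,
                                    # tool-calling, constrained instructions
print(counts.most_common())
```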
2026·04·25
Gemma-4 lands on mlx-optiq: four sizes, +32 pp on the small one
Adding Google's full Gemma-4 instruct lineup: e2b, e4b, 26B-A4B sparse MoE, and 31B dense. The +32-point GSM8K recovery on gemma-4-e4b is the cleanest mixed-precision win we have. Plus the shared-KV caveat that means you'll want Qwen for quantized-KV serving.
engineering
2026·04·17
TurboQuant: postmortem on a research path we didn't ship
We built rotated-space KV attention with a custom Metal kernel. The benchmarks looked good: 100% needle retrieval at 4-bit vs. 73% for affine. We still chose affine for the shipping path. Plain writeup of the technique, the numbers, and why the marginal win didn't justify a parallel serving stack.
postmortem
2026·04·08
Sensitivity-aware LoRA: fine-tuning that respects the bit budget
The same per-layer sensitivity signal that drives mixed-precision quantization also drives adapter rank: 8-bit-quantized layers get 2× the adapter rank of 4-bit-quantized ones at the same parameter budget (allocation rule sketched below). Validation loss drops 12% in head-to-head A/Bs. Plus the empirical training-ceiling map for a 36 GB Mac across all 10 supported models.
engineering
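A toy version of the allocation rule under one shared parameter budget. The layer names, dimensions, and budget are made up, and the real system derives the bit map from per-layer sensitivity rather than hard-coding it.

```python
# Rank allocation: 8-bit layers get 2x the LoRA rank of 4-bit layers,
# all fitted under a single adapter parameter budget.
def assign_ranks(layers: dict[str, tuple[int, int, int]], budget: int) -> dict[str, int]:
    """layers maps name -> (d_in, d_out, bits); returns name -> LoRA rank."""
    # LoRA adds r * (d_in + d_out) params per layer; 8-bit layers cost 2x.
    unit = sum((2 if bits == 8 else 1) * (d_in + d_out)
               for d_in, d_out, bits in layers.values())
    base = max(1, budget // unit)                   # rank for 4-bit layers
    return {name: (2 if bits == 8 else 1) * base
            for name, (d_in, d_out, bits) in layers.items()}

print(assign_ranks(
    {"q_proj": (4096, 4096, 8), "v_proj": (4096, 4096, 8),
     "up_proj": (4096, 11008, 4), "down_proj": (11008, 4096, 4)},
    budget=2_000_000,
))  # {'q_proj': 62, 'v_proj': 62, 'up_proj': 31, 'down_proj': 31}
```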
2026·03·20
Not All Layers Are Equal: mixed-precision quantization for weights and KV cache on Apple Silicon
The research foundation behind mlx-optiq. Some layers are 56× more sensitive to quantization than others. The KV cache becomes the dominant memory cost at long contexts. Mixed-precision recovers what uniform 4-bit drops; mixed-precision KV fixes the perplexity collapse that uniform 4-bit causes. A toy bit allocator is sketched below.
research
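To make the core idea concrete, a toy allocator that spends bits where sensitivity is highest. The sensitivity values and the keep-the-top-quarter rule are illustrative, not the paper's actual procedure.

```python
# Give the most sensitive fraction of layers 8 bits, the rest 4.
def assign_bits(sensitivity: dict[str, float], hi_frac: float = 0.25) -> dict[str, int]:
    ranked = sorted(sensitivity, key=sensitivity.get, reverse=True)
    n_hi = max(1, round(hi_frac * len(ranked)))
    return {name: (8 if i < n_hi else 4) for i, name in enumerate(ranked)}

# A 56x sensitivity spread, like the one measured in the paper:
sens = {"layers.0.attn": 5.6, "layers.1.attn": 1.2,
        "layers.0.mlp": 0.4, "layers.1.mlp": 0.1}
print(assign_bits(sens))
# {'layers.0.attn': 8, 'layers.1.attn': 4, 'layers.0.mlp': 4, 'layers.1.mlp': 4}
```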