Documentation
mlx-optiq is built around one measurement: per-layer KL-divergence sensitivity, computed once on calibration data. That signal drives three optimization passes: mixed-precision weight quantization, mixed-precision KV-cache allocation, and sensitivity-aware LoRA rank scaling. Around them sits the rest of the toolkit: hot-swap LoRA adapters, a dual-protocol (OpenAI + Anthropic) inference server, and a sandboxed code-execution helper for agent workflows.
This site is the canonical reference. Every page is self-contained: code examples are copy-paste runnable on a stock Mac with Python 3.11+ and 16 GB+ RAM.
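The core measurement can be sketched in a few lines. This is a minimal NumPy illustration of per-layer KL sensitivity, not the mlx-optiq implementation: `model.forward`, `quantize_layer`, and `restore_layer` are hypothetical stand-ins for whatever your model framework provides.

```python
import numpy as np

def kl_divergence(p_logits: np.ndarray, q_logits: np.ndarray) -> float:
    """KL(P || Q) between two next-token distributions given as logits."""
    p = np.exp(p_logits - p_logits.max()); p /= p.sum()
    q = np.exp(q_logits - q_logits.max()); q /= q.sum()
    return float(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))))

def layer_sensitivity(model, layer_idx, calib_batch,
                      quantize_layer, restore_layer) -> float:
    """Quantize one layer, measure the KL shift in output logits, restore it."""
    ref_logits = model.forward(calib_batch)        # full-precision reference
    quantize_layer(model, layer_idx)               # perturb just this layer
    quant_logits = model.forward(calib_batch)
    restore_layer(model, layer_idx)                # put original weights back
    return float(np.mean([kl_divergence(r, q)
                          for r, q in zip(ref_logits, quant_logits)]))
```

Layers with a high score get more bits (or higher LoRA rank); layers with a low score tolerate aggressive compression.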
Pick a path
I want to use a pre-built quant
Start with Installation, then jump to your model family: Qwen3.5, Qwen3.6, or Gemma-4. Each has a 5-minute hello-world plus model-specific tips (chat template, sampling defaults, recommended context length).
I want to quantize my own model
Read How sensitivity works to understand the algorithm, then the convert CLI reference. The --reference auto flag picks bf16 when it fits and a uniform-4-bit baseline when it doesn't.
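The auto-reference rule described above amounts to a memory check. This sketch assumes a bf16 weight footprint of 2 bytes per parameter plus a 1.2x overhead margin; the actual thresholds used by the convert CLI may differ.

```python
def choose_reference(n_params: int, available_bytes: int) -> str:
    """Pick a reference model for sensitivity measurement: a bf16 copy when
    it fits in available memory, else a uniform 4-bit baseline."""
    bf16_bytes = n_params * 2      # bf16 = 2 bytes per weight
    headroom = 1.2                 # assumed margin for activations/overhead
    if bf16_bytes * headroom <= available_bytes:
        return "bf16"
    return "uniform-4bit"
```

For example, a 7B model fits a bf16 reference on a 48 GB machine, while a 27B model on a 16 GB machine falls back to the uniform 4-bit baseline.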
I want to fine-tune with LoRA
The LoRA fine-tuning guide covers PEFT-compatible adapter output, sensitivity-aware rank scaling, and the empirical training-ceiling map for a 36 GB Mac across all 12 supported models.
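Sensitivity-aware rank scaling can be pictured as follows. This is one plausible mapping, sketched for illustration; the guide documents the exact curve mlx-optiq uses, and the base/min/max values here are assumptions.

```python
def scale_ranks(sensitivities: dict[str, float], base_rank: int = 16,
                min_rank: int = 4, max_rank: int = 64) -> dict[str, int]:
    """Scale each layer's LoRA rank by its sensitivity relative to the mean,
    rounded to a multiple of 4 and clamped to [min_rank, max_rank]."""
    mean = sum(sensitivities.values()) / len(sensitivities)
    ranks = {}
    for name, s in sensitivities.items():
        r = base_rank * (s / mean)                 # proportional scaling
        r = 4 * round(r / 4)                       # hardware-friendly multiple
        ranks[name] = max(min_rank, min(max_rank, r))
    return ranks
```

Layers the calibration pass flagged as sensitive get more adapter capacity; insensitive layers get less, keeping total trainable parameters roughly constant.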
I want to serve an LLM
The KV-quant serving guide covers running optiq serve with both the OpenAI /v1/chat/completions and Anthropic /v1/messages endpoints from the same process, plus mixed-precision KV cache and a mounted LoRA adapter.
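Both protocols speak JSON over POST to the same process; only the wire format differs. A minimal client sketch using only the standard library, where the port and model name are assumptions (check the serving guide for the real defaults):

```python
import json
import urllib.request

BASE = "http://localhost:8080"   # assumed default for `optiq serve`

def openai_body(prompt: str) -> dict:
    """Request body for the OpenAI-style /v1/chat/completions endpoint."""
    return {"model": "local",
            "messages": [{"role": "user", "content": prompt}]}

def anthropic_body(prompt: str) -> dict:
    """Request body for the Anthropic-style /v1/messages endpoint.
    Note Anthropic's schema requires max_tokens."""
    return {"model": "local", "max_tokens": 256,
            "messages": [{"role": "user", "content": prompt}]}

def post(path: str, body: dict) -> urllib.request.Request:
    return urllib.request.Request(
        BASE + path, data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"}, method="POST")

# Same server process, two wire formats:
req_a = post("/v1/chat/completions", openai_body("Hello"))
req_b = post("/v1/messages", anthropic_body("Hello"))
```

Send either request with `urllib.request.urlopen(req_a)` once the server is running.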
I want to evaluate a quant
The eval CLI ships a smoketest (KL + GSM8K-50, ~5 min on 27B) for triage and a full benchmark suite (MMLU + GSM8K + IFEval + BFCL + HumanEval, ~1.5 h on 27B) that produces the Capability Score on every model card. HumanEval runs in a layered sandbox (apple/container → sandbox-exec → subprocess + rlimit). Methodology and what the suite caught are covered in the eval-framework write-up.
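The sandbox layering follows a simple fallback chain: use the strongest isolation available, degrade gracefully. A sketch of that chain, where the container image, sandbox profile, and resource limits are simplified placeholders rather than the eval CLI's actual configuration:

```python
import resource
import shutil
import subprocess
import sys

def _limit() -> None:
    # Innermost fallback: cap CPU seconds and address space for the child.
    resource.setrlimit(resource.RLIMIT_CPU, (5, 5))
    resource.setrlimit(resource.RLIMIT_AS, (1 << 30, 1 << 30))  # 1 GiB

def run_sandboxed(code: str) -> str:
    """Run untrusted Python with the strongest available isolation:
    apple/container, then macOS sandbox-exec, then subprocess + rlimit."""
    pre = None
    if shutil.which("container"):                 # apple/container VM
        cmd = ["container", "run", "--rm", "python:3.11",
               "python", "-c", code]
    elif shutil.which("sandbox-exec"):            # macOS seatbelt profile
        cmd = ["sandbox-exec", "-p", "(version 1)(allow default)",
               sys.executable, "-c", code]
    else:                                         # plain POSIX rlimits
        cmd = [sys.executable, "-c", code]
        pre = _limit
    out = subprocess.run(cmd, capture_output=True, text=True,
                         timeout=30, preexec_fn=pre)
    return out.stdout
```

Each layer trades isolation strength for availability: the VM is strongest but macOS-only, while the rlimit fallback works on any POSIX system.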
For agents and IDEs
The full library reference is also published as a single Markdown file: /llms.txt. Drop it into Claude Code, Cursor, or any agent context window. It's everything mlx-optiq in ~12 KB.