Qwen3.5-122B-A10B – A 122‑Billion‑Parameter Sparse Mixture‑of‑Experts Vision‑Language Model
Qwen3.5-122B-A10B is a 122‑billion‑parameter causal language model paired with a vision encoder. Its 256‑expert Mixture‑of‑Experts architecture activates only ~10B parameters per token (8 routed experts plus 1 shared expert), and it combines a hybrid Gated‑DeltaNet + Gated‑Attention design with a 262K‑token context window (extendable to ~1M) and support for 201 languages. Reported benchmarks place it at the top of its size class across a wide range of tasks: MMLU‑Pro 86.7, IFEval 93.4, LongBench v2 60.2, HLE w/ CoT 25.3, SWE‑bench Verified 72.0, and Terminal Bench 2 49.4. The model is released under Apache‑2.0 and is compatible with Transformers, vLLM, SGLang, KTransformers, and Azure endpoints, making it ready for both research and production deployment.
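The sparse activation scheme described above (a few routed experts plus one always-active shared expert per token) can be sketched in a few lines of NumPy. This is an illustrative toy, not Qwen's implementation; the router shape, expert sizes, and gating normalization here are all assumptions.

```python
import numpy as np

def moe_forward(x, expert_weights, shared_weight, router_weight, k=8):
    """Sparse MoE layer sketch: route a token to k experts plus one
    shared expert that is always active. Shapes and gating details are
    illustrative assumptions, not Qwen's actual implementation."""
    logits = x @ router_weight                   # router scores, one per expert
    top_k = np.argsort(logits)[-k:]              # indices of the k best experts
    gate = np.exp(logits[top_k] - logits[top_k].max())
    gate = gate / gate.sum()                     # softmax over the selected experts only
    out = sum(g * (expert_weights[i] @ x) for g, i in zip(gate, top_k))
    out = out + shared_weight @ x                # shared expert always contributes
    return out, top_k

rng = np.random.default_rng(0)
d = 16                                           # toy hidden size
experts = rng.normal(size=(256, d, d))           # 256 routed experts
shared = rng.normal(size=(d, d))                 # 1 shared expert
router = rng.normal(size=(d, 256))
x = rng.normal(size=d)                           # one token's hidden state
y, active = moe_forward(x, experts, shared, router)
print(len(active))  # prints 8: only 8 of the 256 routed experts ran
```

In practice MoE routers are trained jointly with an auxiliary load-balancing loss so that all experts receive traffic; the sketch shows only the forward-path selection that makes ~10B of 122B parameters active.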
Project Ideas
- Deploy the model as a multimodal assistant for enterprise knowledge bases, using its 262K‑token context to ingest large documents and images together.
- Fine‑tune the ~10B activated subnetwork on domain‑specific data (e.g., medical‑imaging reports) to improve accuracy while keeping inference costs low.
- Run inference on consumer‑grade GPUs by exploiting the sparse MoE structure: only a handful of experts are active per token, so inactive experts can be offloaded from GPU memory for cost‑effective deployment.
- Benchmark the model on emerging long‑context tasks (e.g., code review over full repositories) to probe the limits of its ~1M‑token extended window.
- Research alternative routing strategies (e.g., dynamic expert selection based on visual content) to further boost performance on vision‑language tasks.
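The last idea, dynamic expert selection conditioned on visual content, could be prototyped by biasing the router logits per modality before top‑k selection. The sketch below is hypothetical: the bias table, modality labels, and favoured‑expert pattern are invented purely for illustration and are not part of the released model.

```python
import numpy as np

def modality_aware_topk(logits, modality, modality_bias, k=8):
    """Hypothetical dynamic routing: add a per-modality bias to the
    router logits before top-k selection, nudging vision tokens toward
    experts that (by assumption) specialise in visual content."""
    biased = logits + modality_bias[modality]
    return np.argsort(biased)[-k:]

rng = np.random.default_rng(1)
n_experts = 256
logits = rng.normal(size=n_experts)              # raw router scores for one token
bias = {
    "text": np.zeros(n_experts),                 # text tokens: unbiased routing
    "vision": np.where(np.arange(n_experts) < 32, 2.0, 0.0),  # favour the first 32 experts
}
text_experts = set(modality_aware_topk(logits, "text", bias))
vision_experts = set(modality_aware_topk(logits, "vision", bias))
print(len(vision_experts & set(range(32))), len(text_experts & set(range(32))))
```

In a real experiment the bias would be a learned function of the token's features rather than a fixed table, and the routing change would need to be weighed against load-balancing constraints across experts.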