Qwen3.5-122B-A10B – A 122‑Billion‑Parameter Sparse Mixture‑of‑Experts Vision‑Language Model
Qwen3.5-122B-A10B is a 122‑billion‑parameter causal language model paired with a vision encoder. Its 256‑expert Mixture‑of‑Experts architecture activates only ~10B parameters per token (8 routed experts plus 1 shared expert), and it combines a hybrid Gated‑DeltaNet + Gated‑Attention design with a 262K‑token context window (extendable to ~1M) and support for 201 languages. Reported benchmarks place it at the top of its size class across a wide range of tasks: MMLU‑Pro 86.7, IFEval 93.4, LongBench v2 60.2, HLE w/ CoT 25.3, SWE‑bench Verified 72.0, and Terminal Bench 2 49.4. The model is released under Apache‑2.0 and is compatible with Transformers, vLLM, SGLang, KTransformers, and Azure endpoints, making it ready for both research and production deployment.
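The sparse activation scheme described above (a few routed experts plus one always-active shared expert per token) can be sketched in a few lines of NumPy. This is an illustrative toy, not Qwen's implementation; the router shape, expert sizes, and gating normalization here are all assumptions.

```python
import numpy as np

def moe_forward(x, expert_weights, shared_weight, router_weight, k=8):
    """Sparse MoE layer sketch: route a token to k experts plus one
    shared expert that is always active. Shapes and gating details are
    illustrative assumptions, not Qwen's actual implementation."""
    logits = x @ router_weight                   # router scores, one per expert
    top_k = np.argsort(logits)[-k:]              # indices of the k best experts
    gate = np.exp(logits[top_k] - logits[top_k].max())
    gate = gate / gate.sum()                     # softmax over the selected experts only
    out = sum(g * (expert_weights[i] @ x) for g, i in zip(gate, top_k))
    out = out + shared_weight @ x                # shared expert always contributes
    return out, top_k

rng = np.random.default_rng(0)
d = 16                                           # toy hidden size
experts = rng.normal(size=(256, d, d))           # 256 routed experts
shared = rng.normal(size=(d, d))                 # 1 shared expert
router = rng.normal(size=(d, 256))
x = rng.normal(size=d)                           # one token's hidden state
y, active = moe_forward(x, experts, shared, router)
print(len(active))  # prints 8: only 8 of the 256 routed experts ran
```

In practice MoE routers are trained jointly with an auxiliary load-balancing loss so that all experts receive traffic; the sketch shows only the forward-path selection that makes ~10B of 122B parameters active.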
Project Ideas
- Deploy the model as a multimodal assistant for enterprise knowledge bases, using its 262K‑token context to ingest large documents and images together.
- Fine‑tune the ~10B activated subnetwork on domain‑specific data (e.g., medical‑imaging reports) to improve accuracy while keeping inference costs low.
- Run inference on consumer‑grade GPUs by exploiting the sparse MoE structure: only a handful of experts are active per token, so inactive experts can be offloaded from GPU memory for cost‑effective deployment.
- Benchmark the model on emerging long‑context tasks (e.g., code review over full repositories) to probe the limits of its ~1M‑token extended window.
- Research alternative routing strategies (e.g., dynamic expert selection based on visual content) to further boost performance on vision‑language tasks.
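The last idea, dynamic expert selection conditioned on visual content, could be prototyped by biasing the router logits per modality before top‑k selection. The sketch below is hypothetical: the bias table, modality labels, and favoured‑expert pattern are invented purely for illustration and are not part of the released model.

```python
import numpy as np

def modality_aware_topk(logits, modality, modality_bias, k=8):
    """Hypothetical dynamic routing: add a per-modality bias to the
    router logits before top-k selection, nudging vision tokens toward
    experts that (by assumption) specialise in visual content."""
    biased = logits + modality_bias[modality]
    return np.argsort(biased)[-k:]

rng = np.random.default_rng(1)
n_experts = 256
logits = rng.normal(size=n_experts)              # raw router scores for one token
bias = {
    "text": np.zeros(n_experts),                 # text tokens: unbiased routing
    "vision": np.where(np.arange(n_experts) < 32, 2.0, 0.0),  # favour the first 32 experts
}
text_experts = set(modality_aware_topk(logits, "text", bias))
vision_experts = set(modality_aware_topk(logits, "vision", bias))
print(len(vision_experts & set(range(32))), len(text_experts & set(range(32))))
```

In a real experiment the bias would be a learned function of the token's features rather than a fixed table, and the routing change would need to be weighed against load-balancing constraints across experts.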