model May 24, 2026

Command A+ 05‑2026: Multilingual Vision‑Enabled Chatbot with Tool Use

Command A+ is an open‑source, 25 billion‑parameter decoder‑only Sparse Mixture‑of‑Experts model released by Cohere and Cohere Labs. It supports both text and image inputs (pipeline tag **image-text-to-text**) and can generate up to 4096 new tokens per request, with a massive 128 K token context window. The model is multilingual, covering 48 languages ranging from English and Chinese to Arabic and Hindi, and it is licensed under Apache 2.0.

Designed for agentic, reasoning‑heavy enterprise workloads, Command A+ includes built‑in conversational tool‑use capabilities, allowing it to call external functions or APIs via the Transformers chat template. It ships in several quantizations (BF16, FP8, W4A4) and can be run with the standard Transformers library or via vLLM for high‑throughput deployments. The recommended W4A4 quantization offers the best speed‑latency trade‑off while keeping quality comparable to higher‑precision versions.

The model’s architecture blends sliding‑window and global attention with rotational positional embeddings, and its MoE routing activates eight of 128 experts per token. With a context length of 128 K and output length of 64 K, it is well‑suited for long‑form reasoning, document analysis, and multi‑modal chat applications. Its popularity today stems from the combination of multilingual vision support, enterprise‑grade tool calling, and the flexible deployment options provided by Cohere Labs.

Project Ideas

  1. Build a multilingual visual‑question‑answering chatbot that can answer user queries about uploaded images in any of the 48 supported languages.
  2. Create an enterprise sales‑assistant that uses Command A+ to retrieve daily sales summaries via tool calls and present the results in natural language across multiple languages.
  3. Develop an AI‑powered help‑desk agent that can read screenshots of error messages, reason about the issue, and invoke diagnostic tools through function calling.
  4. Implement a long‑form document summarizer that ingests PDFs containing embedded figures, generates a concise summary, and cites the source figures using the model’s citation tags.
  5. Deploy a real‑time multimodal tutoring system that accepts handwritten math problems as images, explains the solution step‑by‑step, and can call external math‑solver APIs when needed.
← Back to all reports