March 24, 2026

OmniCoder-9B: A 9B Coding Agent Fine‑Tuned on 425K Agentic Trajectories

OmniCoder-9B is a 9‑billion‑parameter coding agent released by Tesslate and built on top of Qwen3.5‑9B’s hybrid Gated‑Delta/standard‑attention architecture. It has been fine‑tuned with LoRA (r=64, alpha=32) on more than 425,000 curated agentic coding trajectories sourced from frontier models such as Claude Opus 4.6, GPT‑5.4, GPT‑5.3‑Codex, and Gemini 3.1 Pro. The training data emphasizes real‑world software‑engineering workflows, tool‑use, terminal operations, and multi‑step reasoning, enabling the model to recover from errors, respond to LSP diagnostics, and emit minimal edit diffs rather than full rewrites.

The model supports a 262k-token context window (extendable beyond 1M tokens) and introduces a special `<think>...</think>` mode for explicit reasoning chains. Benchmarks show strong performance on coding and reasoning tasks: 90% pass@5 on the AIME 2025 dataset, 83.8% pass@1 and 86.4% pass@3 on GPQA Diamond, and a 23.6% pass rate on Terminal‑Bench 2.0, substantially improving over the base Qwen3.5‑9B. All weights are released under the Apache 2.0 license and are available in both full‑precision and GGUF‑quantized formats.
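In practice, downstream code needs to separate the `<think>...</think>` reasoning chain from the model's final answer. The helper below is a minimal sketch, assuming the tags appear at most once in the output (the exact output format is not specified in the report):

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split model output into (reasoning, answer).

    Assumes at most one <think>...</think> span; if no tags are
    present, the whole text is treated as the answer.
    """
    m = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not m:
        return "", text.strip()
    reasoning = m.group(1).strip()
    # Everything outside the tags is the user-facing answer.
    answer = (text[:m.start()] + text[m.end():]).strip()
    return reasoning, answer
```

A chat UI would typically render the reasoning collapsed (or hide it entirely) and show only the answer.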

OmniCoder‑9B can be used directly via the Transformers library, vLLM, or llama.cpp (GGUF). Example code snippets demonstrate loading the model, prompting it with chat templates, and generating code or explanations. The repository also provides recommended sampling parameters (temperature 0.6, top‑p 0.95, top‑k 20) and guidance for deterministic tool‑calling scenarios. While English performance is well‑documented, non‑English capabilities have not been extensively evaluated.
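A minimal Transformers sketch combining the recommended sampling parameters with a greedy fallback for deterministic tool calling, as the guidance above suggests. The repository id `Tesslate/OmniCoder-9B` is an assumption, as is the exact shape of the deterministic configuration:

```python
# Recommended sampling parameters from the report, packaged as
# kwargs for transformers' generate().
SAMPLING = {"do_sample": True, "temperature": 0.6, "top_p": 0.95, "top_k": 20}

def sampling_kwargs(deterministic: bool = False) -> dict:
    """Return generate() kwargs; greedy decoding for tool-calling paths."""
    if deterministic:
        return {"do_sample": False}
    return dict(SAMPLING)

def load(model_id: str = "Tesslate/OmniCoder-9B"):
    """Load tokenizer and model (repo id is an assumed placeholder)."""
    from transformers import AutoModelForCausalLM, AutoTokenizer
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )
    return tok, model
```

Usage would look like `out = model.generate(**inputs, **sampling_kwargs())` for chat, and `sampling_kwargs(deterministic=True)` when reproducible tool calls matter more than diversity.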

Project Ideas

  1. Create an AI pair‑programmer extension for VS Code that leverages OmniCoder‑9B’s read‑before‑write and LSP‑aware error recovery to suggest precise code edits.
  2. Build an automated terminal assistant that executes commands, parses output, and fixes failures by generating minimal edit diffs using the model’s tool‑calling patterns.
  3. Develop a multi‑step coding tutorial generator that employs the `<think>` tags to break down algorithm explanations into reasoning steps before presenting the final code.
  4. Deploy a low‑latency code‑completion API with vLLM, enabling CI pipelines to request context‑aware code snippets or bug fixes on demand.
  5. Design a web‑based debugging chatbot that ingests stack traces or error messages and proposes concise patches, taking advantage of the model’s error‑recovery training.
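For ideas like the vLLM-backed completion API (idea 4), a CI client could start from vLLM's OpenAI-compatible `/v1/chat/completions` endpoint. The sketch below only builds the request body; the model name is an assumed placeholder and the system prompt is illustrative:

```python
import json

def completion_request(snippet: str, instruction: str) -> str:
    """Build a JSON body for an OpenAI-compatible chat-completions call.

    Model name "Tesslate/OmniCoder-9B" is an assumption; sampling
    values follow the report's recommendations.
    """
    body = {
        "model": "Tesslate/OmniCoder-9B",
        "messages": [
            {"role": "system",
             "content": "You are a coding assistant. Reply with a minimal edit diff."},
            {"role": "user",
             "content": f"{instruction}\n\n```\n{snippet}\n```"},
        ],
        "temperature": 0.6,
        "top_p": 0.95,
    }
    return json.dumps(body)
```

A pipeline step would POST this body to the serving host and apply the returned diff, keeping the model's minimal-edit behavior end to end.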