model May 27, 2026

NuExtract3: 4B Vision‑Language Model for Structured Document Extraction & Markdown OCR

NuExtract3 is a 4 billion‑parameter vision‑language model built on Qwen/Qwen3.5‑4B and released by Numind. Tagged for image‑to‑text, document‑understanding, OCR, and structured information extraction, it supports multimodal inputs (text, images, or both) and can operate in multilingual and conversational contexts. The model is distributed as a Transformers checkpoint in safetensors format and can be served via vLLM with an OpenAI‑compatible API.

The model shines in two complementary tasks. First, it performs **structured extraction** by taking an input document, a JSON template that mirrors the desired output schema, and optional instructions, then returns JSON data that respects the template's type specifications (e.g., verbatim‑string, date, number). Second, it converts document images directly into clean **Markdown** (with HTML for tables and LaTeX for math), making it useful for OCR preprocessing and RAG pipelines. Both reasoning and non‑reasoning inference modes are available, allowing users to balance speed and accuracy for difficult layouts.

NuExtract3 has been benchmarked on an internal structured‑extraction suite of ~600 diverse documents, achieving an average score of 0.651 ± 0.019—substantially higher than several strong baselines such as Gemma‑4‑E4B‑it and the base Qwen3.5‑4B model. The README also provides detailed deployment examples, template generation from natural‑language descriptions, and code snippets for handling multi‑page PDFs, in‑context examples, and vLLM serving. With over 20 k downloads and a trending score of 130, the model is rapidly gaining attention for enterprise document‑automation use cases.

Project Ideas

Create an automated invoice processing pipeline that extracts fields like invoice number, date, total amount, and line items into a structured JSON for accounting software.
Build a multilingual contract digitization tool that converts scanned agreements into Markdown with preserved headings, tables, and embedded figures for searchable knowledge bases.
Develop a personal finance app that uses NuExtract3 to read receipt images, extract purchase details, and categorize expenses in real time.
Deploy a chatbot that answers user queries by performing on‑the‑fly OCR of uploaded forms or PDFs and returning structured answers based on a supplied JSON template.
Implement a bulk document‑to‑Markdown conversion service that turns archives of scanned reports into clean Markdown files, enabling downstream indexing and RAG retrieval.

← Back to all reports