Dataset · April 27, 2026

MathNet v0: Multilingual Olympiad Math Reasoning & Retrieval Dataset Gains Traction

MathNet v0 is a large‑scale, multimodal dataset of Olympiad‑level mathematics problems released by ShadenA (MIT) and featured in ICLR 2026. It aggregates 30,676 expert‑authored problems from 47 countries spanning 17 languages and two decades of competitions. Each entry includes a Markdown problem statement, LaTeX‑formatted solutions, hierarchical topic annotations, and, when available, inline figures stored as HF Image objects. The dataset is distributed in optimized Parquet format and can be loaded directly via the `datasets` library with a single `load_dataset` call.
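To make the record layout concrete, here is a sketch of what a single MathNet row might look like and how to access it. The field names `solutions`, `topics`, and `language` are assumptions inferred from the description above (only `problem_markdown` and `images` are named elsewhere in this report), so verify them against the dataset card; the mock record below is illustrative, not real data.

```python
# Mock MathNet-style record. Field names other than "problem_markdown" and
# "images" are assumptions to be checked against the dataset card.
# With the `datasets` library, loading would look like:
#   from datasets import load_dataset
#   ds = load_dataset("<repo-id>", "all", split="train")

sample_row = {
    "problem_markdown": "Prove that for all integers $n$, $n^2 + n$ is even.",
    "solutions": [r"Note $n^2 + n = n(n+1)$, a product of consecutive integers."],
    "topics": ["Number Theory", "Parity"],
    "language": "en",
    "images": [],  # inline figures, when present
}

def summarize(row: dict) -> str:
    """Return a one-line summary of a problem record."""
    topic = row["topics"][0] if row["topics"] else "Uncategorized"
    return f"[{row['language']}] {topic}: {row['problem_markdown'][:40]}..."

print(summarize(sample_row))
```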

The collection targets three benchmark tasks: (I) generative problem solving, where models generate solutions that are graded against the expert solutions; (II) math‑aware retrieval, measuring how well embedding models retrieve mathematically equivalent or structurally similar problems; and (III) retrieval‑augmented problem solving, assessing the impact of retrieved context on reasoning performance. The README notes that state‑of‑the‑art models still struggle on these tasks, highlighting MathNet's difficulty and its value for research.
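Task (II) can be illustrated end to end with a toy retriever: embed each problem statement, then rank the corpus against a query by cosine similarity. A real system would embed `problem_markdown` with a trained math-aware encoder; the bag-of-words count vector below is only a stand-in so the ranking logic is visible.

```python
# Toy math-aware retrieval: rank problem statements against a query by
# cosine similarity. Count vectors stand in for learned embeddings.
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Crude stand-in embedding: lowercase token counts."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

corpus = [
    "Find all primes p such that p + 2 is also prime",
    "Prove the sum of angles in a triangle is 180 degrees",
    "Show that there are infinitely many primes",
]
query = "Determine whether infinitely many twin primes exist"
ranked = sorted(corpus, key=lambda doc: cosine(embed(query), embed(doc)), reverse=True)
print(ranked[0])  # the corpus problem closest to the query
```

Swapping `embed` for a sentence-embedding model turns this into a baseline for the math-aware retrieval benchmark.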

Beyond the core solving benchmark, MathNet provides a rich taxonomy covering geometry, algebra, number theory, combinatorics, calculus, and probability, as well as metadata such as competition name, country, language, and problem type. Its multimodal nature (text + images) and multilingual coverage make it a unique resource for evaluating and training large language models, vision‑language models, and retrieval systems on high‑level mathematical reasoning.
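The metadata described above lends itself to simple slicing. The sketch below filters mock records by language and topic; the field names (`language`, `topics`, `competition`) are assumptions based on this description and should be checked against the actual column names.

```python
# Slice a MathNet-style collection by metadata. Records are mock dicts;
# field names are assumptions to verify against the dataset card.
def select(rows, language=None, topic=None):
    """Yield rows matching an exact language and/or containing a topic."""
    for row in rows:
        if language and row["language"] != language:
            continue
        if topic and topic not in row["topics"]:
            continue
        yield row

rows = [
    {"language": "fr", "topics": ["Geometry"], "competition": "IMO"},
    {"language": "en", "topics": ["Number Theory"], "competition": "USAMO"},
    {"language": "en", "topics": ["Geometry"], "competition": "Putnam"},
]
print([r["competition"] for r in select(rows, language="en", topic="Geometry")])
# → ['Putnam']
```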

Project Ideas

  1. Fine‑tune a multilingual LLM on the `all` config and evaluate its performance on the MathNet problem‑solving benchmark across all 17 languages.
  2. Build an embedding‑based retrieval system that indexes the `problem_markdown` field and tests Math‑Aware Retrieval using the curated equivalent‑problem pairs in MathNet.
  3. Create a vision‑language pipeline that extracts diagram information from the `images` column and generates textual descriptions to aid downstream problem solving.
  4. Develop a curriculum generator that selects problems by topic hierarchy (e.g., Geometry > Plane Geometry > Quadrilaterals) to assemble language‑specific study sets for students.
  5. Benchmark existing multimodal models (e.g., GPT‑4‑Vision, Gemini‑Flash) on the retrieval‑augmented problem solving task to quantify the benefit of providing similar problems as context.
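Idea 5 hinges on how retrieved neighbors are packed into the model's context. A minimal sketch of that assembly step is below; the prompt wording is illustrative, not taken from the dataset or any benchmark harness.

```python
# Assemble a retrieval-augmented prompt: prepend retrieved similar problems
# as context before the target problem. Wording is illustrative only.
def build_rag_prompt(problem: str, retrieved: list[str]) -> str:
    context = "\n\n".join(
        f"Similar problem {i + 1}:\n{p}" for i, p in enumerate(retrieved)
    )
    return (
        "Use the similar problems below as reference.\n\n"
        f"{context}\n\n"
        f"Now solve:\n{problem}"
    )

prompt = build_rag_prompt(
    "Prove that 2^n > n^2 for all integers n >= 5.",
    ["Show that 2^n > n for all positive integers n."],
)
print(prompt)
```

Comparing model accuracy with and without the `retrieved` context quantifies the benefit that task (III) is designed to measure.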