Dataset · May 10, 2026

Claude Opus 4.6/4.7 Reasoning Dataset: 8.7K Synthetic CoT Examples Across 28 Domains

The *Claude Opus 4.6/4.7 Reasoning Dataset* is a synthetic instruction‑tuning collection created entirely by Claude Opus models (versions 4.6 and 4.7). It contains 8,706 OpenAI‑style chat examples in JSONL format, totalling roughly 17 M tokens. Every assistant turn includes a dedicated `<think>` block that provides genuine chain‑of‑thought (CoT) reasoning of 150–500 words, making the dataset well suited to teaching language models *how to think* rather than merely what to say.
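To make the format concrete, here is a minimal sketch of what a single record might look like. The `messages` layout follows the OpenAI chat convention the dataset description names; everything beyond the roles and the `<think>`/`</think>` delimiters (the example content, the exact schema) is illustrative, not confirmed.

```python
import json

# Illustrative record in the OpenAI-style chat format; field contents are
# invented for demonstration, only the <think> convention is from the dataset.
record_line = json.dumps({
    "messages": [
        {"role": "system", "content": "You are a patient math tutor."},
        {"role": "user", "content": "Is 91 prime?"},
        {"role": "assistant", "content": (
            "<think>91 = 7 * 13, so it has divisors other than 1 and itself."
            "</think>No. 91 factors as 7 x 13, so it is composite."
        )},
    ],
})

record = json.loads(record_line)
assistant = record["messages"][-1]["content"]

# Split the dedicated <think> block from the visible answer.
thought, _, answer = assistant.partition("</think>")
thought = thought.removeprefix("<think>")
print(answer)  # the user-facing reply, with the CoT stripped
```

Keeping the reasoning inside a single delimited block like this is what lets a fine-tuning pipeline choose whether to train on the thought, the answer, or both.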

The data span 28 fully populated categories, ranging from technical domains such as coding, math, physics, and medicine to humanities, arts, finance, law, and creative role‑play. Four predefined splits—Full, Instruct, Roleplay, and Code—allow users to target specific subsets (e.g., the 1,840 coding‑and‑math examples for code‑focused fine‑tuning). About 40 % of the conversations are multi‑turn, providing context‑building and revision dynamics. System prompts are highly varied (5,814 unique prompts), giving models exposure to diverse personas and problem settings.
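Targeting a subset such as the Code split can be done with a simple filter over the JSONL lines. A minimal sketch, assuming each record carries a top-level `category` field (the field name and values here are assumptions, not a confirmed schema):

```python
import json

# Hypothetical JSONL lines standing in for the Full dataset; the "category"
# field name and its values are assumptions for illustration.
lines = [
    json.dumps({"category": "coding", "messages": []}),
    json.dumps({"category": "finance", "messages": []}),
    json.dumps({"category": "math", "messages": []}),
]

# Keep only coding-and-math records, mirroring the Code split.
code_split = [
    json.loads(line) for line in lines
    if json.loads(line)["category"] in {"coding", "math"}
]
print(len(code_split))  # 2 of the 3 sample records survive the filter
```

The same pattern extends to any of the 28 categories, or to selecting only multi-turn conversations by counting user turns per record.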

Because the dataset is synthetic and contains no refusals or safety hedging, it is intended for capability‑building rather than alignment. The provenance field records whether each example was generated by Claude Opus 4.6 (53.7 %) or 4.7 (46.3 %), with the newer version contributing longer, richer dialogues. Licensed under Apache‑2.0, the dataset can be freely used for fine‑tuning, evaluation of reasoning, or research on multi‑turn conversational behavior.

Project Ideas

  1. Fine‑tune a small language model on the *Instruct* split to improve its chain‑of‑thought reasoning for coding and math queries.
  2. Create a role‑play chatbot that adopts distinct villain or hero personas using the *Roleplay* split as dialogue templates.
  3. Benchmark existing models' ability to generate coherent reasoning by comparing their outputs against the dataset's `<think>` blocks.
  4. Build a multi‑turn educational assistant that can handle follow‑up questions in science or humanities, leveraging the 39.7 % of examples that are multi‑turn.
  5. Develop a domain‑specific QA system for finance or medicine by filtering the *Full* dataset to the relevant category and fine‑tuning on those examples.
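For idea 3, the reference reasoning can be pulled out of each assistant turn with a small helper before comparing it against a model's own output. The function name is illustrative; only the `<think>`/`</think>` delimiters come from the dataset description:

```python
import re

def extract_think(assistant_content: str) -> str:
    """Return the chain-of-thought inside the first <think> block, or ""."""
    match = re.search(r"<think>(.*?)</think>", assistant_content, re.DOTALL)
    return match.group(1).strip() if match else ""

# Toy assistant turn; the content is invented for demonstration.
reference = extract_think("<think>Check parity first.</think>It is even.")
print(reference)
```

Collecting these reference traces alongside model-generated CoT gives paired data for any coherence or similarity metric the benchmark chooses.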