Dataset · April 09, 2026

Redacted Coding Agent Session Traces from pi-mono – A New Dataset for Code Generation Research

The *badlogicgames/pi-mono* dataset offers a collection of redacted coding‑agent session traces harvested from work on the open‑source *pi-mono* repository (https://github.com/badlogic/pi-mono.git). Each trace is stored as a JSON Lines file that records every interaction in a Pi workspace, including user prompts, assistant messages, tool results, model changes, and branching information. The dataset was exported with the *pi-share-hf* tool, which applies deterministic secret redaction and an LLM‑based review to ensure that only sessions relevant to the OSS project and free of obvious sensitive data are shared.

The data is structured as a tree via `id` and `parentId`, allowing a single file to capture multiple branches of development work. Entries span a range of content: session headers, thinking‑level changes, compaction summaries, and optional embedded images (when not omitted with `--no-images`). The dataset is tagged for text‑generation, coding‑agent, and multilingual (English and code) usage, and it is provided in JSON format compatible with the Hugging Face *datasets* library as well as Dask, Polars, and ML‑Croissant for scalable processing.
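The `id`/`parentId` scheme above can be sketched with a few lines of Python. The sample records below are hypothetical illustrations of the linking scheme, not the dataset's actual schema (only `id` and `parentId` are confirmed field names; `type` and the values are assumptions):

```python
import json
from collections import defaultdict

# Hypothetical JSONL records illustrating the id/parentId tree structure.
# Field names other than "id" and "parentId" are assumptions for this sketch.
sample_jsonl = """\
{"id": "a1", "parentId": null, "type": "session-header"}
{"id": "a2", "parentId": "a1", "type": "user-message"}
{"id": "a3", "parentId": "a2", "type": "assistant-message"}
{"id": "a4", "parentId": "a2", "type": "assistant-message"}
"""

entries = [json.loads(line) for line in sample_jsonl.splitlines()]
by_id = {e["id"]: e for e in entries}

# Map each parent to its children; a parent with more than one child
# marks a branch point where the session forked.
children = defaultdict(list)
for e in entries:
    children[e["parentId"]].append(e["id"])

branch_points = [i for i, kids in children.items()
                 if i is not None and len(kids) > 1]
print(branch_points)  # prints ['a2']
```

Because branches share a common prefix of ancestors, walking `parentId` links from any leaf back to the root reconstructs one linear conversation path through the tree.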

Because the traces are derived from real coding sessions, they capture authentic human‑LLM collaboration patterns, making the dataset valuable for training or evaluating code‑generation models, building debugging assistants, or studying interaction flows in AI‑augmented development. The creators note that redaction is best‑effort, so downstream users should exercise additional caution when handling content that may remain sensitive or off‑topic.

Project Ideas

  1. Fine‑tune a code‑completion model on the session messages to improve its ability to follow multi‑turn coding instructions.
  2. Create a visualizer that reconstructs the tree‑structured workflow of a Pi session, highlighting branch decisions and tool outputs.
  3. Develop an evaluation benchmark that measures how well a coding assistant can reproduce the assistant messages given the user prompts from the traces.
  4. Build a data pipeline using Dask or Polars to aggregate statistics on tool usage patterns across all sessions.
  5. Design a privacy‑audit tool that scans the redacted traces for any remaining sensitive patterns using rule‑based detection.
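As a starting point for idea 5, a rule‑based scan can be sketched with stdlib regular expressions. The rule names, patterns, and sample text below are illustrative assumptions; a real audit would use a broader, vetted rule set (e.g. for cloud credentials, private keys, and tokens):

```python
import re

# Illustrative detection rules (assumptions for this sketch, not an
# exhaustive or production-grade rule set).
PATTERNS = {
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "bearer_token": re.compile(r"\bBearer\s+[A-Za-z0-9\-._~+/]{20,}"),
}

def scan(text: str) -> list[str]:
    """Return the names of all rules that match the given trace text."""
    return [name for name, pat in PATTERNS.items() if pat.search(text)]

# Hypothetical trace snippet containing a fake AWS-style key.
sample = "tool result: export AWS_KEY=AKIAABCDEFGHIJKLMNOP"
print(scan(sample))  # prints ['aws_access_key']
```

Running such a scan over every message and tool result in the JSONL traces would flag entries the deterministic redaction pass may have missed, complementing the LLM‑based review described above.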