dataset June 25, 2026

ITBench-AA: SRE Incident Scenarios for Kubernetes Root‑Cause Analysis

The **ITBench-AA** dataset, released by ArtificialAnalysis, is a curated subset of IBM's ITBench benchmark focused on Site Reliability Engineering (SRE) scenarios. It contains 40 public Kubernetes incident cases, each stored as one JSON Lines record with fields such as `id_aa`, `scenario_id`, and `ground_truth_yaml`, plus metadata about the incident source and category. The data is released under a CC‑BY‑4.0 license, is English‑only, and is small enough to load with pandas, polars, or the Hugging Face `datasets` library.
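As a rough illustration of the record layout, the sketch below reads JSON Lines records using only the Python standard library. Only the field names `id_aa`, `scenario_id`, and `ground_truth_yaml` come from the dataset description; the sample values, the shape of the YAML string, and the helper name `read_jsonl` are assumptions made for this example.

```python
import io
import json

# Hypothetical sample records. Only the field names are taken from the
# dataset description; every value here is invented for illustration.
SAMPLE_JSONL = """\
{"id_aa": "aa-001", "scenario_id": "1", "ground_truth_yaml": "kind: Deployment\\nname: checkout\\n"}
{"id_aa": "aa-002", "scenario_id": "2", "ground_truth_yaml": "kind: ConfigMap\\nname: flags\\n"}
"""


def read_jsonl(stream):
    """Yield one dict per non-empty line of a JSON Lines stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# In practice the stream would be an open file handle on the downloaded data.
records = list(read_jsonl(io.StringIO(SAMPLE_JSONL)))

# Index the scenarios by their id_aa field for quick lookup.
by_id = {record["id_aa"]: record for record in records}
```

The `ground_truth_yaml` field is kept as an opaque string here; parsing it would need a YAML library and knowledge of the real schema, which this report does not spell out.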

Each entry describes an offline snapshot of a failing Kubernetes cluster—including alerts, events, traces, and topology—and provides the expected contributing‑factor entity (e.g., Deployment, Pod, ConfigMap) responsible for the failure. The dataset is designed for the **ITBench-AA leaderboard**, where agents are evaluated on their ability to ingest the snapshot and correctly pinpoint the faulty entity, making it a concrete benchmark for question‑answering style root‑cause analysis in IT operations.
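To make the evaluation idea concrete, the following sketch scores predicted entities against expected ones by exact match. It is not the leaderboard's official metric, which this report does not describe; the `Entity` type, the `(kind, name)` representation, and the case-insensitive comparison are all assumptions.

```python
from typing import Dict, NamedTuple


class Entity(NamedTuple):
    """Hypothetical representation of a Kubernetes entity as (kind, name)."""
    kind: str
    name: str


def normalize(entity: Entity) -> Entity:
    """Lower-case and trim both fields so trivial formatting differences match."""
    return Entity(entity.kind.strip().lower(), entity.name.strip().lower())


def accuracy(predictions: Dict[str, Entity], ground_truth: Dict[str, Entity]) -> float:
    """Fraction of scenarios whose predicted entity equals the expected one.

    Scenarios with no prediction count as misses.
    """
    if not ground_truth:
        return 0.0
    hits = sum(
        1
        for scenario_id, expected in ground_truth.items()
        if scenario_id in predictions
        and normalize(predictions[scenario_id]) == normalize(expected)
    )
    return hits / len(ground_truth)


# Invented example: one correct prediction (differing only in case), one wrong.
ground_truth = {
    "1": Entity("Deployment", "checkout"),
    "2": Entity("ConfigMap", "flags"),
}
predictions = {
    "1": Entity("deployment", "checkout"),
    "2": Entity("Pod", "flags-abc"),
}
score = accuracy(predictions, ground_truth)
```

Exact match is the simplest possible choice; a real harness might also give partial credit, for example when the predicted Pod belongs to the expected Deployment.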

ITBench-AA is trending because it offers a realistic, openly licensed testbed for building and comparing SRE‑oriented AI agents, bridging the gap between academic QA tasks and practical incident‑response workflows. Its focus on Kubernetes, SRE, and root‑cause analysis makes it a useful resource for researchers and engineers who want to automate IT operations and improve reliability.

Project Ideas

  1. Fine‑tune a large language model on the ITBench-AA scenarios to generate root‑cause explanations for Kubernetes incidents.
  2. Develop a retrieval‑augmented QA system that retrieves evidence from each scenario's snapshot to answer "Which entity caused the failure?" and checks the answer against `ground_truth_yaml`.
  3. Create a rule‑based agent that parses alerts and events from the snapshot and maps them to the responsible Deployment or Pod.
  4. Build an interactive dashboard that visualizes the incident topology, alerts, and ground‑truth contributing‑factor entity for each scenario.
  5. Benchmark multiple open‑source LLMs on the ITBench-AA leaderboard to compare their SRE incident‑triage performance.
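
A rule‑based agent like the one in idea 3 could start from a very simple heuristic: blame the object that attracts the most warning events. The sketch below assumes each event is a dict with `kind`, `name`, and `type` keys, loosely modeled on Kubernetes events; the actual snapshot schema is not described in this report, so these keys and the function name `pick_suspect` are hypothetical.

```python
from collections import Counter


def pick_suspect(events):
    """Return the (kind, name) pair with the most Warning events, or None.

    Assumes each event is a dict with hypothetical keys "kind", "name",
    and "type"; the real snapshot format may differ.
    """
    counts = Counter(
        (event["kind"], event["name"])
        for event in events
        if event.get("type") == "Warning"
    )
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Invented events for illustration only.
events = [
    {"kind": "Deployment", "name": "checkout", "type": "Warning"},
    {"kind": "Pod", "name": "cart-7d9f", "type": "Warning"},
    {"kind": "Deployment", "name": "checkout", "type": "Warning"},
    {"kind": "Pod", "name": "cart-7d9f", "type": "Normal"},
    {"kind": "Deployment", "name": "checkout", "type": "Warning"},
]
suspect = pick_suspect(events)
```

Counting warnings ignores causality, so a noisy downstream victim can outvote the true root cause; it is meant as a baseline to compare more capable agents against, not as a recommended approach.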