dataset June 16, 2026

KSAFE-MM: Korean Multimodal Safety Benchmark Gains Traction

KSAFE-MM, released by K‑intelligence, is a Korean multimodal safety benchmark designed for academic research on AI safety. The dataset comprises 14,135 query‑image pairs split into two subsets: *KSAFE‑MM‑G*, which contains real‑world image, typo‑only, and combined image‑typo entries, and *KSAFE‑MM‑C*, a synthetic‑image collection tailored to Korean cultural contexts. Each entry is annotated with one of 11 safety risk categories spanning content safety, socio‑economic, and legal/rights‑related risks.

The benchmark is stored as Parquet files and can be loaded with the Hugging Face `datasets` library, pandas, or polars. It provides a `test.parquet` split plus JSONL metadata describing IDs, categories, query text, image paths, and, for the synthetic subset, the jailbreak strategy applied (e.g., `CharacterRolePlay`, `SudoMode`). An arXiv preprint (arXiv:2605.28013) and a technical blog post accompany the dataset, which targets the evaluation of Multimodal Large Language Models (MLLMs) on Korean safety challenges.
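As a rough illustration of working with JSONL metadata of this kind, the sketch below parses a few records and counts them per category and per jailbreak strategy. The field names (`id`, `category`, `template_type`), the category values, and the sample rows are placeholders chosen for illustration, not the dataset's confirmed schema; the real files sit behind the access gate and may be laid out differently.

```python
import io
import json
from collections import Counter


def read_jsonl(stream):
    """Yield one dict per non-empty line of a JSONL text stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def count_by(records, field):
    """Count records per value of `field`; records lacking it fall under None."""
    return Counter(rec.get(field) for rec in records)


# Hypothetical sample rows: field names and values are assumptions,
# not taken from the actual KSAFE-MM metadata files.
SAMPLE = io.StringIO(
    '{"id": "c-0001", "category": "cat_a", "template_type": "SudoMode"}\n'
    '{"id": "c-0002", "category": "cat_a", "template_type": "CharacterRolePlay"}\n'
    '{"id": "g-0001", "category": "cat_b"}\n'
)

records = list(read_jsonl(SAMPLE))
category_counts = count_by(records, "category")
strategy_counts = count_by(records, "template_type")
```

With a real metadata file, `SAMPLE` would be replaced by an opened file handle. Entries without a strategy field, as would be expected outside the synthetic subset, are grouped under `None` instead of raising an error.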

Because the content includes potentially harmful visual and textual material, access is gated behind an ethical agreement that restricts use to safety evaluation and prohibits training models for harmful generation. Licensed under CC BY‑NC 4.0, KSAFE‑MM is intended for benchmarking, robustness assessment, and the development of content‑moderation systems that respect Korean cultural sensitivities.

Project Ideas

  1. Benchmark a Korean multimodal LLM on KSAFE‑MM to measure its performance across the 11 safety risk categories.
  2. Train a multimodal classifier that predicts the safety category from image‑text pairs using the dataset's labeled entries.
  3. Compare the effectiveness of different jailbreak strategies (the `template_type` field) by evaluating model responses on the synthetic KSAFE‑MM‑C subset.
  4. Develop a Korean‑focused content‑moderation pipeline that flags high‑risk queries using the KSAFE‑MM annotations as ground truth.
  5. Analyze the distribution of risk categories between real and synthetic subsets to study cultural bias in safety assessments.
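
Ideas 1 and 3 both reduce to grouping judged model responses and computing a rate per group. The sketch below shows one possible way to do that, assuming each evaluation result has already been labeled with a boolean `unsafe` verdict by some judge. The `unsafe` key, the record layout, and the category names are hypothetical and not part of the dataset.

```python
from collections import defaultdict


def unsafe_rate_by(results, key):
    """Return {group: fraction of responses judged unsafe} for each value of `key`."""
    totals = defaultdict(int)
    unsafe = defaultdict(int)
    for row in results:
        group = row[key]
        totals[group] += 1
        if row["unsafe"]:
            unsafe[group] += 1
    return {group: unsafe[group] / totals[group] for group in totals}


# Hypothetical judged outputs; a real run would hold one row per benchmark entry.
results = [
    {"category": "cat_a", "template_type": "SudoMode", "unsafe": True},
    {"category": "cat_a", "template_type": "CharacterRolePlay", "unsafe": False},
    {"category": "cat_b", "template_type": "SudoMode", "unsafe": True},
    {"category": "cat_b", "template_type": "SudoMode", "unsafe": False},
]

by_category = unsafe_rate_by(results, "category")
by_strategy = unsafe_rate_by(results, "template_type")
```

Reusing one function for both groupings makes the per-category and per-strategy numbers directly comparable. How the `unsafe` verdict is produced (human review, a judge model, or keyword rules) is left open here and would need to be fixed before any results are reported.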