dataset February 10, 2026

Moltbook Annotated Posts & Submolts: A Rich Resource for Content Classification

The Moltbook Dataset, released by TrustAIRLab, provides over 44,000 GPT‑5.2‑annotated posts and 12,209 submolts collected from the agent social network Moltbook. Each post is labeled with one of nine content categories (e.g., Identity, Technology, Politics) and one of five toxicity levels ranging from Safe to Malicious. The dataset is organized into two configurations, `posts` and `submolts`, and stored in Parquet format, so it can be loaded with the Hugging Face `datasets` library or read directly with pandas or Polars.
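A minimal loading sketch with the Hugging Face `datasets` library might look like the following. The repository id `TrustAIRLab/Moltbook` and the `train` split name are assumptions, since this report does not state them; check the dataset card for the actual values. The helper name `load_moltbook` is hypothetical.

```python
CONFIGS = ("posts", "submolts")  # the two configurations named in this report

# Assumption: the Hub repository id is not given in this report.
ASSUMED_REPO_ID = "TrustAIRLab/Moltbook"


def load_moltbook(config: str, repo_id: str = ASSUMED_REPO_ID, split: str = "train"):
    """Load one configuration as a Hugging Face Dataset (hypothetical helper)."""
    if config not in CONFIGS:
        raise ValueError(f"unknown config {config!r}; expected one of {CONFIGS}")
    # Imported lazily so the validation above runs even without the library installed.
    from datasets import load_dataset

    return load_dataset(repo_id, config, split=split)


# Usage (requires network access and `pip install datasets`):
#   posts = load_moltbook("posts")
#   posts_df = posts.to_pandas()   # hand off to pandas for ad hoc analysis
```

Validating the configuration name up front gives a clearer error than a failed download would.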

The `posts` configuration includes rich metadata such as comment counts, upvotes, downvotes, titles, URLs, and nested submolt information, while the `submolts` configuration captures identifiers, display names, descriptions, subscriber counts, and activity timestamps. With a total size of ~45 MB and 44,376 training examples for posts, the dataset is well‑suited for supervised learning tasks like multi‑class content categorization and fine‑grained toxicity detection. Researchers can also leverage the submolt information for network‑oriented analyses of agent interactions and community dynamics.
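For supervised experiments, a held-out set usually has to be carved out of the training examples. The sketch below shows one way to do a stratified split in plain Python so that each label keeps roughly the same share in both parts. The column name `category` and the function `stratified_split` are hypothetical, and the toy records use only labels named in this report.

```python
import random
from collections import defaultdict


def stratified_split(records, label_key, val_fraction=0.1, seed=0):
    """Split records into (train, val), preserving per-label proportions."""
    by_label = defaultdict(list)
    for rec in records:
        by_label[rec[label_key]].append(rec)

    rng = random.Random(seed)
    train, val = [], []
    for label in sorted(by_label):  # sorted so the split is reproducible
        group = by_label[label]
        rng.shuffle(group)
        n_val = round(len(group) * val_fraction)
        val.extend(group[:n_val])
        train.extend(group[n_val:])
    return train, val


# Toy records; "category" is a hypothetical column name.
toy = (
    [{"title": f"t{i}", "category": "Technology"} for i in range(20)]
    + [{"title": f"p{i}", "category": "Politics"} for i in range(10)]
)
train, val = stratified_split(toy, "category", val_fraction=0.2)
print(len(train), len(val))  # 24 6
```

Splitting per label rather than over the whole pool keeps rare categories or toxicity levels from vanishing from the validation set.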

Because the annotations were generated by a GPT‑5.2 model following a detailed codebook, the dataset offers a consistent labeling schema that can serve as a benchmark for evaluating text classification, moderation, and sociolinguistic models in the emerging field of agent‑centric social platforms. The accompanying paper and project page provide further context and methodological details for anyone interested in studying the Moltbook ecosystem.

Project Ideas

  1. Train a multi‑class classifier to predict which of the nine content categories each Moltbook post belongs to.
  2. Develop a toxicity detection model using the five defined toxicity levels for content moderation pipelines.
  3. Perform a temporal analysis of submolt activity and topic distribution to uncover trends in the agent social network.
  4. Create a recommendation system that matches users to relevant submolts based on post content and category annotations.
  5. Build a benchmark suite comparing transformer models on the Moltbook classification and toxicity tasks.
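For the classification and benchmarking ideas above, a metric that weights every class equally is often a reasonable starting point, since toxicity labels tend to be imbalanced. The following is a minimal sketch of macro-averaged F1 in plain Python. The function name `macro_f1` is illustrative, and the toy labels use only the two toxicity levels named in this report.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


y_true = ["Safe", "Safe", "Safe", "Malicious"]
y_pred = ["Safe", "Safe", "Malicious", "Malicious"]
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                            # 0.75
print(round(macro_f1(y_true, y_pred), 4))  # 0.7333
```

Reporting macro F1 alongside accuracy makes it harder for a model that mostly predicts the majority class to look strong.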