AtomBlock-WebUI: A Synthetic UI Detection Dataset Tailored for YOLO
AtomBlock-WebUI is a synthetic dataset of roughly 9,700 full‑page web screenshots, each annotated with YOLO‑format bounding boxes for 14 UI element categories such as buttons, links, inputs, and structural blocks like navigation bars and sidebars. The data was generated by prompting the Qwen3.6‑plus LLM to produce semantic HTML layouts, injecting real images from the CC3M corpus via FAISS‑based caption retrieval, and rendering the pages with Playwright to extract pixel‑perfect element coordinates. The resulting annotations are directly aligned with the visual output, avoiding the inconsistencies of DOM‑based labeling methods.
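Rendering yields absolute pixel coordinates, while YOLO labels use normalized center coordinates. A minimal sketch of that conversion step (the function name and class index are illustrative, not taken from the dataset's tooling):

```python
def to_yolo_line(class_id, x, y, w, h, img_w, img_h):
    """Convert an absolute pixel box (top-left x/y, width, height)
    into a normalized YOLO label line: class cx cy w h."""
    cx = (x + w / 2) / img_w
    cy = (y + h / 2) / img_h
    return f"{class_id} {cx:.6f} {cy:.6f} {w / img_w:.6f} {h / img_h:.6f}"

# Example: a 200x50 button at (100, 300) on a 1280x720 screenshot
print(to_yolo_line(2, 100, 300, 200, 50, 1280, 720))
```

Because the boxes come from the rendered page rather than the DOM, this conversion is the only transformation between what the browser drew and what the detector trains on.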
The dataset is organized into standard YOLO folders (train/val/test) and includes the original HTML files, injected images, raw screenshots, and visualizations with overlaid boxes. With 1,321,234 bounding boxes in total and a class distribution dominated by links (47.4%) and icons (14%), it offers a rich training source for object-detection models targeting web UI components. The README provides a ready-to-use training script for Ultralytics YOLO and notes that mosaic augmentation is unsuitable for UI detection because UI elements are small and densely packed.
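The reported class distribution can be reproduced directly from the YOLO label files. A minimal sketch, assuming standard one-box-per-line `.txt` labels (the directory layout and function name are illustrative):

```python
from collections import Counter
from pathlib import Path

def class_distribution(label_dir):
    """Count class IDs across all YOLO .txt label files in a directory.

    Each non-empty line is expected to start with an integer class ID,
    followed by the normalized box coordinates.
    """
    counts = Counter()
    for label_file in Path(label_dir).glob("*.txt"):
        for line in label_file.read_text().splitlines():
            if line.strip():
                counts[int(line.split()[0])] += 1
    return counts
```

Running this over the train/val/test label folders is a quick sanity check that a download or conversion step did not drop annotations.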
AtomBlock-WebUI stands out because it bridges the gap between synthetic layout generation and real‑world visual diversity by embedding authentic images, and it supplies precise geometric labels extracted from rendered pages rather than heuristic DOM parsing. Licensed under CC BY‑NC‑SA 4.0 with additional restrictions from Mind2Web and CC3M, the dataset is intended for non‑commercial research, making it a valuable resource for academia and open‑source projects focused on UI automation, accessibility, and layout analysis.
Project Ideas
- Fine‑tune a YOLO model on AtomBlock-WebUI to create a real‑time UI element detector for automated testing tools.
- Develop a browser extension that highlights detected UI components on live websites using a model trained on this dataset.
- Generate synthetic UI screenshots for data‑augmentation pipelines in multimodal models that need visual UI understanding.
- Build a UI design assistant that suggests missing components (e.g., buttons or navigation blocks) based on detected layout gaps.
- Create a benchmark suite that evaluates UI element detection performance across synthetic and real web pages using the provided annotations.
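For the benchmark idea above, the core matching metric between predicted and ground-truth boxes is intersection-over-union. A minimal sketch in (x1, y1, x2, y2) corner format (a hypothetical helper, not part of the dataset's tooling):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) format."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```

With dense, small UI elements such as links and icons, the IoU threshold chosen for a match (commonly 0.5) has a large effect on reported precision and recall, so a benchmark should report results at several thresholds.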