Fine-T2I: 6M High‑Quality Text‑Image Pairs for Open T2I Fine‑Tuning
Fine‑T2I is a large‑scale, open dataset released by Xu Ma, Yitian Zhang, Qihua Dong, and Yun Fu of Northeastern University. It contains over 6.15 million text–image pairs (about 2 TB) organized in WebDataset format; each sample provides a JPEG image, a raw prompt (txt), and a metadata JSON file. The collection combines roughly 6 million synthetic samples generated by state‑of‑the‑art diffusion models (Z‑Image, FLUX2) with 168k curated real photographs sourced from professional platforms such as Pexels, Pixabay, and Unsplash.
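To make the layout concrete: a WebDataset shard is an ordinary tar archive in which files sharing a basename form one sample. The sketch below uses only the Python standard library to group a shard's members into samples. The demo shard, its file names, and the `style` metadata field are invented for illustration and are not taken from Fine‑T2I; the actual shard naming and metadata schema should be checked against the dataset card.

```python
import io
import json
import tarfile
from collections import defaultdict


def group_samples(tar_bytes):
    """Group the members of a WebDataset-style tar shard by sample key.

    Returns {key: {extension: raw_bytes}}. The key is the member path up to
    the first dot of its file name; the extension is everything after that dot.
    """
    samples = defaultdict(dict)
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r") as tar:
        for member in tar:
            if not member.isfile():
                continue
            directory, _, base = member.name.rpartition("/")
            stem, _, ext = base.partition(".")
            key = f"{directory}/{stem}" if directory else stem
            samples[key][ext] = tar.extractfile(member).read()
    return dict(samples)


def make_demo_shard():
    """Build a tiny in-memory shard with made-up contents (illustration only)."""
    files = {
        "000001.jpg": b"\xff\xd8 placeholder bytes, not a real image",
        "000001.txt": b"a red bicycle leaning on a wall",
        "000001.json": json.dumps({"style": "photo"}).encode(),  # hypothetical field
    }
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


shard = group_samples(make_demo_shard())
prompt = shard["000001"]["txt"].decode()
metadata = json.loads(shard["000001"]["json"])
```

Here `shard["000001"]` holds the three parts of one sample under the keys `jpg`, `txt`, and `json`, which mirrors the per-sample structure described above.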
The dataset targets both text‑to‑image (T2I) fine‑tuning and image‑to‑text tasks. Each sample carries dual prompt annotations (original and enhanced) to reflect the different ways users write prompts. It spans 10 task combinations, 32 prompt categories, and 11 visual styles, with high‑resolution images (>1K) and randomized aspect ratios. A rigorous filtering pipeline (semantic deduplication, safety checks, aesthetic scoring, and a VLM‑based visual quality auditor) removed more than 95% of candidates to enforce tight text‑image alignment and minimal artifacts.
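As a rough illustration of what a semantic deduplication stage can look like, the sketch below greedily drops any item whose embedding is too similar to one already kept. This is not the authors' implementation: the cosine-similarity criterion, the `threshold` value of 0.95, and the toy two-dimensional vectors are all assumptions chosen for readability.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def dedup(embeddings, threshold=0.95):
    """Greedy near-duplicate removal.

    Keeps an item only if its similarity to every previously kept item is
    below `threshold` (a hypothetical value). Returns the kept indices.
    """
    kept = []
    for i, emb in enumerate(embeddings):
        if all(cosine(emb, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return kept


# Toy embeddings: the second vector is nearly identical to the first.
vectors = [[1.0, 0.0], [0.999, 0.01], [0.0, 1.0]]
kept = dedup(vectors)
```

This pairwise loop is quadratic in the number of items, so at the scale of millions of candidates a real pipeline would more plausibly rely on an approximate nearest-neighbor index.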
Fine‑T2I has trended on Hugging Face, reaching #2 overall and #1 among image datasets. It can be streamed directly through the `datasets` library using the `webdataset` loader, which avoids downloading the full 2 TB. The authors provide a sample notebook and a dedicated Space for exploring the dataset, and they plan a larger “fine‑t2i‑v2” release.
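A minimal sketch of the streaming approach, assuming the shards are tar files reachable through an `hf://` path. `SHARD_PATTERN` is a hypothetical placeholder, since the repository id and shard layout are not given here, and the `__key__` and `txt` field names assume the loader's usual behavior of exposing one column per file extension.

```python
from itertools import islice

# Hypothetical placeholder: substitute the shard pattern from the dataset card.
SHARD_PATTERN = "hf://datasets/<org>/<repo>/**/*.tar"


def open_stream(data_files, split="train"):
    """Open a lazily streamed dataset over WebDataset tar shards."""
    # Imported here so that `preview` below works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(
        "webdataset",
        data_files={split: data_files},
        split=split,
        streaming=True,  # read shards on demand instead of downloading everything
    )


def preview(samples, n=3):
    """Return (key, prompt) pairs for the first n samples of any iterable of dicts."""
    return [(s.get("__key__"), s.get("txt")) for s in islice(samples, n)]
```

Once the placeholder is replaced, `preview(open_stream(SHARD_PATTERN))` reads samples lazily and stops after the first three, so the full collection never has to be stored locally.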
Researchers and developers are encouraged to cite the accompanying arXiv paper (arXiv:2602.09439) and to contact the authors for collaborations or issues.
Project Ideas
- Fine‑tune a diffusion model on the synthetic and curated splits of Fine‑T2I to improve prompt adherence and visual fidelity.
- Train a dual‑encoder vision‑language model for image captioning using the paired jpg‑txt files as supervision.
- Create a benchmark suite that evaluates safety and aesthetic filters by running existing models on the filtered Fine‑T2I samples.
- Develop a prompt‑enhancement tool that learns to convert original prompts into the enhanced versions provided in the dataset.
- Build a streaming data pipeline that dynamically loads Fine‑T2I shards for large‑scale experiments without local storage.