dataset April 12, 2026

Vietnam Real Estate Listings 2025: 1M Records for Price Prediction & Market Insight

The **Tinix Vietnam Real Estate Listings 2025** dataset, curated by TiniX AI, provides a comprehensive snapshot of the Vietnamese property market with exactly **1,000,000** listings collected between June and December 2025. Each record contains both **tabular attributes** (price, area, floor count, bedroom/bathroom numbers, geographic identifiers, etc.) and **textual fields** (title and description) in Vietnamese. The data is stored in Parquet format and can be loaded via the Hugging Face `datasets` library, with optional support for Dask, Polars, and mlcroissant for scalable processing. It is released under a CC BY‑NC 4.0 license, which permits non‑commercial use, including academic and educational work, provided attribution is given.

The dataset is tagged for **tabular‑regression**, **geospatial**, and **price‑prediction** tasks, reflecting its primary use cases: building models that estimate property prices from physical characteristics and location, conducting spatial market analyses, and tracking temporal trends in listings. The schema includes geographic fields down to the ward level and a `published_at` timestamp, enabling fine‑grained GIS visualisations and time‑series studies. The single `train` split contains all 1M examples, which gives machine‑learning pipelines a sizable training set but leaves it to users to hold out their own validation and test subsets.
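The temporal use case can be sketched with nothing more than the standard library. The following is a minimal illustration, not the dataset's own tooling: it assumes each record is a plain dict (as yielded when iterating a `datasets` split), that the price column is named `price` (an assumption), and that `published_at` arrives either as a `datetime` or as an ISO 8601 string. The sample rows are synthetic.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median


def monthly_summary(rows):
    """Group listings by the calendar month of `published_at` and
    return the listing count and median price for each month."""
    buckets = defaultdict(list)
    for row in rows:
        price = row.get("price")  # `price` is an assumed column name
        if price is None:
            continue  # skip listings without a usable price
        ts = row["published_at"]
        if isinstance(ts, str):
            ts = datetime.fromisoformat(ts)  # assumes ISO 8601 strings
        buckets[(ts.year, ts.month)].append(price)
    return {
        month: {"listings": len(prices), "median_price": median(prices)}
        for month, prices in sorted(buckets.items())
    }


# Synthetic rows for illustration only; not taken from the dataset.
sample_rows = [
    {"published_at": "2025-06-03T09:15:00", "price": 2.5e9},
    {"published_at": "2025-06-20T14:00:00", "price": 3.5e9},
    {"published_at": "2025-07-01T08:30:00", "price": 4.0e9},
    {"published_at": "2025-07-02T10:00:00", "price": None},
]
summary = monthly_summary(sample_rows)
```

The median is used rather than the mean because listing prices tend to be heavily skewed by a few very expensive properties; swapping in `statistics.mean` is a one-line change.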

What makes this dataset notable is its breadth (covering all 63 Vietnamese provinces), its blend of structured and unstructured data, and its focus on a market that has historically lacked large‑scale open data. Researchers can use it to benchmark price‑prediction models, explore regional price disparities, or fine‑tune Vietnamese language models on real‑estate text. The README also includes quick‑start code snippets for filtering listings by province, computing average prices, and extracting apartment subsets, illustrating the dataset’s immediate applicability.
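The three quick‑start operations mentioned above (filter by province, average price, apartment subset) might look roughly like the sketch below. This is not the README's code: the column names `province`, `price`, and `property_type`, and the value `"apartment"`, are assumptions made for illustration, and the rows are synthetic.

```python
from collections import defaultdict
from statistics import mean


def filter_rows(rows, **criteria):
    """Keep rows whose fields equal every given keyword criterion."""
    return [r for r in rows if all(r.get(k) == v for k, v in criteria.items())]


def average_price_by(rows, key):
    """Average the (assumed) `price` field per distinct value of `key`."""
    groups = defaultdict(list)
    for r in rows:
        if r.get("price") is not None:
            groups[r[key]].append(r["price"])
    return {k: mean(v) for k, v in groups.items()}


# Synthetic rows; field names and values are hypothetical.
sample_rows = [
    {"province": "Hà Nội", "property_type": "apartment", "price": 3.0e9},
    {"province": "Hà Nội", "property_type": "house", "price": 5.0e9},
    {"province": "Đà Nẵng", "property_type": "apartment", "price": 2.0e9},
]

hanoi = filter_rows(sample_rows, province="Hà Nội")
avg_by_province = average_price_by(sample_rows, "province")
apartments = filter_rows(sample_rows, property_type="apartment")
```

Because the helpers only iterate over dicts, the same functions should work on rows streamed from the Hugging Face `datasets` library, once the real column names have been checked against the dataset card.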

Project Ideas

  1. Train a tabular regression model to predict property prices from features like area, bedroom count, and location coordinates.
  2. Create an interactive GIS dashboard that visualises average price and listing density per province and district using the geographic fields.
  3. Build a temporal analytics pipeline to chart monthly price trends and listing volumes across 2025, identifying seasonal market patterns.
  4. Fine‑tune a Vietnamese language model on the `description` and `name` fields to classify property types or extract key amenities.
  5. Develop a market segmentation tool that clusters listings by price, size, and property type to highlight premium versus affordable segments in each city.
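As a starting point for idea 1, a one‑feature least‑squares fit of price on area gives a baseline that any richer tabular model should beat. The sketch below is a deliberately simple stand‑in, not a recommended final model; the sample areas and prices are synthetic, and the units (square metres, billions of VND) are assumptions rather than the dataset's documented units.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(area, slope, intercept):
    """Predict a price from an area using the fitted line."""
    return slope * area + intercept


# Synthetic data: area in square metres, price in billions of VND (assumed units).
areas = [50.0, 80.0, 100.0, 120.0]
prices = [2.0, 3.2, 4.0, 4.8]

slope, intercept = fit_line(areas, prices)
estimate = predict(90.0, slope, intercept)
```

On real listings, this baseline would typically be extended with bedroom count, location features, and a log transform of price before moving to gradient‑boosted trees or similar models.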