CCN Dataset: Tabular Classification for Advanced Route Recommendation
The GD-ML/CCN dataset, released by the GD-ML team, accompanies the research paper *Towards Full Candidate Interaction: A Comprehensive Comparison Network for Better Route Recommendation*. It is a tabular-classification dataset stored in CSV format, with between 100K and 1M rows. Built for route recommendation, it can be loaded through the Hugging Face `datasets` library as well as Dask, Polars, and ML Croissant, so it fits into most existing dataframe workflows.
The data is organized into three feature groups, where N denotes the number of candidate routes in a single recommendation request. **Route Features** (N × 62) describe each candidate route with static, dynamic, and trajectory statistics such as estimated time of arrival (ETA) and total distance. **Scene Features** (1 × 10) capture the context of the request as a whole, including request time and the user's familiarity with the origin and destination. Finally, **Comparison-Level Features** (N × N × 27) encode pairwise relations between every two candidate routes, such as ETA and distance gaps, enabling models to learn comparative reasoning across candidates.
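The comparison-level features ship precomputed, but the N × N shape convention is easy to misread. The sketch below illustrates it with NumPy broadcasting on random stand-in data (not real CCN rows). It assumes, hypothetically, that column 0 of the route features is ETA and column 1 is distance (check the actual CSV header), and that a "gap" means route i minus route j; both the column positions and the sign convention are assumptions.

```python
import numpy as np

# Hypothetical request with N candidate routes; random values stand in for real data.
N = 4
NUM_ROUTE_FEATS = 62
rng = np.random.default_rng(0)
route_feats = rng.random((N, NUM_ROUTE_FEATS))  # shape (N, 62)

# Hypothetical column positions; the real CSV schema may differ.
ETA_COL, DIST_COL = 0, 1

def pairwise_gaps(route_feats: np.ndarray, cols: list[int]) -> np.ndarray:
    """Return an (N, N, K) tensor with gaps[i, j, k] = x[i, cols[k]] - x[j, cols[k]]."""
    x = route_feats[:, cols]                 # (N, K) selected per-route features
    return x[:, None, :] - x[None, :, :]     # broadcast (N, 1, K) - (1, N, K) -> (N, N, K)

gaps = pairwise_gaps(route_feats, [ETA_COL, DIST_COL])
print(gaps.shape)  # (4, 4, 2)
```

Difference features built this way are antisymmetric (`gaps[i, j] == -gaps[j, i]`) with zeros on the diagonal. The dataset's 27 comparison features may include quantities that are not simple differences, so that property should not be assumed for every column.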
Because the dataset provides both per-route attributes and explicit pairwise comparison features, it is well suited to models that perform candidate interaction, ranking, or side-by-side evaluation of routes. The scene features also make it possible to study how request time and user familiarity influence which route is recommended. Its rich per-route and pairwise features, at a size of up to one million rows, make it practical both for academic benchmarking and for prototyping intelligent navigation systems.
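A simple baseline for these uses is to flatten each request into N rows, one per candidate, that an ordinary per-row classifier (for example, a gradient-boosting model) can consume. The hypothetical helper `build_candidate_rows` below concatenates each route's 62 features, the request's 10 scene features repeated on every row, and the route's 27 comparison features mean-pooled over the other candidates. Mean-pooling is an illustrative assumption, not the CCN paper's method, and it discards most of the explicit pairwise structure that candidate-interaction models are designed to exploit.

```python
import numpy as np

def build_candidate_rows(route_feats: np.ndarray,
                         scene_feats: np.ndarray,
                         cmp_feats: np.ndarray) -> np.ndarray:
    """Flatten one request into N rows: route + scene + pooled comparison features.

    Shapes: route_feats (N, 62), scene_feats (1, 10), cmp_feats (N, N, 27).
    Returns an (N, 62 + 10 + 27) = (N, 99) matrix.
    """
    n = route_feats.shape[0]
    # Mask out self-comparisons (i == j) so each route is pooled only over its rivals.
    others = ~np.eye(n, dtype=bool)                                        # (N, N)
    pooled = (cmp_feats * others[:, :, None]).sum(axis=1) / max(n - 1, 1)  # (N, 27)
    # Every candidate in a request shares the same scene context.
    scene_rows = np.repeat(scene_feats, n, axis=0)                         # (N, 10)
    return np.concatenate([route_feats, scene_rows, pooled], axis=1)

# Usage with random stand-in data for a request with 5 candidate routes.
rng = np.random.default_rng(0)
N = 5
route = rng.random((N, 62))
scene = rng.random((1, 10))
cmp = rng.random((N, N, 27))
X = build_candidate_rows(route, scene, cmp)
print(X.shape)  # (5, 99)
```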
Project Ideas
- Train a tabular classification model to predict the most suitable route given a request's scene features and route attributes.
- Develop a pairwise ranking system that uses the comparison-level features to order candidate routes by estimated travel efficiency.
- Create a visualization dashboard that shows route feature importance and highlights how ETA and distance differences affect recommendations.
- Build a context-aware recommendation engine that uses the scene features, such as user familiarity and request time, to adjust suggested routes.
- Benchmark different gradient‑boosting libraries (e.g., XGBoost, LightGBM) on the CCN dataset to evaluate performance on route selection tasks.
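For the pairwise ranking idea, one common pattern is to score each ordered pair of candidates with the probability that route i is preferred over route j, then aggregate those probabilities into a single ordering. The sketch below uses a fixed logistic scorer over comparison features as a stand-in for a trained pairwise model (for example, logistic regression or gradient boosting fit on labeled pairs) and ranks routes by their average win probability against the other candidates, a Borda-style aggregation. The function names, the single-feature toy input, and the weight of -0.2 (which favors shorter ETAs) are all hypothetical; this is not the ranking procedure from the CCN paper.

```python
import numpy as np

def pairwise_win_probs(cmp_feats: np.ndarray, w: np.ndarray, b: float = 0.0) -> np.ndarray:
    """P[i, j] = sigmoid(w . c_ij + b), the modeled probability that route i beats route j.

    cmp_feats has shape (N, N, K); w has shape (K,). Returns an (N, N) matrix.
    """
    logits = cmp_feats @ w + b  # (N, N, K) @ (K,) -> (N, N)
    return 1.0 / (1.0 + np.exp(-logits))

def rank_candidates(P: np.ndarray) -> np.ndarray:
    """Return route indices ordered best-first by mean win probability over rivals."""
    n = P.shape[0]
    others = ~np.eye(n, dtype=bool)  # ignore the i == j diagonal
    scores = (P * others).sum(axis=1) / max(n - 1, 1)
    return np.argsort(-scores, kind="stable")

# Toy request: three routes described only by ETA in minutes (made-up values).
eta = np.array([30.0, 25.0, 40.0])
cmp = (eta[:, None] - eta[None, :])[:, :, None]  # (3, 3, 1) ETA gap, route i minus route j
P = pairwise_win_probs(cmp, w=np.array([-0.2]))  # negative weight: shorter ETA wins
print(rank_candidates(P))  # [1 0 2]
```

With antisymmetric gap features and zero bias, `P[i, j] + P[j, i] = 1`, so the pairwise predictions are mutually consistent. A model learned on real comparison features gives no such guarantee, which is one reason an explicit aggregation step is useful.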