dataset May 15, 2026

DeepSeek V4 Pro Hermes Reasoning Traces: LoRA Fine‑Tuning Dataset

r0b0tlab/deepseek-hermes-reasoning-traces ↗

The *deepseek-hermes-reasoning-traces* dataset, published by r0b0tlab, contains 19,331 multi‑turn ChatML conversations enriched with Hermes‑style reasoning and tool‑calling annotations. Generated by the DeepSeek V4 Pro API, the traces are organized into train (16,431), validation (1,933), and test (967) splits and are stored in optimized Parquet format for efficient loading with libraries such as pandas, polars, and the Hugging Face `datasets` library.

Four VRAM‑tiered variants are provided: **nano** (2,048‑token limit, 15,948 traces, suitable for 7B models), **budget** (4,096 tokens, 2,149 traces, 48 GB GPU), **standard** (8,192 tokens, 990 traces, 64 GB GPU), and **spark** (16,384 tokens, 244 traces, 128 GB DGX). Each trace includes calls to 138 K tool invocations across core tools (e.g., terminal, read_file, web_search) and Hermes‑specific tools (e.g., memory, delegate_task). The dataset is licensed under Apache‑2.0 and targets LoRA fine‑tuning of local models to act as Hermes agents.

The README lists several target models for LoRA adaptation, including Qwen 3.6 27B, Nemotron Omni 30B, Ling‑2.6‑flash, Mistral‑Medium‑3.5 128B, GLM‑5.1, and DeepSeek V4 Flash. Generation was performed with 96 parallel workers, a 5‑second stagger, and a JSON repair filter that achieved a 62 % pass rate, ensuring high‑quality, fully‑structured ChatML blocks. This dataset is particularly valuable for researchers and developers aiming to equip LLMs with reliable tool‑calling and step‑by‑step reasoning capabilities.

Project Ideas

Fine‑tune a 7B LLM with the nano variant to create a lightweight Hermes‑style agent that can invoke terminal and file‑reading tools.
Benchmark the tool‑calling success rate of different LoRA‑adapted models using the validation split as a held‑out evaluation set.
Build a multi‑turn chatbot demo that reproduces the recorded reasoning traces, showcasing how the model decides to use core and Hermes tools.
Create a data loader that streams the Parquet files directly into a LoRA training pipeline with PyTorch Lightning for efficient large‑scale fine‑tuning.
Analyze the distribution of tool usage across the dataset to design a curriculum learning schedule that emphasizes less‑frequent Hermes tools like cronjob or delegate_task.

← Back to all reports