What Happened

Developer DJLougen published Harmonic-9B, a two-stage fine-tune of Qwen3-9B optimized for agentic tasks, on Hugging Face. Stage 1 (heavy reasoning training) is complete; Stage 2 (tool-calling and agent behavior) is still training. GGUF quantized versions are already available at DJLougen/Harmonic-9B-GGUF. The filtered training dataset, derived from Hermes agent traces, is also open-sourced at DJLougen/hermes-agent-traces-filtered.

Why It Matters

The filtered dataset shows measurable gains over the raw Hermes traces: self-correction examples jumped from 6% to 63%, verification steps from 26% to 96%, and valid JSON/tool-call rate reached 100%. For indie developers building agentic pipelines, these numbers matter more than generic benchmark scores because they directly predict reliability in production loops.
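A metric like the 100% valid JSON/tool-call rate is cheap to reproduce on your own traces. Below is a minimal sketch of such a check; the trace structure and the `tool_call` field name are illustrative assumptions, not the actual dataset schema.

```python
import json

def valid_tool_call_rate(traces):
    """Fraction of tool-call turns whose payload parses as JSON and
    names a tool. Field names here are hypothetical, not the dataset's."""
    calls = [turn["tool_call"]
             for trace in traces
             for turn in trace
             if "tool_call" in turn]
    if not calls:
        return 0.0
    ok = 0
    for raw in calls:
        try:
            parsed = json.loads(raw)
            if isinstance(parsed, dict) and "name" in parsed:
                ok += 1
        except json.JSONDecodeError:
            pass  # malformed payload counts against the rate
    return ok / len(calls)

traces = [
    [{"role": "assistant",
      "tool_call": '{"name": "search", "arguments": {"q": "qwen3"}}'},
     {"role": "assistant",
      "tool_call": "not json at all"}],
]
print(valid_tool_call_rate(traces))  # 0.5
```

Running the same check before and after filtering gives you a concrete number to compare against the figures reported here.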

  • No formal benchmarks yet — Stage 2 is incomplete, so treat current results as preliminary.
  • The staged fine-tuning approach (reasoning first, tool-use second) is a reproducible methodology others can copy on smaller base models.
  • Compatible with LangGraph, ReAct, and OpenClaw harnesses based on the author's testing notes.
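To make the harness compatibility concrete, here is a minimal ReAct-style loop with a stubbed model. The stub, tool registry, and message format are placeholders for illustration; a real setup would call Harmonic-9B through llama.cpp or an OpenAI-compatible server instead of `stub_model`.

```python
# Toy tool registry; a real harness would expose actual tools here.
TOOLS = {"add": lambda args: args["a"] + args["b"]}

def stub_model(history):
    """Placeholder for the model: emits one tool call, then a final answer."""
    if not any(m["role"] == "tool" for m in history):
        return {"tool_call": {"name": "add", "arguments": {"a": 2, "b": 3}}}
    return {"final": f"The result is {history[-1]['content']}"}

def react_loop(question, model, max_steps=4):
    """Reason/act loop: call the model, execute any tool it requests,
    feed the result back, and stop when it produces a final answer."""
    history = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        out = model(history)
        if "final" in out:
            return out["final"]
        call = out["tool_call"]
        result = TOOLS[call["name"]](call["arguments"])
        history.append({"role": "tool", "content": result})
    return None  # step budget exhausted

print(react_loop("What is 2 + 3?", stub_model))  # The result is 5
```

The self-correction and verification behaviors the dataset targets matter precisely in loops like this: a malformed tool call at any step derails the whole run.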

Asia-Pacific Angle

Qwen3-9B is developed by Alibaba's Qwen team and is already widely used by Chinese and Southeast Asian developers due to its strong multilingual performance and permissive license. Harmonic-9B builds directly on this base, meaning teams in the region can fine-tune further for local languages (Thai, Vietnamese, Bahasa Indonesia, Traditional Chinese) without starting from scratch. The open-sourced filtered dataset is also a practical reference for anyone doing high-signal data curation on Qwen-family models, a common task for teams building vertical agents targeting APAC markets. Developers shipping agentic products in China should note that GGUF quants run locally, avoiding API dependency on overseas providers.

Action Item This Week

Download the hermes-agent-traces-filtered dataset from Hugging Face and audit its self-correction and verification examples. Use the filtering criteria as a template to clean your own agent trace datasets before your next fine-tuning run: the jump from 6% to 63% self-correction rate suggests that data quality, not data volume, drives agent reliability.
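As a starting point for that audit, here is a rough filtering heuristic in the same spirit: keep only traces that show both self-correction and verification behavior. The marker phrases and trace format below are assumptions for illustration; the dataset card documents the actual filtering criteria.

```python
# Hypothetical marker phrases; tune these against your own traces.
SELF_CORRECTION = ("wait,", "actually,", "let me reconsider", "i made an error")
VERIFICATION = ("let me verify", "double-check", "confirming", "checking the result")

def has_marker(text, markers):
    low = text.lower()
    return any(m in low for m in markers)

def filter_traces(traces):
    """Keep traces containing at least one self-correction marker
    and at least one verification marker."""
    kept = []
    for trace in traces:
        text = " ".join(turn["content"] for turn in trace)
        if has_marker(text, SELF_CORRECTION) and has_marker(text, VERIFICATION):
            kept.append(trace)
    return kept

traces = [
    [{"content": "Wait, that argument is wrong. Let me verify the schema first."}],
    [{"content": "Calling the tool with the given arguments."}],
]
print(len(filter_traces(traces)))  # 1
```

Keyword matching is a crude proxy; a production pipeline would likely combine it with structural checks (like the JSON validity rate above) or an LLM-based classifier, but it is enough to get a first read on your data.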