What Happened

Machine learning researcher and author Sebastian Raschka published a detailed reference article on the architecture of coding agents, targeting readers of his Build a Large Language Model (From Scratch) and Build a Large Reasoning Model (From Scratch) books. The piece focuses on six core building blocks that distinguish a coding agent from a plain LLM chat interface, using Claude Code and OpenAI's Codex CLI as primary examples.

The central thesis: tools like Claude Code feel more capable than raw model access not because the underlying model is different, but because the surrounding system — what Raschka calls the "agentic harness" — handles repo context, tool use, prompt-cache stability, memory, and long-session continuity.

Technical Deep Dive

LLMs vs. Reasoning Models vs. Agents

Raschka draws a three-way distinction that practitioners frequently collapse into a single concept:

  • LLM: The core next-token prediction model (e.g., GPT-4o, Claude 3.5 Sonnet)
  • Reasoning model: An LLM fine-tuned or prompted to produce extended chain-of-thought traces before emitting a final answer (e.g., o3, Claude 3.7 Sonnet with extended thinking)
  • Agent: Either model type wrapped in an application layer that adds tool use, context management, and memory

This framing matters for benchmarking. When SWE-bench scores are reported, the number reflects the full agent stack, not model capability in isolation. A weaker model inside a well-engineered harness can outperform a stronger model queried directly.

The Six Building Blocks

While the full article details all six components, the structural categories Raschka identifies map onto standard agent design patterns:

  • Repo context management: Coding agents must efficiently load relevant file content into the context window without exceeding token limits. This often involves retrieval over the local codebase rather than naive full-file injection.
  • Tool design: Agents expose discrete tools — file read/write, shell execution, test runners — that the LLM can call via structured output. Claude Code, for instance, uses a bash tool and a file-edit tool as primary primitives. (A minimal sketch of a tool declaration follows this list.)
  • Prompt-cache stability: For cost and latency, the system prompt and static context must remain byte-identical across turns to hit Anthropic's or OpenAI's prompt caching layers. Dynamic content is appended at the end. (The second sketch after this list illustrates this ordering.)
  • Memory: Distinguishing ephemeral in-context memory from persistent storage (e.g., a CLAUDE.md file or a vector store) that survives session resets.
  • Long-session continuity: Managing context window growth over multi-hour sessions, including summarization or compaction strategies when the window fills.
  • Reasoning behavior: Deciding whether to use a fast non-thinking model for tool calls and a slower reasoning model for planning steps, or a unified model for both.
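
To make the tool-design point concrete, here is a minimal sketch of how a file-read tool might be declared for a structured tool-calling API, paired with the function the harness runs when the model calls it. The schema shape and the read_file name are illustrative assumptions, not Claude Code's actual definitions.

# Illustrative tool declaration; exact field names vary by provider.
READ_FILE_TOOL = {
    "name": "read_file",
    "description": "Read a file from the repository and return its contents.",
    "input_schema": {
        "type": "object",
        "properties": {
            "path": {"type": "string", "description": "Repository-relative path."},
        },
        "required": ["path"],
    },
}

def run_read_file(path: str) -> str:
    # The harness, not the model, performs the actual filesystem access.
    with open(path, "r", encoding="utf-8") as f:
        return f.read()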
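
The prompt-cache and memory points can be combined in one sketch: a hypothetical session-start routine that loads persistent memory (a CLAUDE.md file) into a static system prompt once, then appends only turn-specific content, so the cached prefix stays byte-identical across turns. Function names here are assumptions for illustration.

from pathlib import Path

def build_system_prompt(repo_root: str) -> str:
    # Static prefix: assembled once per session so its bytes never change
    # and provider-side prompt caching keeps hitting across turns.
    base = "You are a coding agent working inside this repository."
    memory_file = Path(repo_root) / "CLAUDE.md"  # persistent memory, if present
    memory = memory_file.read_text(encoding="utf-8") if memory_file.exists() else ""
    return base + "\n\n" + memory

def build_messages(system_prompt: str, history: list, new_user_turn: str) -> list:
    # Dynamic, turn-specific content is appended after the static prefix
    # rather than interleaved with it.
    return (
        [{"role": "system", "content": system_prompt}]
        + history
        + [{"role": "user", "content": new_user_turn}]
    )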

Agentic Harness Pattern

The harness pattern Raschka describes is a loop: the LLM receives a task, emits either a tool call or a final answer, the harness executes the tool and appends the result to context, and the loop repeats. A minimal implementation in pseudocode looks like:

while not done:
    response = llm.complete(messages)          # model emits tool calls or a final answer
    if response.tool_calls:
        messages.append(response)              # keep the assistant turn in context
        for call in response.tool_calls:
            result = execute_tool(call)        # harness runs the tool on the model's behalf
            messages.append(tool_result(result))
    else:
        done = True
        return response.content                # no tool call means the task is finished

Claude Code and Codex CLI both implement this loop but add substantial engineering around context compaction, permission gating for destructive tool calls, and streaming output to the terminal.
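
As one illustration of permission gating, the sketch below wraps the execute_tool call from the loop above in an allow-list check and asks the user before running anything that can modify state. The tool labels and call attributes (call.name, call.arguments) are assumptions for illustration, not how Claude Code or Codex CLI actually gate calls.

# Assumed labels for tools that can modify the filesystem or run commands.
DESTRUCTIVE_TOOLS = {"bash", "write_file", "delete_file"}

def gated_execute_tool(call, auto_approve: bool = False):
    # Read-only tools run immediately; destructive ones need explicit
    # user confirmation unless auto-approval has been granted.
    if call.name in DESTRUCTIVE_TOOLS and not auto_approve:
        answer = input(f"Allow tool '{call.name}' with args {call.arguments}? [y/N] ")
        if answer.strip().lower() != "y":
            return {"error": "tool call rejected by user"}
    return execute_tool(call)  # the same executor used in the harness loop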

Who Should Care

This article is directly relevant to three groups. Engineers building internal coding tools will find the component breakdown useful for scoping what to implement versus what to delegate to a hosted agent. ML practitioners evaluating agent frameworks (LangGraph, OpenAI Agents SDK, smolagents) will benefit from understanding which components each framework handles versus which remain the developer's responsibility. Technical leads choosing between Claude Code, Codex CLI, Cursor, or Devin can use the six-component framework as an evaluation rubric rather than relying solely on benchmark scores.

The article is explicitly positioned as a reference document, meaning it is designed to be revisited rather than read once.

What To Do This Week

  • If you are evaluating Claude Code or Codex CLI, audit which of the six components each tool exposes for customization. Claude Code supports a CLAUDE.md file for persistent memory; check whether your repo has one configured.
  • If you are building an agent, profile your prompt-cache hit rate using Anthropic's or OpenAI's usage dashboards. Cache misses on a long system prompt can increase your per-turn cost by 3-5x (see the rough cost sketch after this list).
  • Read Raschka's full article at magazine.sebastianraschka.com for the complete six-component breakdown, especially the sections on tool design and reasoning model integration that the source excerpt truncates.
  • If you use LangGraph or the OpenAI Agents SDK, map each of the six components to the framework's abstractions to identify gaps your production code must fill manually.
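
As a rough illustration of why cache misses matter, this sketch compares per-turn input cost with and without a cache hit on a long static prefix. The token counts, the $3 per million input tokens, and the 10x cached-token discount are placeholder assumptions; plug in your provider's actual rates.

def per_turn_input_cost(static_tokens: int, dynamic_tokens: int,
                        price_per_mtok: float, cached_discount: float,
                        cache_hit: bool) -> float:
    # On a cache hit the static prefix is billed at a discounted rate;
    # on a miss the whole prefix is re-billed at the normal input price.
    static_rate = price_per_mtok * cached_discount if cache_hit else price_per_mtok
    return (static_tokens * static_rate + dynamic_tokens * price_per_mtok) / 1_000_000

# Placeholder numbers for illustration only.
hit = per_turn_input_cost(20_000, 5_000, price_per_mtok=3.00, cached_discount=0.1, cache_hit=True)
miss = per_turn_input_cost(20_000, 5_000, price_per_mtok=3.00, cached_discount=0.1, cache_hit=False)
print(f"cache hit: ${hit:.4f}/turn  cache miss: ${miss:.4f}/turn  ratio: {miss / hit:.1f}x")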