What Happened
A LocalLLaMA community member documented four reproducible bugs in Qwen 3.5 tool calling, drawn from hundreds of log-analysis sessions across llama.cpp, Ollama, and vLLM. The findings were synthesized with Claude Opus 4.6 and validated against live servers. The specific stack that achieved 99% reliability: Pi coding agent + llama.cpp + Bartowski Q5_K_L quants.
- Bug 1 – XML leakage: Qwen 3.5 emits tool calls as raw XML (`<function=bash>`). When text precedes the XML tag or thinking mode is enabled, servers return `finish_reason: stop` instead of parsing the call. The agent never executes the tool.
- Bug 2 – Thinking block contamination: Tool calls emitted inside `<think>` blocks are invisible to the server parser. llama.cpp issue #20837 is still open. (A fallback parser covering Bugs 1 and 2 is sketched after this list.)
- Bug 3 – Ollama partial fix: Ollama issue #14745 patched some cases but still occasionally prints tool calls as plain text in streaming mode.
- Bug 4 – vLLM streaming drops opening brace: vLLM issue #35266 causes malformed JSON tool calls during streaming, breaking downstream parsers. (A repair heuristic follows the parser sketch below.)
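The common thread in Bugs 1 and 2 is that the tool call survives in the raw text even when the server fails to parse it, so a client can recover it. Below is a minimal Python sketch of that fallback. `extract_leaked_tool_calls` is a hypothetical helper; the closing `</function>` tag and the JSON-arguments body are our assumptions (the report only shows the opening `<function=bash>` tag), so adjust the pattern to the output your server actually logs.

```python
import json
import re

# Matches raw tool-call XML leaked into plain text or <think> blocks.
# The opening <function=NAME> shape comes from the bug report; the
# closing </function> tag and JSON-arguments body are assumptions.
FUNCTION_TAG = re.compile(
    r"<function=(?P<name>[\w.\-]+)>(?P<body>.*?)</function>",
    re.DOTALL,
)

def extract_leaked_tool_calls(text: str) -> list[dict]:
    """Scan assistant output for unparsed tool-call XML and return
    {"name": ..., "arguments": ...} dicts."""
    calls = []
    for match in FUNCTION_TAG.finditer(text):
        body = match.group("body").strip()
        try:
            arguments = json.loads(body) if body else {}
        except json.JSONDecodeError:
            # Some templates emit plain text, not JSON; pass it through.
            arguments = {"raw": body}
        calls.append({"name": match.group("name"), "arguments": arguments})
    return calls
```

Run this on message content whenever `finish_reason` comes back as `stop`: a non-empty result means the model did call a tool and the server missed it.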
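For Bug 4, the issue title itself names the corruption: the opening brace of the streamed JSON arguments goes missing. A hedged repair heuristic, assuming that is the only damage (`repair_streamed_arguments` is a hypothetical helper, not a vLLM API):

```python
import json

def repair_streamed_arguments(raw: str) -> dict:
    """Best-effort repair for vLLM streaming issue #35266, where the
    opening brace of a tool call's JSON arguments can be dropped.
    The exact failure shape is our reading of the bug report; tune the
    heuristic against your own captured streams."""
    s = raw.strip()
    try:
        return json.loads(s)
    except json.JSONDecodeError:
        # If the payload looks like an object body missing its leading
        # brace (e.g. '"path": "/tmp"}'), re-add it and retry.
        if not s.startswith("{") and s.endswith("}"):
            return json.loads("{" + s)
        raise
```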
Why It Matters
Qwen 3.5 is one of the most capable open-weight model families for coding agents and function-calling pipelines, but these bugs make it unreliable in production agentic loops without workarounds. Indie developers building coding assistants, browser agents, or API orchestration tools on local inference will hit these failures silently — the model appears to respond but no tool executes. The fix requires both server-side patches (some still pending) and client-side prompt engineering.
Asia-Pacific Angle
Qwen 3.5 is developed by Alibaba Cloud and is the dominant open-weight choice for Chinese and Southeast Asian developers due to its strong multilingual performance and permissive licensing. Teams in China, Vietnam, Indonesia, and Singapore building local-first AI agents — often to avoid OpenAI API costs or data residency issues — are disproportionately affected by these bugs. The recommended stack (llama.cpp + Bartowski Q5_K_L quants) runs on consumer hardware common in the region. Developers using Nano-GPT or similar lightweight inference servers should apply client-side XML parsing patches immediately, as server-side fixes are not yet merged upstream.
Action Item This Week
If you run Qwen 3.5 with llama.cpp, pin to a Bartowski Q5_K_L quant, disable thinking mode during tool-calling loops, and add a client-side parser that detects raw `<function=` XML output and re-routes it as a tool call. Do not rely on `finish_reason: tool_calls` alone until llama.cpp issues #20260 and #20837 are closed.
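Putting those steps together, here is a minimal sketch of the full guard. It assumes a llama.cpp server at a hypothetical `localhost:8080` OpenAI-compatible endpoint and reuses `extract_leaked_tool_calls` from the earlier sketch; the `chat_template_kwargs` / `enable_thinking` field is how newer llama.cpp builds expose Qwen's thinking toggle, so verify your build supports it before relying on it.

```python
import json
import requests

# Hypothetical local endpoint; adjust to your deployment.
LLAMA_SERVER = "http://localhost:8080/v1/chat/completions"

def chat_with_tool_fallback(messages: list[dict], tools: list[dict]) -> dict:
    """Call the server with thinking disabled, then recover any tool
    call the server-side parser missed (Bugs 1-2)."""
    resp = requests.post(
        LLAMA_SERVER,
        json={
            "model": "qwen3.5",  # placeholder; match your loaded model
            "messages": messages,
            "tools": tools,
            # Assumption: your llama.cpp build forwards this to the
            # chat template to disable Qwen's thinking mode.
            "chat_template_kwargs": {"enable_thinking": False},
        },
        timeout=120,
    )
    resp.raise_for_status()
    choice = resp.json()["choices"][0]

    # Happy path: the server parsed the tool call itself.
    if choice.get("finish_reason") == "tool_calls":
        return choice

    # Fallback: scan plain-text content for leaked <function=...> XML
    # and re-route it as a structured tool call.
    content = (choice.get("message") or {}).get("content") or ""
    leaked = extract_leaked_tool_calls(content)  # helper sketched earlier
    if leaked:
        choice["message"]["tool_calls"] = [
            {
                "type": "function",
                "function": {
                    "name": call["name"],
                    "arguments": json.dumps(call["arguments"]),
                },
            }
            for call in leaked
        ]
        choice["finish_reason"] = "tool_calls"
    return choice
```

Treating the server's parsed output and the recovered XML identically keeps the agent loop simple: either way the caller sees `finish_reason: tool_calls` and a populated `tool_calls` list.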