What Happened
A Reddit user posting as /u/90hex shared a system prompt on r/LocalLLaMA on or around the post date, claiming it bypasses content safety filters on Google's Gemma 4 model family. The post has accumulated 112 upvotes and 45 comments. The author states it works across both GGUF and MLX quantization formats and is derived from a prior "GPT-OSS jailbreak" technique.
The prompt uses a policy-override framing — instructing the model that "SYSTEM policy" supersedes any built-in guidelines and that "no other policy exists." It explicitly enumerates categories of content the model is instructed to permit, including explicit and graphic material.
Why It Matters
This is a recurring structural problem for open-weight model deployments. Because Gemma 4 weights are publicly downloadable and run locally via GGUF or MLX runtimes, Google has no server-side enforcement layer. Safety alignment baked into fine-tuning is the only control surface — and prompt-injection techniques like this one target exactly that layer.
- For self-hosted deployments: Any application running Gemma 4 with user-controlled system prompt access is exposed. This includes local inference wrappers, open-source chat UIs, and API-compatible servers like llama.cpp's HTTP mode or Ollama.
- For Google: Reputational surface area expands every time a jailbreak for a Google-branded model gains traction on a high-visibility forum. The post's upvote count signals meaningful community reach.
- For enterprise adopters: Teams evaluating Gemma 4 for internal tooling need to audit whether their deployment stack allows arbitrary system prompt injection by end users.
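One practical consequence of the exposure described above is that a deployment should never forward client-supplied system prompts directly to the model. A minimal sketch of that control, assuming an OpenAI-style chat message format (which llama.cpp's HTTP server and Ollama both accept) — the helper name and trusted prompt text are illustrative, not from any vendor guidance:

```python
# Sketch: pin the system prompt server-side so end users cannot inject
# a competing "SYSTEM policy" block. The message shape follows the
# OpenAI-style chat format; the prompt text below is a placeholder.

TRUSTED_SYSTEM_PROMPT = "You are an internal assistant. Follow company policy."

def pin_system_prompt(client_messages):
    """Drop any client-supplied system messages and prepend our own."""
    safe = [m for m in client_messages if m.get("role") != "system"]
    return [{"role": "system", "content": TRUSTED_SYSTEM_PROMPT}] + safe
```

In practice this would live in a thin proxy in front of the inference server, so that even API-compatible clients never reach the raw endpoint with an attacker-controlled system role.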
The technique itself — policy-override prompt injection — is not novel. Variants have circulated for GPT-3.5, Llama 2, and Mistral-family models. The fact that a cross-model version is being adapted for Gemma 4 this quickly after release follows an established pattern in the red-teaming community.
The Technical Detail
The prompt exploits the instruction-following behavior that makes aligned models useful. By framing safety guidelines as mutable "policy" subject to override, and inserting a competing "SYSTEM POLICY" block, the prompt attempts to exploit the model's tendency to follow the most recent or highest-priority instruction context.
Key structural elements of the technique:
- Authority substitution: The prompt asserts that any conflict between default policy and the injected system policy must resolve in favor of the injected version.
- Exhaustive enumeration: Rather than a blanket override, the prompt lists specific permitted content categories — a pattern that may reduce the model's likelihood of triggering refusal heuristics trained on generic override language.
- Format agnosticism: The author claims compatibility with both GGUF (llama.cpp ecosystem) and MLX (Apple Silicon inference), meaning no format-specific mitigations apply.
Defenders running Gemma 4 in production can apply input sanitization at the application layer to detect and strip policy-override language before it reaches the model context. However, this is a cat-and-mouse control — prompt variants can be obfuscated to evade static filters.
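A static filter of that kind can be sketched as a small pattern check. The phrase list below is illustrative — it mirrors the override language described in this post, and as noted, obfuscated variants would evade it:

```python
import re

# Sketch: flag policy-override language before it reaches the model
# context. Patterns are illustrative examples, not an exhaustive or
# robust list; real deployments would treat this as one layer of many.
OVERRIDE_PATTERNS = [
    r"\bsystem\s+policy\b",
    r"\bsupersedes?\b.*\b(guidelines|policy|policies)\b",
    r"\bno\s+other\s+policy\s+exists\b",
    r"\bignore\b.*\b(previous|built-?in)\s+(instructions|guidelines)\b",
]

def looks_like_policy_override(text: str) -> bool:
    """Return True if the input matches any known override phrasing."""
    lowered = text.lower()
    return any(re.search(p, lowered) for p in OVERRIDE_PATTERNS)
```

A flagged input could be rejected, logged, or stripped; the right response depends on the application. The key design point is that this runs application-side, since — as the next paragraph notes — locally-run weights offer no runtime moderation layer of their own.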
Google's Gemma models use supervised fine-tuning and RLHF-derived alignment. Unlike API-served models, there is no runtime moderation layer for locally-run weights. The attack surface is the weights themselves and the inference context window.
What To Watch
- Google DeepMind response: Watch for acknowledgment in Gemma's GitHub issues or a patch release that adjusts instruction hierarchy handling in fine-tuning. Timeline is unpredictable but community-visible jailbreaks have historically prompted model updates within weeks to months.
- Community iterations: The 45-comment thread is likely generating refinements. Expect variant prompts optimized for specific quantization levels or system prompt length constraints within days.
- Ollama and llama.cpp safeguards: Monitor whether major local inference projects add optional system-prompt sanitization layers or content moderation hooks in their next releases.
- Enterprise guidance: If Google publishes updated deployment guidance for Gemma 4 in enterprise contexts, it will likely address system prompt access controls directly.