What Happened

The AI safety organization Lyptus Research has published a scaling analysis of AI cyberoffense capability across models from GPT-2 (2019) to GLM-5 and Opus 4.6 (2026). The study uses seven established benchmarks, including CyBench, CVEBench, and NYUCTF, plus a new proprietary dataset of 291 tasks calibrated by 10 professional offensive security experts.

Key finding: measured across all models since 2019, cyberoffense capability doubles every 9.8 months. For models released since 2024, the doubling time shortens to 5.7 months. The most capable models tested, GPT-5.3 Codex and Opus 4.6, achieve 50% success on tasks that take human experts roughly 3.1 to 3.2 hours to complete.
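To make the two doubling times concrete, this short Python sketch converts them into yearly growth factors and extrapolates the frontier's 50%-success task horizon. It assumes that "doubling time" refers to the expert-time length of tasks that models complete at 50% success, which is how the 3.1 to 3.2 hour figure is framed, and that the exponential trend simply continues. The function names are illustrative and do not come from the study.

```python
def growth_per_year(doubling_months: float) -> float:
    """Multiplicative growth over 12 months for a given doubling time."""
    return 2 ** (12 / doubling_months)


def projected_horizon_hours(current_hours: float, doubling_months: float,
                            months_ahead: float) -> float:
    """Naively extrapolate a 50%-success task horizon.

    Assumes the exponential trend holds unchanged for `months_ahead` months.
    """
    return current_hours * 2 ** (months_ahead / doubling_months)


if __name__ == "__main__":
    for label, doubling in [("all models since 2019", 9.8),
                            ("models since 2024", 5.7)]:
        print(f"{label}: x{growth_per_year(doubling):.2f} per year")
    # Start from 3.1 h, the lower end of the reported frontier range.
    print(f"12-month extrapolation: {projected_horizon_hours(3.1, 5.7, 12):.1f} h")
```

Under these assumptions, a 5.7-month doubling time works out to roughly a 4.3x increase per year, compared with about 2.3x for the 9.8-month long-run trend. A naive 12-month extrapolation from 3.1 hours lands near 13 hours. That figure shows how fast the numbers compound and is not a forecast.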

  • The open-weight model GLM-5 lags the closed-source frontier by 5.7 months
  • Models evaluated span 2019–2026, including GPT-2, o3, DeepSeek V3.1, Gemini 2.5 Pro, GPT-5.3 Codex, Opus 4.6, and GLM-5
  • Capability diffusion into open-weight models is projected to occur on short timelines
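The 5.7-month lag can also be expressed as a capability gap. Suppose GLM-5 sits on the same post-2024 trend, whose doubling time is also 5.7 months. A lag of one doubling period then corresponds to roughly half the frontier's task horizon. The sketch below does that back-of-the-envelope conversion. The trend alignment is an assumption, and this summary does not report GLM-5's own measured horizon.

```python
def capability_ratio(lag_months: float, doubling_months: float) -> float:
    """Share of the frontier's task horizon implied by a time lag.

    Assumes both model groups follow the same exponential trend,
    so a lag of one doubling period halves the horizon.
    """
    return 2 ** (-lag_months / doubling_months)


if __name__ == "__main__":
    frontier_hours = 3.1   # lower end of the reported GPT-5.3 Codex / Opus 4.6 range
    lag_months = 5.7       # reported GLM-5 lag behind the closed-source frontier
    doubling_months = 5.7  # doubling time for models released since 2024
    ratio = capability_ratio(lag_months, doubling_months)
    print(f"ratio: {ratio:.2f}")
    print(f"implied open-weight horizon: ~{frontier_hours * ratio:.2f} h")
```

With a 3.1-hour frontier horizon, this gives an implied open-weight horizon of about 1.55 hours. Treat it as a rough reading of the trend, not as a result from the study.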

Why It Matters

For indie developers and SMEs, this research signals that automated vulnerability exploitation is no longer theoretical. If the best models can autonomously complete roughly three hours of expert security work at a 50% success rate, the cost of mounting targeted attacks on under-defended SaaS products and APIs drops significantly. Security budgets and threat models built before 2024 are likely outdated.

Asia-Pacific Angle

GLM-5, developed by Zhipu AI in China, is singled out as the most capable open-weight model in the study, trailing the closed-source frontier by only 5.7 months. For Chinese and Southeast Asian developers shipping products globally, this has two implications. First, open-weight models available domestically are approaching frontier offensive capability, which raises compliance and liability questions. Second, regional cloud providers and SaaS teams should audit their API endpoints and authentication flows now, because automated exploit tools built on open-weight models will become more accessible across the region in the near term.

Action Item This Week

Run your primary API or web application through an automated vulnerability scanner such as OWASP ZAP or Nuclei. Target the CVE categories covered in CVEBench, focusing on authentication bypass and injection vulnerabilities, and patch any critical findings before the next deployment cycle.
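For the triage step, a small filter over scanner output can serve as a deployment gate. This Python sketch assumes that Nuclei was run with JSON Lines output, for example `nuclei -u https://your-app.example -jsonl -o findings.jsonl`. It also assumes that each record carries `template-id`, `matched-at`, and an `info` object with `severity` and `tags`. Check those field names against your Nuclei version. The `FOCUS_TAGS` set is a hypothetical starting point for authentication-bypass and injection templates, not an official Nuclei category list.

```python
import json
import sys
from collections.abc import Iterable

# Hypothetical focus list: tags assumed to mark auth-bypass and injection templates.
FOCUS_TAGS = {"auth-bypass", "sqli", "injection", "ssti"}
# The action item is to patch critical findings before the next deployment.
BLOCKING_SEVERITIES = {"critical"}


def finding_tags(info: dict) -> set[str]:
    """Normalise the tags field, accepting a list or a comma-separated string."""
    raw = info.get("tags") or []
    if isinstance(raw, str):
        raw = raw.split(",")
    return {str(t).strip().lower() for t in raw if str(t).strip()}


def blocking_findings(lines: Iterable[str]) -> list[dict]:
    """Return critical findings whose tags overlap FOCUS_TAGS."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the JSONL file
        record = json.loads(line)
        info = record.get("info", {})
        severity = str(info.get("severity", "")).lower()
        if severity in BLOCKING_SEVERITIES and finding_tags(info) & FOCUS_TAGS:
            hits.append({"template": record.get("template-id"),
                         "where": record.get("matched-at")})
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as fh:
        found = blocking_findings(fh)
    for hit in found:
        print(f"BLOCK: {hit['template']} at {hit['where']}")
    sys.exit(1 if found else 0)  # non-zero exit lets CI stop the deploy
```

You could save it under a hypothetical name such as `gate.py` and run `python gate.py findings.jsonl` as a CI step. It exits with status 1 when a critical authentication or injection finding is present, so the deployment stays blocked until the issue is patched.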