AI Agents
6 articles tagged with this topic
Stop Chasing Leaderboards: How Berkeley Exposed Flawed AI Agent Benchmarks
Berkeley researchers reveal critical data contamination in top AI benchmarks. Learn how to validate your own agent tools, avoid overfitting, and build
Harness Engineering Emerges as Core AI Agent Discipline at OpenAI, Anthropic
OpenAI and Anthropic publicly frame 'Harness Engineering' as the critical layer between model capability and production reliability.
IBM ALTK-Evolve Enables AI Agents to Learn During Deployment
IBM Research releases ALTK-Evolve, a toolkit letting AI agents update their behavior from real task experience without full retraining.
How Meta Built a Pre-Compute Engine to Give AI Agents a Codebase Map
Meta deployed 50+ specialized AI agents to encode tribal knowledge across 4,100+ files, cutting agent tool calls by 40%.
Amazon Bedrock AgentCore Gateway Now Supports OAuth 2.0 for MCP Servers
AgentCore Gateway centralizes MCP server auth using OAuth 2.0 Authorization Code flow, removing per-server credential management.
Claude Opus 4 Fails Elden Ring: A Reality Check on AGI Claims
A developer tested Claude Opus 4 on Elden Ring gameplay. It couldn't leave the first room, challenging Jensen Huang's AGI claims.