gpt.buzz

ITBench-AA: Frontier Models Score Below 50% on the First Benchmark for Agentic Enterprise IT Tasks — by Artificial Analysis and IBM

May 27, 2026

Artificial Analysis and IBM introduced ITBench-AA, the first benchmark for agentic enterprise IT tasks, and reported that frontier models scored below 50% on it. The result suggests current models still struggle with realistic enterprise IT workflows, making the benchmark a useful stress test for deployment readiness.

Source: huggingface.co
