ITBench-AA: Frontier Models Score Below 50% on the First Benchmark for Agentic Enterprise IT Tasks — by Artificial Analysis and IBM
May 27, 2026
Artificial Analysis and IBM introduced ITBench-AA, billed as the first benchmark for agentic enterprise IT tasks, and reported that frontier models scored below 50% on it. The result suggests that current models still struggle with realistic enterprise IT workflows, which makes the benchmark a useful stress test of deployment readiness.
Source: huggingface.co