Improving instruction hierarchy in frontier LLMs

March 10, 2026

IH-Challenge trains frontier LLMs to prioritize trusted instructions over conflicting, lower-priority ones. The result is a stronger instruction hierarchy, better safety steerability, and greater robustness against prompt injection attacks. This matters because failures to respect the instruction hierarchy are a common jailbreak vector, and the method aims to make models follow higher-priority directives more reliably.

Source: openai.com