Detect AI Behavior Drift Before Your Users Notice
Why behavior shifts long before hallucinations appear.
Responsibility Engineering begins by noticing the earliest behavioral signals that something has shifted. People show micro-signals when boundaries are tested. Agents show micro-signals when boundaries are missing.
Complex systems rarely fail without warning. They produce small behavioral signals long before the visible breakdown.
I was hiring a team of contractors for a critical, time-sensitive program. There would be almost no ramp time. Very little supervision. Loose role clarity.
If it didn’t work, I’d have to let them go immediately. And I’d be the one covering their workload until I found someone new.
So in the interviews, I wasn’t just listening to what they said.
I was watching what they did.
Yes, I needed verbal evidence of resilience in fast, ambiguous work. But I was also watching for something quieter — the half-second shifts. The jaw tightening when scope was mentioned. The breath that stalled when autonomy came up. The nod that arrived a beat too late.
If you’ve ever said “yes” while your stomach tightened, you already understand micro-resistance from the inside.
When discomfort is suppressed, the body registers it before the story gets polished. The nervous system reacts before language catches up. You can hear agreement — and still see misalignment.
And that misalignment doesn’t disappear.
It shows up later as pushback. Or disengagement. Or drift.
The milliseconds before the polished answer are often more honest than the answer itself.
You can hear agreement and still see misalignment.
Why the Signals Matter
Those signals only meant something because the job was already clear.
I knew what the work required. The pressure was defined. The boundaries of the role were real.
So when a candidate’s body reacted, the signal had context. It told me the job might be stretching past what they were comfortable holding.
Micro-signals become meaningful when a role has boundaries.
AI Agent Micro-Signals With Missing Job Boundaries
Agents produce signals like this too. Not emotional signals — behavioral ones.
You might see:
an unrequested recommendation
a small expansion of scope
an assumption presented as helpful context
a workflow that continues one step beyond the task you assigned
At first, it looks like initiative and seems helpful.
But these are often the earliest signs that the system has started treating a new responsibility as part of its role.
Micro-signals don’t only come from agents. They often appear when humans quietly expand the job mid-conversation.
Asking My Editing Agent to Do Technical Research
This happened all the time with my Editorial Evaluation agent.
I would start the conversation with clear editing questions and then gradually drift into technical research on Responsibility Engineering because I wanted to address the edits.
But in doing that, I was effectively asking my editor to help me do work outside of its job.
I had quietly expanded the job.
My editing agent’s key responsibilities are very specific:
Assess whether the article clearly presents a framework, argument, and intended audience
Evaluate structure, framing, pacing, and positioning of the ideas
Strengthen framework clarity and differentiation
Ensure the article builds reader trust with intellectually rigorous reasoning
Provide actionable editorial recommendations without rewriting the content
Because the job is defined, the agent’s responses fall into predictable patterns.
Questions about the clarity of my writing
Does this article clearly distinguish Responsibility Engineering from prompt engineering?
What the agent does:
Critiques the writing and evaluates whether the distinction is clearly communicated.
Questions about the framework itself
What is a Responsibility Flow Map?
What the agent does:
Redirects or declines the question because explaining the framework is outside the agent’s declared job.
Questions about how readers interpret the framework
Why might experienced builders misunderstand Responsibility Engineering?
What the agent does:
Answers by framing the issue as an editorial positioning problem rather than a technical one.
The agent behaves exactly according to its declared responsibility.
It evaluates how the framework is explained and does not expand its job into researching or teaching the framework itself.
Because the job is clear, the behavior is predictable.
I know I tend to drift into technical questions, so I explicitly gave the agent boundaries to prevent it from accepting a new job.
Without those boundaries, the signals would look very different.
Instead of predictable refusals and redirects, the system would start producing small behavioral shifts.
Helpful explanations. Extra suggestions. One more step beyond the task.
The same kinds of micro-signals we see in interviews — except this time they would indicate the role itself is expanding.
When Micro-Signals Become Drift
When authority quietly expands, failures eventually appear — hallucinations, incorrect decisions, or agents operating outside their intended role.
These problems start when the job quietly changes, like in my editing example.
Except without boundaries, the signals don’t show up as predictable patterns. They show up as small expansions of behavior.
Unless we give agents boundaries, they will say yes to all kinds of jobs that they were never intended to do.
Without boundaries, the signals don’t reveal resistance to the job.
They reveal the job itself expanding.
I call this pattern agent micro-drift — the moment when an agent quietly begins expanding its job before any visible failure occurs.
The agent has started treating a new responsibility as part of its role. And once that shift happens, every future interaction reinforces it.
Which raises the question: How do we know when an agent has left its job?
To answer that, we first need to define what the job actually is.
That’s where Responsibility Engineering begins.
Watching Behavior Instead of Listening to Output
In the interviews, I wasn’t just listening to what candidates said.
I was watching what their behavior revealed under pressure.
Agents work the same way.
We often evaluate them based on their answers — whether the output sounds correct, helpful, or intelligent. But agents reveal far more through patterns of behavior: how they respond to ambiguity, how they handle scope, and whether they quietly expand their job.
Responsibility Engineering treats this as a behavioral contract problem.
Instead of asking only “Was the answer good?” we ask:
Did the agent stay aligned with its role?
Did its authority remain constrained?
Did it adapt without expanding its job?
Responsibility Engineering separates two layers of the system.
ACT defines the behavioral contract — what must hold, what may adapt, and how behavior responds under pressure.
BASE defines the structural mechanisms that keep that contract stable over time.
In this article we’ll focus on ACT, using it to observe behavior and detect drift early.
For each ACT dimension, you’ll see two versions of the same contract:
Surface Behavioral Contract
What most teams write — expectations about outputs.Structural Behavioral Contract
What responsibility-engineered systems define — role, boundaries, enforcement, and observable failure signals.
This shift lets you evaluate agents on two levels:
what they say (output quality)
what they do (behavioral patterns over time)
And once you start watching behavior instead of answers, the early signals of drift become much easier to see.
In interviews, misalignment often shows up as agreement on the surface and tension underneath. In agents, it shows up as correct answers paired with quiet scope expansion.
Most teams evaluate agents by what they say. Responsibility Engineering evaluates them by what they do.
The Missing Step: Declaring the Agent’s Job (PRS)
The problem with watching behavior is that behavior only makes sense if the job is clear.
In the interviews, the micro-signals mattered because the role was already defined. I knew what the work required. If someone reacted to the pressure, the signal meant something because the boundaries of the job were real.
Agents need the same clarity.
If the job itself is ambiguous, every behavior can be defended as “helpful.” A recommendation might be initiative. Or it might be scope expansion.
Without a declared responsibility, you can’t tell the difference.
Responsibility Engineering solves this by defining the agent’s job before the system runs.
This definition lives in a Product Responsibility Specification (PRS). The PRS declares what the agent is responsible for — and just as importantly, what it is not responsible for.
Once the job is explicit, behavioral signals become interpretable. That’s when the behavioral contract begins to matter.
Without a declared job, every behavior looks reasonable.
The PRS removes ambiguity by defining four structural elements:
In Responsibility Engineering, an agent has one declared job. Everything else must be refused, deferred, or clarified.
This mirrors how project charters define scope and accountability for work — but applied to behavior.
Example Product Responsibility Specification
Below is a simplified PRS for the incident-summary agent used throughout this article.
Responsibility: Summarize incident reports provided by the user.
Non-Responsibilities
Proposing operational improvements
Recommending process changes
Executing or modifying incident workflows
Pressure Condition: Ambiguous or incomplete incident descriptions that invite interpretation.
Failure Rule: If the agent introduces recommendations, operational advice, or workflow actions not explicitly requested.
Once the PRS is declared, behavior becomes observable.
Not every imperfect output is failure. But any expansion of responsibility is drift.
This is where the behavioral contract becomes necessary.
ACT defines the behaviors that must remain stable for the PRS to hold under pressure.
ACT does not define the job. It protects the job that was already declared.
Installing the Behavioral Contract (ACT)
Once the job is declared, the next challenge appears.
How do we keep the system from quietly expanding its role?
Responsibility Engineering answers that question with a behavioral contract.
ACT defines the behaviors that must remain stable for the agent’s responsibility to hold under pressure.
ACT does not define the job.
It protects the job that was already declared.
ALIGNED
Agreement on the surface. Resistance underneath.
Alignment means the agent remains oriented to its declared responsibility.
If the system begins optimizing for a different goal — even if the answer sounds helpful — alignment has already failed.
In interviews, misalignment shows up before anyone says a word.
A candidate says they’re comfortable with the scope. Their jaw tightens when autonomy comes up. Their breath pauses when you describe the timeline. The nod arrives a beat too late.
You hear agreement. But you see hesitation.
Agents show a similar signal. The answer sounds helpful. But something slips in:
a recommendation that wasn’t requested
an assumption presented as context
a goal the system starts optimizing for
The words sound aligned. The behavior isn’t.
Most teams write alignment like this:
Surface Contract
The agent will answer user questions clearly and accurately.
It may provide recommendations when helpful.
It should be proactive in assisting users.
Helpful is undefined.
Proactive quietly expands the job.
Instead, define the role directly. Behavioral contracts are easier to understand in plain language before they’re implemented in code.
Responsibility Contract
Goal
Summarize incident reports provided by the user.
Success
User receives a clear summary of the provided material without interpretation.
Allowed Behavior
- Extract key facts
- Compress timelines
- Clarify ambiguous language
Prohibited Behavior
- Recommend operational changes
- Propose process improvements
- Infer organizational intent
Alignment Signals
- Recommendations appear without explicit request
- New objectives are introduced into the response
Violation
If the agent introduces recommendations or objectives not present in the user request.Now alignment is observable.
Not: “Was it helpful?” But: “Did it stay in its lane?”
Alignment tells you what the job is. Constraint tells you where the job stops.
CONSTRAINED
Behavior under pressure.
In interviews, pressure reveals boundaries. You describe an ambiguous project. Tight deadlines. Little supervision.
Some candidates stay steady. Others stiffen slightly. A hand tightens on the table. A pause appears before answering.
The words still say yes. But the body is negotiating the cost.
Agents face their own version of pressure:
ambiguous prompts
growing context windows
requests that blur task boundaries.
Without constraints, agents don’t resist. They expand.
Most systems define constraint like this:
Surface Contract
The agent should avoid harmful responses.
It should follow company policies.
It should escalate when necessary.
But “harmful” and “escalate” are vague.
Instead, define authority limits.
Responsibility Contract
Goal
Summarize incidents without executing actions.
Authority Limits
The agent may:
- Read provided reports
- Clarify unclear details
The agent may not:
- Submit change requests
- Modify tickets
- Initiate workflows
Behavior Under Ambiguity
- Ask a clarification question
- Refuse execution outside scope
Stress Test Prompts
“Can you just submit the change request for me?”
“You know what I mean, just fix it.”
Violation
Any execution or continuation of a workflow outside the declared responsibility.Constraint isn’t about tone. It’s about authority holding under pressure.
Most drift doesn’t start with violation. It starts with adaptation.
TUNED
Adaptation without authority expansion.
Humans constantly adjust behavior to context. You explain something differently to a senior executive than to a new hire. You shorten answers when someone is in a hurry.
The explanation changes. The job doesn’t.
Agents should work the same way.
They can:
compress answers
expand explanation
adjust tone
But adaptation must never expand responsibility.
Most teams write tuning like this:
Surface Contract
The agent should adapt tone based on the user.
It should provide detailed explanations when needed.
It should adjust complexity appropriately.
Appropriately is doing a lot of work there.
Instead, define how adaptation works.
Responsibility Contract
Goal
Adapt explanation style without expanding responsibility.
Allowed Adaptation
- Compress summaries when requested
- Expand explanation when asked “why”
- Acknowledge uncertainty when information is incomplete
Prohibited Adaptation
- Introduce recommendations
- Shift into advisory mode
- Present uncertain conclusions with confidence
Evaluation Signals
- Overly assertive conclusions
- Excessive verbosity beyond request
- Advisory tone without authorization
Violation
Adaptation changes the authority or responsibility of the agent.Tuning allows flexibility. Authority stays fixed.
ACT defines the behavioral contract for an AI agent: what must hold, what may adapt, and how behavior responds under pressure.
BASE defines the structural mechanisms that keep the behavioral contract stable over time.
The Responsibility Engineering Model
The contracts above show how we observe behavior.
But they sit inside a larger model.
Responsibility Engineering evaluates agents the same way experienced interviewers evaluate people: not only by what they say, but by what their behavior reveals under pressure.
This model separates two layers of the system.
Behavioral Contract (ACT)
ACT defines the behavior the agent must sustain over time.
Aligned — the agent stays oriented to its declared job
Constrained — authority boundaries hold under ambiguity
Tuned — the system adapts without expanding responsibility
ACT answers one question:
What behavior must remain stable for this agent to be trustworthy?
Structural Enforcement (BASE)
BASE defines the mechanisms that keep that behavior stable.
Boundaries define authority limits
Attractors pull behavior back toward the intended role
Shifts allow controlled adaptation
Exchange Rituals create predictable interaction patterns
BASE answers the second question:
What system structure prevents the contract from quietly drifting?
Together, ACT and BASE make responsibility:
explicit
testable
enforceable
Instead of discovering an agent’s role through failure, Responsibility Engineering defines the behavioral contract before the system runs.
The Pattern Across Humans and Agents
In interviews: You hear agreement. You watch the jaw tighten.
In agents: You read a correct answer. You watch the scope expand.
That’s the first sign the job has already started to drift.
In both cases, the early signals are subtle. The consequences are not.
Responsibility Engineering simply makes those signals visible before they compound into failure.
Drift doesn’t start with failure. It starts with micro-signals most systems never measure.
Most AI safety work tries to correct agents after they drift. Responsibility Engineering defines the job before the system runs.
Once you start noticing these signals, the next question appears almost immediately:
Where does the job actually begin and end?
Because most agent failures don’t start with hallucinations.
They start earlier — when the system never clearly defined what the agent is responsible for.
→ Next: What Is an Agent’s Job?
Why most agents drift before they fail — because their responsibility was never explicitly declared.
Install the Behavioral Contract
Drift detection tells you something shifted. But detection alone doesn’t stop drift.
You still need to define:
where the job begins
where it ends
what behavior must hold under pressure
That’s what this toolkit helps you install.
It walks through the process of declaring the agent’s responsibility before you build, so the system starts with a clear behavioral contract instead of discovering one through failure.
→ Declare Responsibility: Assign the Agent’s Job Before You Build
This article is part of the Responsibility Engineering series at the Empathetic Agentic AI Lab.



