04 · Agentic
Overview#
When I say “agentic,” I mean a system property: the system selects and executes actions over time in pursuit of an objective, using tools under constraints.
This is a qualitative change from single-turn assistance because it changes how errors propagate.
- In non-agentic settings, a mistake typically manifests as a bad output that can be rejected.
- In agentic settings, a mistake may alter state, trigger downstream actions, and become harder to attribute after the fact.
Agentic systems sit between the Flywheel (feedback loops) and Helix (structural effects) because they compress iteration cycles. As actions become faster and cheaper to attempt, governance and observability become the constraints that bind.
Required distinctions#
These distinctions keep claims falsifiable and help prevent category drift:
- Automation vs autonomy:
- Automation: executing a predefined procedure with fixed control flow.
- Autonomy: selecting actions or sequences based on intermediate results and changing context.
- Tool invocation vs goal-directed behavior:
- Tool invocation: calling an API or function as a substep.
- Goal-directed behavior: choosing which tools to call, in what order, with what state updates, to satisfy constraints.
- Stateless execution vs stateful systems:
- Stateless: each run is independent; failure does not accumulate.
- Stateful: the system retains memory (explicit or implicit) and can change the environment; failure may compound.
- Bounded agency vs open-ended agency:
- Bounded: a narrow action space, clear success criteria, measurable outcomes, and enforced budgets/permissions.
- Open-ended: broad permissions, underspecified success, weak measurement, and unbounded horizon.
This work focuses on bounded agency because it is the regime where reliability and governance can plausibly scale.
Operational definition (as used here)#
An agentic system is one that:
- Maintains state across steps (memory, scratch state, retrieved context, or environment state).
- Selects actions over time (tools, messages, API calls, code execution) based on intermediate results.
- Operates under explicit constraints (permissions, budgets, policies) that can be audited.
- Has an evaluation surface that determines whether actions are accepted (tests, checks, human review, monitoring).
Under this definition, “agentic” does not imply general intelligence; it implies only that control flow is partially learned or decided at runtime rather than fully specified upfront.
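To make the definition concrete, here is a minimal Python sketch of a bounded agent loop: explicit state, runtime action selection, an auditable allowlist and budget, and an acceptance check. The names, the shape of the policy function, and the return conventions are illustrative assumptions, not a reference implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional

Action = tuple[str, dict]  # (tool name, keyword arguments)

@dataclass
class BoundedAgent:
    tools: dict[str, Callable[..., Any]]        # stable tool interface
    allowed: set[str]                           # permissions: explicit allowlist
    max_calls: int                              # budget constraint
    accept: Callable[[dict], bool]              # evaluation surface (tests, checks)
    state: dict = field(default_factory=dict)   # explicit, inspectable state
    log: list = field(default_factory=list)     # observability

    def run(self, policy: Callable[[dict], Optional[Action]]) -> bool:
        """Select actions until the policy stops, the budget runs out,
        or the acceptance check passes. Returns True only on acceptance."""
        for _ in range(self.max_calls):
            action = policy(self.state)          # chosen from intermediate results
            if action is None:
                break
            name, args = action
            if name not in self.allowed:         # auditable constraint check
                self.log.append(("rejected", name, args))
                break
            result = self.tools[name](**args)
            self.log.append(("called", name, args, result))
            self.state[name] = result            # state persists across steps
            if self.accept(self.state):
                return True
        return False
```

Nothing here is clever; the point is that the constraints and the evaluation surface are explicit objects that can be logged, reviewed, and tested rather than emergent behavior.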
Enabling conditions#
Agentic approaches become viable when technical and organizational conditions are present.
Technical#
- A stable tool interface:
- Tools have predictable schemas, idempotent operations where possible, and explicit error modes.
- Constrained permissions:
- The system’s action scope is deliberately narrower than the environment’s full capability.
- Observability:
- Tool calls, intermediate state, and outputs are logged with enough context to support incident response.
- Evaluation harness:
- There is a way to score multi-step behavior, not just single outputs.
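One way to read the technical conditions in code: declare each tool with an explicit schema, permission scope, and idempotency flag, and log every invocation as a structured record. This is a hypothetical shape, not any particular framework’s API; the field names and the logging target are assumptions.

```python
import json
import logging
from dataclasses import dataclass
from typing import Any, Callable

logger = logging.getLogger("agent.tools")

@dataclass(frozen=True)
class ToolSpec:
    """A tool's contract is declared up front: schema, scope, and retry safety."""
    name: str
    input_schema: dict                 # JSON-Schema-style description of arguments
    scope: str                         # e.g. "read" or "write", checked against granted scopes
    idempotent: bool                   # safe to retry without a second side effect?
    fn: Callable[..., Any]

def call_tool(spec: ToolSpec, granted_scopes: set[str], **kwargs: Any) -> dict:
    """Invoke a tool under constraints; always emit a structured, loggable record."""
    if spec.scope not in granted_scopes:
        record = {"tool": spec.name, "status": "denied", "args": kwargs}
    else:
        try:
            record = {"tool": spec.name, "status": "ok",
                      "args": kwargs, "result": spec.fn(**kwargs)}
        except Exception as exc:       # explicit error mode instead of a silent retry
            record = {"tool": spec.name, "status": "error",
                      "args": kwargs, "error": repr(exc)}
    logger.info(json.dumps(record, default=str))   # enough context for incident response
    return record
```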
Organizational#
- Clear accountability:
- Someone owns the workflow’s correctness, security posture, and rollback procedures.
- Defined error budgets:
- The organization can specify acceptable failure rates and the cost of failure.
- A change process:
- There is a path from observed failure to system update (policies, tests, tool constraints, prompts, or model choice).
Constraints and limits#
Agentic systems break down (or become uneconomical) under common conditions:
- Weak success criteria:
- If success cannot be specified or measured, the system cannot reliably learn which actions are correct.
- High-cost irreversible actions:
- If rollback is hard and mistakes are expensive, autonomy should be limited or shifted back to human approval (an approval-gate sketch follows below).
- Tool boundary brittleness:
- If the environment changes frequently (schemas, permissions, data shapes), reliability often collapses unless the interface is engineered for change.
- Sparse feedback:
- Without frequent, high-quality signals, the system’s policy becomes guesswork.
In practice, these limits show up as an integration ceiling: the model may appear capable in isolation, but the surrounding system cannot be governed at the required level of reliability.
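For the high-cost irreversible case in particular, the usual mitigation is a gate: reversible, low-cost actions run autonomously, and everything else is routed to human approval. A minimal sketch, where `request_approval` is a hypothetical stand-in for whatever review mechanism exists (a ticket, a UI, an on-call reviewer):

```python
from typing import Any, Callable, Optional

def execute_with_gate(
    action: Callable[[], Any],
    reversible: bool,
    estimated_cost: float,
    cost_threshold: float,
    request_approval: Callable[[str], bool],   # hypothetical hook: review UI, ticket queue, human
    description: str,
) -> Optional[Any]:
    """Run cheap, reversible actions autonomously; route everything else to a human."""
    if reversible and estimated_cost <= cost_threshold:
        return action()                        # bounded autonomy: rollback exists, cost is capped
    if request_approval(description):          # autonomy shifted back to human approval
        return action()
    return None                                # declined: no state change is made
```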
Key points#
- Tool selection and permission scoping matter: the action scope should be deliberately narrower than the environment’s full capability.
- Memory is a liability unless bounded (see the sketch below).
- Evaluation must cover multi-step, long-horizon behavior, not just single outputs.
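“Bounded” can be taken literally: cap how much the system may remember and record every write so memory stays reviewable. A minimal sketch; the cap values are arbitrary assumptions:

```python
from collections import deque

class BoundedMemory:
    """Scratch state with hard caps: old entries are evicted, every write is recorded."""

    def __init__(self, max_entries: int = 50, max_chars: int = 2_000):
        self.max_chars = max_chars
        self.entries: deque[str] = deque(maxlen=max_entries)   # eviction, not unbounded growth
        self.audit_log: list[str] = []                          # reviewable record of every write

    def remember(self, note: str) -> None:
        note = note[: self.max_chars]       # truncate rather than accumulate silently
        self.entries.append(note)
        self.audit_log.append(note)

    def context(self) -> str:
        """The only memory the next step is allowed to see."""
        return "\n".join(self.entries)
```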
Failure modes (common)#
- Compounding error:
- A small mistake early in a sequence causes later steps to be “correct” relative to an incorrect state.
- Hidden state:
- The system’s memory or intermediate state affects actions in ways that are not logged, reviewed, or reproducible.
- Brittle tool boundaries:
- Ambiguous schemas, partial failures, or non-idempotent operations produce actions that cannot be safely retried (an idempotency sketch follows this list).
- Unobserved delegation:
- Subtasks are pushed to tools or external services without capturing what was done, why, and with what evidence.
- Human-in-the-loop illusions:
- A workflow appears autonomous but depends on untracked human correction, silently inflating its apparent reliability.
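For the brittle-tool-boundary mode, a standard defense is an idempotency key derived from the tool name and canonicalized arguments, so a retried call cannot apply the same side effect twice. A minimal sketch, using an in-memory dict where a real system would use a durable store:

```python
import hashlib
import json
from typing import Any, Callable

_completed: dict[str, Any] = {}   # stand-in for a durable store keyed by idempotency key

def idempotency_key(tool: str, args: dict) -> str:
    """Derive a stable key from the tool name and canonicalized arguments."""
    payload = json.dumps({"tool": tool, "args": args}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def call_once(tool: str, args: dict, fn: Callable[..., Any]) -> Any:
    """Apply the side effect at most once; retries return the recorded result."""
    key = idempotency_key(tool, args)
    if key in _completed:
        return _completed[key]        # safe retry: the side effect is not applied twice
    result = fn(**args)
    _completed[key] = result
    return result
```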
What this section is NOT asserting#
This section is not asserting:
- That autonomy is broadly desirable across domains.
- That agentic systems replace humans as a default.
- That tool use implies goal understanding.
- That long-horizon reliability is solved.
TODO: Add a short taxonomy of “bounded agency” levels used in your environment (read-only tools; write tools with approvals; write tools without approvals) and map them to error budgets and required observability.