Beyond the Ticket: Architecting AI Workflows for Outcomes, Not Activity

Why plugging AI into your ticketing system fails, and how to re-architect for signal, intent and continuous learning.

Jo Dalle Nogare
Principal at Wandoo Systems

Every engineering manager knows the "Ticket Factory." For me, it’s the memory of my phone buzzing an hour after I fell asleep. The adrenaline wasn't just for the critical system failure; it was the panic to hit "Acknowledge" within the 10-minute window so the ticket wouldn't escalate.

That specific anxiety, prioritising the timer over the fix, is the hallmark of the Ticket Factory: a place where good intentions get buried under process, Service Desk requests and automated alerts.

In an effort to escape this trap, many organisations are rushing to "AI-enable" their workflows. The promise is seductive: use Large Language Models (LLMs) to categorise and route tickets automatically. But this approach rests on a fundamental category error. It attempts to apply a rigid, manual workflow to a fuzzy, probabilistic problem. We are using AI to perfect the art of answering the same questions faster, when the goal should be to learn enough to stop asking them.

This shift mirrors the evolution seen in cyber-security and high-frequency trading: moving from human-speed queues to machine-speed loops. While vendors are adding AI to optimise the ticket, the real value comes from optimising the learning. This approach aligns with industry movements toward autonomous "healing" loops seen in platforms like CrowdStrike's Security AI Workflow (SAW), yet it remains radically under-adopted in the broader service management landscape.

As Mik Kersten argues in Project to Product, focusing on "proxy metrics" (like number of tickets closed) distracts us from the only metric that matters: Flow Efficiency (Kersten, 2018). Optimising for ticket volume is like celebrating a high fever because your immune system is working hard. It misses the point: you are still sick. To truly leverage AI, we must move beyond the ticket and architect for outcomes.

The Flaw of the "Request-Response" Model

The traditional support workflow is reactive and linear: Ticket → Categorise → Queue → Escalate → Resolve → Close

Figure 1: The Ticket Factory: Linear & Reactive
[Diagram: Ticket → Categorise → The Queue ("The Waiting Room") → Escalate/Resolve → Close]

This model assumes that work begins when a human complains. It treats engineers as "biological routers," manually moving data from one system to another. This is a primary driver of burnout; top talent doesn't stay in organisations that force them to be robots.

From Tickets to Signals

AI allows us to invert this from "Reactive" to "Predictive," but only if we change the underlying architecture of the work itself.

The Signal-Learning Loop: From Signal to Systemic Adaptation

My approach advocates for a semantic workflow architecture that replaces the 'Waiting Room' of tickets with actionable Signals. This synthesises proven patterns: the OODA Loop (Observe-Orient-Decide-Act) adapted for AI systems (Boyd, 2018), modern Agentic Workflows beyond zero-shot prompting (Ng, 2025), and Site Reliability Engineering (SRE) principles of observability over monitoring (Beyer et al., 2016). The result is a closed-loop system I call the Signal-Learning Loop.

Figure 2: The Signal-Learning Loop: From Signal to Systemic Adaptation
[Diagram: Signal → Intent → Hypotheses → Decision → Action → Learning & Adaptation, all operating within Guardrails]

Modern platforms like MLflow 3.0 now enable this kind of closed-loop learning: by tracing every AI interaction (from prompt to output to user feedback) and linking it to versioned prompts and constraints, teams can automatically feed production signals back into system improvement (Databricks, 2025).

1. From Ticket to Signal

In Google’s Site Reliability Engineering practice, Monitoring tells you whether the system is working, but Observability lets you ask why it isn't (Beyer et al., 2016). Think of this like medical triage. A ticket is a patient describing a symptom; a Signal is the heart monitor detecting the problem before the patient even feels it.

  • The Old Way: A user logs a ticket saying "The dashboard is slow."
  • The New Way: The system emits a Signal (system vitals indicating a slowdown) correlated with a recent deployment.
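The contrast above can be sketched in code. This is a minimal, hypothetical example (all names, thresholds and fields are illustrative, not from any real monitoring stack): a detector that emits a Signal only when telemetry degrades, and attaches any deployments that landed inside a correlation window.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

@dataclass
class Signal:
    """A machine-emitted observation, not a user complaint."""
    metric: str
    observed: float
    baseline: float
    correlated_deploys: list = field(default_factory=list)

def detect_slowdown(p95_ms, baseline_ms, deploys, now,
                    window=timedelta(minutes=30)):
    """Emit a Signal when latency degrades past an (illustrative)
    1.5x threshold, correlated with recent deployments."""
    if p95_ms <= baseline_ms * 1.5:
        return None  # within normal bounds: no signal, no ticket
    recent = [d for d in deploys if now - d["at"] <= window]
    return Signal("dashboard.p95_latency_ms", p95_ms, baseline_ms, recent)

sig = detect_slowdown(
    p95_ms=1800, baseline_ms=400,
    deploys=[{"service": "dashboard-api",
              "at": datetime(2025, 6, 1, 10, 50)}],
    now=datetime(2025, 6, 1, 11, 5),
)
```

The key design point is that the Signal carries its own context (baseline, correlated change) so downstream reasoning starts from evidence, not from a free-text complaint.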

2. From Queue to Hypothesis

Instead of waiting in a queue, the AI layer should immediately generate Hypotheses. By utilising Chain-of-Thought reasoning (Wei et al., 2022), the AI leverages a "Semantic Layer": it uses the context of the system state to frame the problem (e.g., a potential resource constraint) before a human ever sees it. This prevents "ghost" alerts (notifications where no real issue exists) by grounding AI reasoning in actual system telemetry.
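To make the grounding concrete, here is a deliberately simple, rule-based stand-in for the hypothesis step (in a real system an LLM would do the framing; every field name and confidence value here is a hypothetical placeholder). The point it illustrates is the filter: no telemetry evidence means no hypothesis, and no hypothesis means no page.

```python
def frame_hypotheses(signal: dict) -> list[dict]:
    """Generate ranked hypotheses from system state before any
    human triage. Fields and scores are illustrative only."""
    hypotheses = []
    if signal.get("correlated_deploys"):
        hypotheses.append({"cause": "recent deployment regression",
                           "confidence": 0.7})
    if signal.get("cpu_saturation", 0) > 0.9:
        hypotheses.append({"cause": "resource constraint (CPU)",
                           "confidence": 0.6})
    # No telemetry grounding -> emit nothing: this is the
    # "ghost alert" filter in action.
    return sorted(hypotheses, key=lambda h: -h["confidence"])
```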

3. From Gate to Guardrail (The Decision Layer)

The bottleneck in most systems is the "Gate": the human who must approve every step. We move toward human-on-the-loop, where AI operates within pre-defined guardrails (Scharre, 2019).

Figure 3: Operational Models: The Gate vs. The Guardrail
[Diagram comparing two workflows. Left, "The Gate": every decision passes through a human reviewer, causing high latency and fatigue. Right, "The Guardrail": routine decisions flow automatically through a safe-to-fail zone, and only exceptions escalate to a human: management by exception.]

Rather than a simple confidence score, we implement a robust decision architecture:

  • Deterministic rails: Code-based checks (Schema, Role-Based Access Control (RBAC)).
  • Semantic rails: Policy classifiers (Intent, Brand Safety).
  • Economic rails: Error budgets and Service Level Objectives (SLOs).

If the request stays within these bounds, the AI acts autonomously. The human monitors the loop (exceptions), not the ticket. Crucially, the AI is never given the "keys to the kingdom"; it is given keys to specific, reversible doors.
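The three rails can be expressed as a layered check, escalating on any failure. This is a minimal sketch under stated assumptions: the role names, intent whitelist and budget semantics are all hypothetical, and the semantic rail is a stub where a real system would call a policy classifier.

```python
from enum import Enum

class Decision(Enum):
    ACT = "act autonomously"
    ESCALATE = "escalate to human"

def decide(action: dict, error_budget_remaining: float) -> Decision:
    """Layered guardrail check; any rail failure escalates
    (management by exception). All names are illustrative."""
    # Deterministic rail: schema and RBAC checks are plain code.
    if action.get("role") != "remediation-bot":
        return Decision.ESCALATE
    if not isinstance(action.get("target"), str):
        return Decision.ESCALATE
    # Semantic rail: stub for a policy/intent classifier.
    if action.get("intent") not in {"restart", "scale", "rollback"}:
        return Decision.ESCALATE
    # Economic rail: stop acting when the SLO error budget is spent.
    if error_budget_remaining <= 0.0:
        return Decision.ESCALATE
    return Decision.ACT
```

Note the ordering: the cheapest, most deterministic checks run first, so the expensive semantic check is only reached by requests that are already structurally valid.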

What makes an AI action "reversible"? In practice, this requires engineering patterns like idempotent operations with undo logs, state snapshots (e.g., etcd backups), or human-verified dry-run modes. These ensure autonomous actions can be safely rolled back without cascading failures.
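One of those patterns, the undo log, can be sketched in a few lines (a toy in-memory example, not a production store): every mutation snapshots the prior value, so rollback is a mechanical replay in reverse rather than an improvised cleanup.

```python
import copy

class ReversibleStore:
    """Applies changes with an undo log so autonomous actions
    can be rolled back cleanly. Toy example: in-memory state only."""

    def __init__(self, state: dict):
        self.state = state
        self._undo = []

    def apply(self, key, value):
        # Snapshot the prior value before mutating: the "undo log".
        self._undo.append((key, copy.deepcopy(self.state.get(key))))
        self.state[key] = value

    def rollback(self):
        # Replay the log in reverse to restore the pre-action state.
        while self._undo:
            key, old = self._undo.pop()
            if old is None:
                self.state.pop(key, None)
            else:
                self.state[key] = old
```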

However, I treat these guardrails as the "floor" for current operations, while acknowledging that higher-speed autonomy will eventually require deeper safety invariants (Perrow, 2000). (See Guardrails over Gates for the full architecture).

4. From Resolution to Learning

A "Closed" ticket is a dead end. In this framework, the Outcome must feed back into Learning. This learning doesn't just sit in a log; it updates the AI’s "Instruction Set" (System Prompts) and the "Source of Truth" (Knowledge Base) so the same signal never results in the same fire again.

Conclusion: Architecting for Value

Transforming your organisation for the AI era isn't about buying a "Copilot" for your service desk. It is about re-architecting your value stream.

We must stop building better Ticket Factories. We must start building systems that listen for Signals, form Hypotheses and treat every Outcome as an opportunity for systemic Learning.

References

Beyer, B., Jones, C., Petoff, J., & Murphy, N. R. (2016). Site Reliability Engineering: How Google Runs Production Systems. O'Reilly Media.

Boyd, J. R. (2018). A Discourse on Winning and Losing (G. T. Hammond, Ed.). Air University Press. https://www.airuniversity.af.edu/Portals/10/AUPress/Books/B_0151_Boyd_Discourse_Winning_Losing.PDF

CrowdStrike. (2024). Unlock SOC Transformation with CrowdStrike Falcon® Next-Gen SIEM. CrowdStrike Technical Whitepapers. https://assets.crowdstrike.com/is/content/crowdstrikeinc/unlock-soc-transformation-white-paperpdf

Databricks. (2025, June 11). MLflow 3.0: Unified AI Experimentation, Observability, and Governance. Databricks Blog. https://www.databricks.com/blog/mlflow-30-unified-ai-experimentation-observability-and-governance

Kersten, M. (2018). Project to Product: How to Survive and Thrive in the Age of Digital Disruption with the Flow Framework. IT Revolution Press.

Ng, A. (2025). Agentic AI Workflows. DeepLearning.AI. https://learn.deeplearning.ai/courses/agentic-ai

Perrow, C. (2000). Normal Accidents: Living with High-Risk Technologies (Updated ed.). Princeton University Press.

Scharre, P. (2019). Army of None: Autonomous Weapons and the Future of War. W. W. Norton & Company.

Wei, J., et al. (2022). Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. Advances in Neural Information Processing Systems (NeurIPS). https://arxiv.org/abs/2201.11903

About the Author

Jo has operated at the edge of technical and organisational complexity for over 25 years - first in clinical research, then in large-scale infrastructure, and now in AI-era system integrity. These articles are working notes from that journey.

© 2025-2026 Wandoo Systems. This work is architectural in nature and does not constitute professional advice for specific system implementations.