The strongest agent systems I have built do not start with a question like "How many agents can we add?" They start with a workflow that is expensive, repetitive, slow, or difficult to audit. The architecture follows the work, not the trend.
Start with deterministic software
If a step can be done deterministically in code, I keep it in code. Validation, parsing, scoring rules, schema checks, data profiling, and permissions should not depend on a model unless the task genuinely requires judgment. This keeps the system faster, cheaper, and easier to debug.
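As a minimal sketch of that idea, the check below validates a record in plain Python before any model is involved. The field names (`call_id`, `duration_seconds`, `transcript`) and the `validate_record` helper are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical schema for an incoming record: field name -> expected type.
REQUIRED_FIELDS = {"call_id": str, "duration_seconds": int, "transcript": str}


@dataclass
class ValidationResult:
    ok: bool
    errors: list[str]


def validate_record(record: dict) -> ValidationResult:
    """Run schema and range checks without calling a model."""
    errors = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in record:
            errors.append(f"missing field: {name}")
        elif not isinstance(record[name], expected_type):
            errors.append(f"{name} should be {expected_type.__name__}")

    # A simple range rule that needs no judgment.
    duration = record.get("duration_seconds")
    if isinstance(duration, int) and duration < 0:
        errors.append("duration_seconds must be non-negative")

    return ValidationResult(ok=not errors, errors=errors)
```

Only records that pass a check like this would be handed to an agent, so malformed input fails fast with a readable error instead of producing a confusing model response.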
Use agents for judgment and coordination
Agents become useful when the workflow needs context building, tool selection, synthesis, ambiguity handling, or multi-step reasoning. In analytics systems, that might mean deciding which chart explains a trend. In QA systems, it might mean reading a transcript and attaching evidence to a compliance score.
Make every agent observable
A production agent should leave a trail: input, selected tool, model, output schema, confidence, latency, token cost, and failure state. If a reviewer cannot understand why the system acted, the system is not ready for enterprise work.
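One way to capture that trail is a single structured record per agent step. The sketch below is an assumption about how such a record could look: `AgentTrace`, `run_traced`, and the `confidence` and `cost_usd` keys returned by the tool are hypothetical names, not a specific framework's API.

```python
import json
import time
from dataclasses import asdict, dataclass
from typing import Callable, Optional


@dataclass
class AgentTrace:
    agent: str
    input_summary: str
    selected_tool: str
    model: str
    output_schema: str
    confidence: Optional[float] = None
    latency_ms: float = 0.0
    token_cost_usd: float = 0.0
    failure_state: Optional[str] = None


def run_traced(agent: str, tool_name: str, model: str, output_schema: str,
               payload: str, tool: Callable[[str], dict]
               ) -> tuple[Optional[dict], AgentTrace]:
    """Call a tool and return its result together with a trace record."""
    trace = AgentTrace(agent=agent, input_summary=payload[:80],
                       selected_tool=tool_name, model=model,
                       output_schema=output_schema)
    start = time.perf_counter()
    result = None
    try:
        result = tool(payload)
        # Assumed convention: the tool reports its own confidence and cost.
        trace.confidence = result.get("confidence")
        trace.token_cost_usd = result.get("cost_usd", 0.0)
    except Exception as exc:
        # Record the failure instead of losing it in a stack trace.
        trace.failure_state = f"{type(exc).__name__}: {exc}"
    finally:
        trace.latency_ms = (time.perf_counter() - start) * 1000
    return result, trace


def to_log_line(trace: AgentTrace) -> str:
    """Serialize a trace as one JSON line for an audit log."""
    return json.dumps(asdict(trace))
```

Because failures are written into the same record as successes, a reviewer can read one log stream to see what the agent tried, with which model, and how it ended.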
Route models by task
Not every request deserves the biggest model. A dependable AI platform should route simple extraction and classification tasks to cheaper models, reserve stronger models for nuanced reasoning, and retry or fall back to another model when a provider fails or its behavior changes.
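A routing layer along these lines can be quite small. The sketch below assumes two tiers and placeholder model names (`small-model`, `large-model`, `backup-large-model`); `call_model` stands in for whatever provider client the platform actually uses and is not a real library function.

```python
from typing import Callable

# Task types treated as "simple" in this sketch.
SIMPLE_TASKS = {"extraction", "classification"}

# Hypothetical model names, listed in order of preference per tier.
ROUTES = {
    "simple": ["small-model", "large-model"],
    "complex": ["large-model", "backup-large-model"],
}


def pick_tier(task_type: str) -> str:
    return "simple" if task_type in SIMPLE_TASKS else "complex"


def route(task_type: str, prompt: str,
          call_model: Callable[[str, str], str],
          max_retries: int = 1) -> tuple[str, str]:
    """Try each model in the tier, retrying before falling back.

    Returns (model_used, response). Raises RuntimeError if every model fails.
    """
    last_error = None
    for model in ROUTES[pick_tier(task_type)]:
        for _attempt in range(max_retries + 1):
            try:
                return model, call_model(model, prompt)
            except Exception as exc:
                # Remember the error, then retry or move to the next model.
                last_error = exc
    raise RuntimeError(
        f"all models failed for task type {task_type!r}"
    ) from last_error
```

Returning the model name alongside the response matters for auditing: the caller can log which model actually answered, which may differ from the one first selected.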
Keep humans in the workflow
The point of enterprise AI is rarely to remove every human decision. The better goal is to move people from repetitive work into review, exception handling, coaching, and decision-making. That is why I design agent systems with audit logs, evidence, redaction, and review states.
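Review states can be modeled as a small state machine with an audit log attached. The states and transitions below are one possible set, assumed for illustration; `ReviewItem` and its fields are hypothetical names.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class ReviewState(Enum):
    PENDING_REVIEW = "pending_review"
    ESCALATED = "escalated"
    APPROVED = "approved"
    REJECTED = "rejected"


# Assumed transition rules: approved and rejected are final.
ALLOWED = {
    ReviewState.PENDING_REVIEW: {ReviewState.APPROVED, ReviewState.REJECTED,
                                 ReviewState.ESCALATED},
    ReviewState.ESCALATED: {ReviewState.APPROVED, ReviewState.REJECTED},
    ReviewState.APPROVED: set(),
    ReviewState.REJECTED: set(),
}


@dataclass
class ReviewItem:
    item_id: str
    evidence: list[str]  # e.g. transcript excerpts supporting the score
    state: ReviewState = ReviewState.PENDING_REVIEW
    audit_log: list[dict] = field(default_factory=list)

    def transition(self, new_state: ReviewState, reviewer: str,
                   note: str = "") -> None:
        """Move to a new state and record who did it, when, and why."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"cannot move from {self.state.value} to {new_state.value}"
            )
        self.audit_log.append({
            "from": self.state.value,
            "to": new_state.value,
            "reviewer": reviewer,
            "note": note,
            "at": datetime.now(timezone.utc).isoformat(),
        })
        self.state = new_state
```

Every agent output starts in `PENDING_REVIEW`, so nothing reaches a final state without a named reviewer and a timestamp in the log.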
The standard I use
A multi-agent system is production-ready only when it can answer these questions clearly: what happened, why did it happen, what did it cost, how confident was the system, what happens when it fails, and where does a human review the result?