Platform mental models

Operational AI, engineered

This isn’t a chatbot. It’s an operational runtime: stateful execution, tool permissions, confidence-based routing, and replayable audit trails.

Intake → State → Plan execution → Policy gates → Human checkpoint → Tool allow-list → Audit log → Replay / debugging

A runtime for regulated workflows: state + gates + permissions + replay.

Core

Core agent runtime

The primitives that make agents safe, predictable, and shippable in regulated operations.

State management

Agents aren’t stateless chats. We treat state as a first-class artifact so workflows can be paused, resumed, audited, and recovered.
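As a minimal sketch of state as a persisted artifact (the function names and JSON-file layout here are illustrative assumptions, not the platform's actual API):

```python
import json
import tempfile
from pathlib import Path

def checkpoint(state: dict, path: Path) -> None:
    """Persist workflow state as a first-class artifact (pause / resume / audit)."""
    path.write_text(json.dumps(state, sort_keys=True))

def resume(path: Path) -> dict:
    """Recover a paused workflow from its last checkpoint."""
    return json.loads(path.read_text())

# Illustration: pause after triage, resume later
p = Path(tempfile.mkstemp(suffix=".json")[1])
checkpoint({"step": "triage", "doc_id": "bill-42"}, p)
```

Because state is serialized, a stuck run can be inspected, resumed, or handed to a human without losing context.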

Human-in-the-loop checkpoints

Review is a policy gate. The runtime can stop at defined checkpoints, request approval, and capture reviewer actions as evidence.

Orchestration

A workflow is a directed plan: branching paths, decision points, and explicit next actions—so outcomes land in queues, not in conversations.

Streaming

Stream partial outputs and intermediate state to UIs and downstream systems without waiting for a single “final answer”.
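One way to picture this, as a sketch (the step functions below are hypothetical): a plan that yields intermediate state after every step instead of one final answer.

```python
from typing import Callable, Iterator

def run_plan_streaming(steps: list[Callable[[dict], dict]]) -> Iterator[dict]:
    """Yield intermediate state after each step instead of one final answer."""
    state: dict = {}
    for step in steps:
        state = step(state)  # each step returns updated state
        yield {"step": step.__name__, "state": dict(state)}  # partial output

# Hypothetical steps, for illustration only
def extract(state):
    return {**state, "text": "extracted text"}

def classify(state):
    return {**state, "doc_type": "BILL"}

events = list(run_plan_streaming([extract, classify]))
```

A UI or downstream system can consume each event as it arrives rather than blocking on the full plan.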

ReAct / plan execution

Plans are executed as steps (think: call a tool, validate, decide, route), not as a one-shot prompt.
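A rough sketch of step-wise execution; the plan shape, tool registry, and validators are assumptions for illustration, not the real interfaces:

```python
def execute_plan(plan: list[dict], tools: dict) -> list[tuple]:
    """Run a plan as discrete steps: call a tool, validate, decide, route."""
    results = []
    for step in plan:
        output = tools[step["tool"]](step["args"])   # call a tool
        if not step["validate"](output):             # validate the result
            results.append(("retry", step["tool"]))  # decide: flag for retry
            continue
        results.append(("ok", output))               # route onward
    return results

# Hypothetical tools and plan, for illustration
tools = {"extract_text": lambda args: f"text from {args['doc']}"}
plan = [{"tool": "extract_text",
         "args": {"doc": "bill.pdf"},
         "validate": lambda out: len(out) > 0}]
```

Each step is an observable unit, which is what makes validation and routing possible mid-run.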

Tool allow-lists + permissions

Agents only get the tools they’re allowed to use, with scoped permissions. This is how you control blast radius.
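A minimal sketch of allow-list enforcement, assuming a simple in-process tool registry (all names are hypothetical):

```python
class ToolPermissionError(Exception):
    """Raised when an agent calls a tool outside its allow-list."""

def make_invoker(allowed: set, tools: dict):
    """Wrap a tool registry so an agent can only call allow-listed tools."""
    def invoke(name: str, **kwargs):
        if name not in allowed:
            raise ToolPermissionError(f"tool not in allow-list: {name}")
        return tools[name](**kwargs)
    return invoke

# Hypothetical registry: this agent may extract text but never send email
tools = {"extract_text": lambda doc: f"text:{doc}",
         "send_email": lambda to: f"sent:{to}"}
invoke = make_invoker({"extract_text"}, tools)
```

The blast radius is bounded at the call site: a misbehaving agent fails loudly instead of acting outside its scope.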

Branching / retries / timeouts

Reliability is built into the runtime: retry strategies, deterministic fallbacks, and timeouts that prevent runaway loops.
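A sketch of bounded retries with a wall-clock budget and a deterministic fallback (parameter names are illustrative):

```python
import time

def run_with_retries(fn, *, attempts=3, timeout_s=5.0, fallback=None):
    """Bounded retries plus a time budget to prevent runaway loops."""
    deadline = time.monotonic() + timeout_s
    for attempt in range(1, attempts + 1):
        if time.monotonic() > deadline:
            break                      # time budget exhausted
        try:
            return fn()
        except Exception:
            if attempt == attempts:
                break                  # retries exhausted
    return fallback                    # deterministic fallback, never a crash

# Illustration: a flaky call that succeeds on the second attempt
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 2:
        raise RuntimeError("transient")
    return "ok"
```

The fallback is a routing decision (e.g. "send to manual queue"), so failure modes stay predictable.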

Audit logs / replay

Every step is replayable: inputs, outputs, tool calls, and approvals. Debugging becomes a timeline—not archaeology.
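As a sketch, an append-only trace that makes every step re-readable in order (the event shape is an assumption, not the real log schema):

```python
import uuid

class Trace:
    """Append-only record of a run: inputs, outputs, tool calls, approvals."""

    def __init__(self):
        self.trace_id = str(uuid.uuid4())
        self.events: list[dict] = []

    def log(self, kind: str, **payload) -> None:
        self.events.append({"trace_id": self.trace_id, "kind": kind, **payload})

    def replay(self) -> list[dict]:
        """Debugging becomes a timeline: re-read events in order."""
        return list(self.events)

trace = Trace()
trace.log("tool_call", tool="extract_text", inputs={"doc": "bill.pdf"})
trace.log("approval", reviewer="jane", action="approved")
```

Every event carries the trace ID, so a single identifier reconstructs the whole decision history.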

Capabilities

Toolpacks (MCP-style)

Composable tool collections: keep the agent small, give it reliable tools, and govern what it can do.

Document intelligence pack

Turn unstructured docs into structured artifacts.

extract_text • classify_document • summarize_document • detect_missing_fields • extract_entities • compare_documents • highlights_differences • confidence_score

Workflow control pack

Create and route work in real queues.

create_task • assign_task • mark_task_complete • wait_for_human_approval • escalate_case • add_note • add_comment

Decision + eval pack

Score outputs, apply thresholds, and pick next actions.

evaluate_against_rubric • score_response • apply_thresholds • generate_explanation • confidence_band • recommend_next_action

Comms pack

Prepare human-approved communications.

draft_email • draft_letter • generate_notification • prepare_summary_for_human • prepare_customer_facing_msg

Governance + compliance pack

Evidence, policy checks, and sensitive-data handling.

log_decision • record_justification • generate_audit_summary • policy_check • redact_sensitive_data

Extensibility

Domain packs

Semi-generic (not fully bespoke): reuse internal tools, add domain vocabulary, rules, and templates.

Semi-generic, not bespoke

Domain packs are reusable building blocks, not one-off scripts. If you build the same thing twice, it belongs in a core or domain pack.

Reusable primitives + domain language

We reuse the same core runtime and tool primitives, then layer on domain-specific vocabulary, rules, and templates to reduce time to deployment.

Examples (public-safe)

Healthcare operations • Pharma/PBM workflows • Legal operations • Education workflows

Client-specific layer

Bespoke integrations and customer-specific tools belong in the client layer. Rule of thumb: services build on the platform, never around it.

Mental models

How we keep agents safe, predictable, composable

We don’t trust vibes. We ship systems with contracts.

The simple model

  • Agents make decisions
  • Confidence decides trust
  • Orchestrator decides flow
  • Humans step in when trust is low

ReAct orchestration (example)

AgentPlan (ReAct Orchestrator)
  invoke AtomicAgent: DocumentTypeDecision
  if BILL:
    invoke AtomicAgent: MedicalBillExtraction
    invoke AtomicAgent: PaymentEligibilityDecision
  if REPORT:
    invoke AtomicAgent: ClinicalSummaryAgent
Confidence → routing policy

  • < 0.60 → mandatory human review
  • 0.60–0.85 → soft review / spot check
  • ≥ 0.85 → auto-proceed

Key idea: confidence is per-step / per-field, not one global number. The orchestrator chooses: proceed • spot-check • review • retry/escalate.

Trust

Confidence model: structured output + routing

Agents must emit structured output. The orchestrator routes based on confidence bands and status, with human gates where needed.

Agent response contract

{
  "status": "SUCCESS | NEEDS_HUMAN | FAILED",
  "confidence": 0.87,
  "confidence_breakdown": {
    "self_consistency": 0.9,
    "tool_grounding": 0.8,
    "validation": 1.0,
    "coverage": 0.7
  },
  "requires_human": false,
  "outputs": { ... },
  "artifacts": { ... },
  "trace_id": "..."
}

The contract makes outputs composable: downstream systems don’t need to parse prose to understand state, risk, or next actions.
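A sketch of what "composable" means in practice: downstream code reads typed fields, never prose. The `AgentResponse` dataclass and `parse_response` helper below are illustrative, not the platform's actual types.

```python
from dataclasses import dataclass, field

@dataclass
class AgentResponse:
    """Typed view of the contract; downstream code never parses prose."""
    status: str            # "SUCCESS" | "NEEDS_HUMAN" | "FAILED"
    confidence: float
    requires_human: bool
    trace_id: str
    outputs: dict = field(default_factory=dict)

def parse_response(raw: dict) -> AgentResponse:
    """Validate and lift a raw contract payload into a typed object."""
    assert raw["status"] in {"SUCCESS", "NEEDS_HUMAN", "FAILED"}
    return AgentResponse(
        status=raw["status"],
        confidence=float(raw["confidence"]),
        requires_human=bool(raw["requires_human"]),
        trace_id=raw["trace_id"],
        outputs=raw.get("outputs", {}),
    )

resp = parse_response({"status": "SUCCESS", "confidence": 0.87,
                       "requires_human": False, "trace_id": "t-1"})
```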

Default orchestrator behavior

Confidence range    Default behavior
≥ 0.85              Auto-proceed
0.60 – 0.85         Soft review / spot check
< 0.60              Mandatory human review
FAILED              Escalate or retry

Confidence is not an abstract score. It’s a routing policy that controls automation vs review vs escalation.
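The default bands above can be sketched as a pure routing function (the function and action names are illustrative):

```python
def route(status: str, confidence: float) -> str:
    """Default orchestrator policy: map status + confidence to a next action."""
    if status == "FAILED":
        return "escalate_or_retry"
    if confidence >= 0.85:
        return "auto_proceed"
    if confidence >= 0.60:
        return "soft_review"
    return "mandatory_human_review"
```

Because routing is a pure function of the contract, the same policy can be unit-tested, versioned, and audited like any other code.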

Structured output

Confidence / reliability score, decision status, and escalation flags are required. Without this, agents can’t be reused safely.

Human-in-the-loop

Humans don’t “review everything”. They review when policy requires it or when trust is low.

Replayable evidence

A trace ID ties decisions to inputs, tool calls, validations, and approvals—so audit questions are answerable later.

Want to see this on your workflow?

Bring one queue (intake/triage/docs). We’ll map gates, confidence bands, and what “done” looks like.