Platform mental models

Operational AI, engineered

This isn’t a chatbot. It’s an operational runtime: stateful execution, tool permissions, confidence-based routing, and replayable audit trails.

Intake → State → Plan execution → Policy gates → Human checkpoint → Tool allow-list → Audit log → Replay / debugging

A runtime for regulated workflows: state + gates + permissions + replay.

Core

Core agent runtime

The primitives that make agents safe, predictable, and shippable in regulated operations.

State management

Agents aren’t stateless chats. We treat state as a first-class artifact so workflows can be paused, resumed, audited, and recovered.
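As a minimal sketch of state as a persisted artifact (the function names and JSON-file layout here are illustrative assumptions, not the platform's actual API):

```python
import json
import tempfile
from pathlib import Path

def checkpoint(state: dict, path: Path) -> None:
    """Persist workflow state as a first-class artifact (pause / resume / audit)."""
    path.write_text(json.dumps(state, sort_keys=True))

def resume(path: Path) -> dict:
    """Recover a paused workflow from its last checkpoint."""
    return json.loads(path.read_text())

# Illustration: pause after triage, resume later
p = Path(tempfile.mkstemp(suffix=".json")[1])
checkpoint({"step": "triage", "doc_id": "bill-42"}, p)
```

Because state is serialized, a stuck run can be inspected, resumed, or handed to a human without losing context.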

Human-in-the-loop checkpoints

Review is a policy gate. The runtime can stop at defined checkpoints, request approval, and capture reviewer actions as evidence.

Orchestration

A workflow is a directed plan: branching paths, decision points, and explicit next actions—so outcomes land in queues, not in conversations.

Streaming

Stream partial outputs and intermediate state to UIs and downstream systems without waiting for a single “final answer”.
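One way to picture this, as a sketch (the step functions below are hypothetical): a plan that yields intermediate state after every step instead of one final answer.

```python
from typing import Callable, Iterator

def run_plan_streaming(steps: list[Callable[[dict], dict]]) -> Iterator[dict]:
    """Yield intermediate state after each step instead of one final answer."""
    state: dict = {}
    for step in steps:
        state = step(state)  # each step returns updated state
        yield {"step": step.__name__, "state": dict(state)}  # partial output

# Hypothetical steps, for illustration only
def extract(state):
    return {**state, "text": "extracted text"}

def classify(state):
    return {**state, "doc_type": "BILL"}

events = list(run_plan_streaming([extract, classify]))
```

A UI or downstream system can consume each event as it arrives rather than blocking on the full plan.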

ReAct / plan execution

Plans are executed as steps (think: call a tool, validate, decide, route), not as a one-shot prompt.
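A rough sketch of step-wise execution; the plan shape, tool registry, and validators are assumptions for illustration, not the real interfaces:

```python
def execute_plan(plan: list[dict], tools: dict) -> list[tuple]:
    """Run a plan as discrete steps: call a tool, validate, decide, route."""
    results = []
    for step in plan:
        output = tools[step["tool"]](step["args"])   # call a tool
        if not step["validate"](output):             # validate the result
            results.append(("retry", step["tool"]))  # decide: flag for retry
            continue
        results.append(("ok", output))               # route onward
    return results

# Hypothetical tools and plan, for illustration
tools = {"extract_text": lambda args: f"text from {args['doc']}"}
plan = [{"tool": "extract_text",
         "args": {"doc": "bill.pdf"},
         "validate": lambda out: len(out) > 0}]
```

Each step is an observable unit, which is what makes validation and routing possible mid-run.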

Tool allow-lists + permissions

Agents only get the tools they’re allowed to use, with scoped permissions. This is how you control blast radius.
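A minimal sketch of allow-list enforcement, assuming a simple in-process tool registry (all names are hypothetical):

```python
class ToolPermissionError(Exception):
    """Raised when an agent calls a tool outside its allow-list."""

def make_invoker(allowed: set, tools: dict):
    """Wrap a tool registry so an agent can only call allow-listed tools."""
    def invoke(name: str, **kwargs):
        if name not in allowed:
            raise ToolPermissionError(f"tool not in allow-list: {name}")
        return tools[name](**kwargs)
    return invoke

# Hypothetical registry: this agent may extract text but never send email
tools = {"extract_text": lambda doc: f"text:{doc}",
         "send_email": lambda to: f"sent:{to}"}
invoke = make_invoker({"extract_text"}, tools)
```

The blast radius is bounded at the call site: a misbehaving agent fails loudly instead of acting outside its scope.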

Branching / retries / timeouts

Reliability is built into the runtime: retry strategies, deterministic fallbacks, and timeouts that prevent runaway loops.
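A sketch of bounded retries with a wall-clock budget and a deterministic fallback (parameter names are illustrative):

```python
import time

def run_with_retries(fn, *, attempts=3, timeout_s=5.0, fallback=None):
    """Bounded retries plus a time budget to prevent runaway loops."""
    deadline = time.monotonic() + timeout_s
    for attempt in range(1, attempts + 1):
        if time.monotonic() > deadline:
            break                      # time budget exhausted
        try:
            return fn()
        except Exception:
            if attempt == attempts:
                break                  # retries exhausted
    return fallback                    # deterministic fallback, never a crash

# Illustration: a flaky call that succeeds on the second attempt
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 2:
        raise RuntimeError("transient")
    return "ok"
```

The fallback is a routing decision (e.g. "send to manual queue"), so failure modes stay predictable.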

Audit logs / replay

Every step is replayable: inputs, outputs, tool calls, and approvals. Debugging becomes a timeline—not archaeology.
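As a sketch, an append-only trace that makes every step re-readable in order (the event shape is an assumption, not the real log schema):

```python
import uuid

class Trace:
    """Append-only record of a run: inputs, outputs, tool calls, approvals."""

    def __init__(self):
        self.trace_id = str(uuid.uuid4())
        self.events: list[dict] = []

    def log(self, kind: str, **payload) -> None:
        self.events.append({"trace_id": self.trace_id, "kind": kind, **payload})

    def replay(self) -> list[dict]:
        """Debugging becomes a timeline: re-read events in order."""
        return list(self.events)

trace = Trace()
trace.log("tool_call", tool="extract_text", inputs={"doc": "bill.pdf"})
trace.log("approval", reviewer="jane", action="approved")
```

Every event carries the trace ID, so a single identifier reconstructs the whole decision history.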

Capabilities

Toolpacks (MCP-style)

Composable tool collections: keep the agent small, give it reliable tools, and govern what it can do.

Document intelligence pack

Turn unstructured docs into structured artifacts.

extract_text • classify_document • summarize_document • detect_missing_fields • extract_entities • compare_documents • highlights_differences • confidence_score

Workflow control pack

Create and route work in real queues.

create_task • assign_task • mark_task_complete • wait_for_human_approval • escalate_case • add_note • add_comment

Decision + eval pack

Score outputs, apply thresholds, and pick next actions.

evaluate_against_rubric • score_response • apply_thresholds • generate_explanation • confidence_band • recommend_next_action

Comms pack

Prepare human-approved communications.

draft_email • draft_letter • generate_notification • prepare_summary_for_human • prepare_customer_facing_msg

Governance + compliance pack

Evidence, policy checks, and sensitive-data handling.

log_decision • record_justification • generate_audit_summary • policy_check • redact_sensitive_data

Extensibility

Domain packs

Semi-generic (not fully bespoke): reuse internal tools, add domain vocabulary, rules, and templates.

Semi-generic, not bespoke

Domain packs are reusable building blocks, not one-off scripts. If you build the same thing twice, it belongs in a core or domain pack.

Reusable primitives + domain language

We reuse the same core runtime and tool primitives, then layer on domain-specific vocabulary, rules, and templates to reduce time to deployment.

Examples (public-safe)

Healthcare operations • Pharma/PBM workflows • Legal operations • Education workflows

Client-specific layer

Bespoke integrations and customer-specific tools belong in the client layer. Rule of thumb: services build on the platform, never around it.

Mental models

How we keep agents safe, predictable, composable

We don’t trust vibes. We ship systems with contracts.

The simple model

  • Agents make decisions
  • Confidence decides trust
  • Orchestrator decides flow
  • Humans step in when trust is low

ReAct orchestration (example)

AgentPlan (ReAct Orchestrator)
  invoke AtomicAgent: DocumentTypeDecision
  if BILL:
    invoke AtomicAgent: MedicalBillExtraction
    invoke AtomicAgent: PaymentEligibilityDecision
  if REPORT:
    invoke AtomicAgent: ClinicalSummaryAgent
Confidence → routing policy

  • < 0.60 → mandatory human review
  • 0.60–0.85 → soft review / spot check
  • ≥ 0.85 → auto-proceed

Key idea: confidence is per-step / per-field, not one global number. The orchestrator chooses: proceed • spot-check • review • retry/escalate.

Trust

Confidence model: structured output + routing

Agents must emit structured output. The orchestrator routes based on confidence bands and status, with human gates where needed.

Agent response contract

{
  "status": "SUCCESS | NEEDS_HUMAN | FAILED",
  "confidence": 0.87,
  "confidence_breakdown": {
    "self_consistency": 0.9,
    "tool_grounding": 0.8,
    "validation": 1.0,
    "coverage": 0.7
  },
  "requires_human": false,
  "outputs": { ... },
  "artifacts": { ... },
  "trace_id": "..."
}

The contract makes outputs composable: downstream systems don’t need to parse prose to understand state, risk, or next actions.
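A sketch of what "composable" means in practice: downstream code reads typed fields, never prose. The `AgentResponse` dataclass and `parse_response` helper below are illustrative, not the platform's actual types.

```python
from dataclasses import dataclass, field

@dataclass
class AgentResponse:
    """Typed view of the contract; downstream code never parses prose."""
    status: str            # "SUCCESS" | "NEEDS_HUMAN" | "FAILED"
    confidence: float
    requires_human: bool
    trace_id: str
    outputs: dict = field(default_factory=dict)

def parse_response(raw: dict) -> AgentResponse:
    """Validate and lift a raw contract payload into a typed object."""
    assert raw["status"] in {"SUCCESS", "NEEDS_HUMAN", "FAILED"}
    return AgentResponse(
        status=raw["status"],
        confidence=float(raw["confidence"]),
        requires_human=bool(raw["requires_human"]),
        trace_id=raw["trace_id"],
        outputs=raw.get("outputs", {}),
    )

resp = parse_response({"status": "SUCCESS", "confidence": 0.87,
                       "requires_human": False, "trace_id": "t-1"})
```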

Default orchestrator behavior

Confidence range    Default behavior
≥ 0.85              Auto-proceed
0.60 – 0.85         Soft review / spot check
< 0.60              Mandatory human review
FAILED              Escalate or retry

Confidence is not an abstract score. It’s a routing policy that controls automation vs review vs escalation.
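The default bands above can be sketched as a pure routing function (the function and action names are illustrative):

```python
def route(status: str, confidence: float) -> str:
    """Default orchestrator policy: map status + confidence to a next action."""
    if status == "FAILED":
        return "escalate_or_retry"
    if confidence >= 0.85:
        return "auto_proceed"
    if confidence >= 0.60:
        return "soft_review"
    return "mandatory_human_review"
```

Because routing is a pure function of the contract, the same policy can be unit-tested, versioned, and audited like any other code.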

Structured output

Confidence / reliability score, decision status, and escalation flags are required. Without this, agents can’t be reused safely.

Human-in-the-loop

Humans don’t “review everything”. They review when policy requires it or when trust is low.

Replayable evidence

A trace ID ties decisions to inputs, tool calls, validations, and approvals—so audit questions are answerable later.

Want to see this on your workflow?

Bring one queue (intake/triage/docs). We’ll map gates, confidence bands, and what “done” looks like.