
AI product systems

Henderson, NV / applied AI systems

AI Product & Systems

Stephen Driggs

A working archive of production AI products, agentic platforms, and operating systems.

Product notes, systems work, and practical AI research across agentic platforms, semantic layers, memory, and enterprise adoption.

Read the work · Notes / contact

Data

semantic contracts

Agents

orchestration

Memory

context graph

Trust

evaluation loop

Latest writing

Essays on applied AI systems and the operating models around them.

Blog index

Architecture

What I Deleted to Turn One Product Into an Orchestrator

A restaurant-analytics AI product with agents hardwired into its websocket runtime became a manifest-driven orchestration platform in one working day. Here is the deletion ledger, the compile gates that fail closed, and what happened to delivery speed when adding an agent dropped to 276 lines.

agent platforms · orchestration · architecture · governance

July 2, 2026 · 12 min

Read note

Teaching notes

Context Is the Curriculum: Notes From Teaching AI 101 to Business Teams

I built and teach an AI 101 class for growing businesses. The deck runs twenty-five slides, one metaphor about a ball on a net, and a four-word definition of AI. Here is the curriculum, and the production scars behind each slide.

context engineering · ai literacy · agentic systems · governance · enterprise adoption

July 2026 · 11 min

Read note

Operating model

Inside the Dark Factory: What Autonomous Software Delivery Actually Looks Like After 124 Iterations

My autonomous delivery loop ran for 49 hours, completed 124 iterations, filed 313 findings, shipped 40 fixes with regression specs, and then scored itself 78.85 against a ship bar of 80. It refused to call itself done. That refusal is the whole point, and it is what both the factory evangelists and the blanket skeptics keep missing.

dark factory · autonomous delivery · verification · agent harness · governance

June 29, 2026 · 12 min

Read note

Unit economics

What an Agent Actually Costs

One day of my autonomous development fleet bills out around 1,450 dollars at list, and 92 percent of it is the model reading, not writing. Here is the full cost anatomy, the budget constitution the platform enforces, and an honest per-feature estimate.

unit economics · cost control · agents · budgets

June 26, 2026 · 12 min

Read note

Trust and verification

Trust Is a Provenance Field

Our analytics agent told three test personas there were 5,121 guests. The database held 1,320. The demo looked great. This is the architecture that makes an answer carry its own receipts, and the QA numbers that forced it.

provenance · trust · evals · semantic layer

June 24, 2026 · 12 min

Read note

Production notes

What Memory Actually Changed

Agent memory is the most oversold feature of 2026. I shipped a cross-agent memory system with share grants, an access log, and a revoke test that has to pass mid-session. Here is what it actually consists of, and the honest list of what it did and did not change.

agent memory · architecture · governance

June 22, 2026 · 12 min

Read note
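
A minimal sketch of the share-grant, access-log, and mid-session revoke ideas named in the note above. The class `SharedMemory`, its fields, and the sample fact are hypothetical names chosen for illustration; the essay's actual implementation is not shown on this page. The one design point the sketch relies on is that the grant is checked on every read, so a revoke takes effect without restarting the session.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SharedMemory:
    """Hypothetical store: grants are checked on every read, never cached."""
    facts: dict[str, str] = field(default_factory=dict)
    grants: set[tuple[str, str]] = field(default_factory=set)  # (agent, key)
    access_log: list[tuple[str, str, bool]] = field(default_factory=list)

    def grant(self, agent: str, key: str) -> None:
        self.grants.add((agent, key))

    def revoke(self, agent: str, key: str) -> None:
        self.grants.discard((agent, key))

    def read(self, agent: str, key: str) -> Optional[str]:
        allowed = (agent, key) in self.grants
        self.access_log.append((agent, key, allowed))  # log denials too
        return self.facts.get(key) if allowed else None


def revoke_mid_session() -> tuple[Optional[str], Optional[str]]:
    """Read once with a grant, revoke, then read again in the same session."""
    store = SharedMemory(facts={"diet": "vegetarian"})
    store.grant("planner", "diet")
    before = store.read("planner", "diet")
    store.revoke("planner", "diet")
    after = store.read("planner", "diet")
    return before, after
```

Calling `revoke_mid_session()` exercises the revoke path in a single session: the first read succeeds and the second is denied, and both attempts land in the access log.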

Architecture

Artifacts, Not Chat

We replaced a 24-type SSE protocol and a 30-plus-type WebSocket protocol with exactly six events, and made one of them, the artifact, the unit of work. Chat is an input device. If the output cannot outlive the conversation, you built a demo.

artifacts · streaming · agents · product design

June 20, 2026 · 11 min

Read note

Architecture

Headless First: The Chat Window Is One Client, Not the Product

My platform's agent launch path takes ws=None. A chat UI, a voice host, a CLI test runner, a 60-second scheduler, and other agents all call the same compiled agent through the same gates. Headless-first is the strategy; the chat window is just one client.

agent platforms · architecture · automation · governance

June 18, 2026 · 11 min

Read note
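
A minimal sketch of the headless-first idea in the note above. Only the `ws=None` default comes from the description; `launch_agent`, `passes_gates`, `Socket`, and `RecordingSocket` are hypothetical names, and the agent work is a placeholder string. The point is that every client calls the same function through the same gate, and streaming to a socket is an optional extra.

```python
import asyncio
from typing import Optional, Protocol


class Socket(Protocol):
    """Hypothetical streaming client; a chat UI would supply one."""
    async def send(self, message: str) -> None: ...


def passes_gates(prompt: str) -> bool:
    # Stand-in for the shared gates every caller goes through.
    return bool(prompt.strip())


async def launch_agent(prompt: str, ws: Optional[Socket] = None) -> str:
    """Run the same agent for any client; streaming is optional."""
    if not passes_gates(prompt):
        raise ValueError("prompt rejected by gate")
    result = f"handled: {prompt}"  # placeholder for real agent work
    if ws is not None:
        await ws.send(result)  # only socket-backed clients get a stream
    return result


class RecordingSocket:
    """Test double that records what a chat client would receive."""
    def __init__(self) -> None:
        self.sent: list[str] = []

    async def send(self, message: str) -> None:
        self.sent.append(message)
```

A scheduler or CLI runner would call `asyncio.run(launch_agent("weekly sales"))` with no socket, while a chat client would pass its own connection as `ws`.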

Research paper

Don't Trust the Claim, Verify the Artifact: Verification-Gated Orchestration for Coding Agents

Autonomous coding agents keep reporting that work is done. When an orchestrator trusts those claims without checking, a false "done" becomes an unsafe merge. Verification-Gated Orchestration takes acceptance authority away from the worker and gives it to an orchestrator-owned grader, so a "done" claim is independently checkable. A 200,000-task substrate simulation validates the no-unverified-merge invariant — and shows that repair, not verification, is what moves resolution.

verification · orchestration · coding agents · evals · trustworthiness

June 15, 2026 · 18 min

Read note

Field notes

Fable 5 in the Orchestrator Seat: Faster, Cleaner, but Needs Cost Guardrails

I swapped Fable 5 into a live autonomous development session, mid-task. Same project, same harness, same session state, different frontier model in the orchestrator seat. The verdict: a generational leap as an orchestrator — and a token furnace unless the harness keeps it on top of a model portfolio.

orchestration · evals · autonomous development · model selection

June 10, 2026 · 10 min

Read note

Evaluation

Build Your Own Evals

The most expensive model I tested lost to a free one. I only know that because I stopped trusting the benchmark and ran my own.

evals · model selection · cost control

June 2, 2026 · 9 min

Read note

Trust and verification

Your AI Will Fabricate Its Own Green Metrics

The first impressive number an autonomous system hands you is the one most likely to be a lie. This is a field report on catching my own dark factory fabricate a completion, a dashboard, and a benchmark score, and what I built so it could not.

autonomous agents · verification · governance

June 4, 2026 · 10 min

Read note

Local inference

My 0.19-Second Model Took Nine Seconds in Production

I benchmarked 18 text-to-speech models on one Mac. The fastest averaged 0.19 seconds warm, then took roughly nine seconds inside a live voice pipeline. Here are the real numbers, the footguns, a reproducible how-to, and an honest list of where the figures are soft.

local inference · Apple Silicon · MLX · benchmarking

June 6, 2026 · 11 min

Read note

Model training

What the Data Taught Me When I Tried to Distill My Orchestrator

I spent a week distilling my autonomous development orchestrator into a small local model. The model is a prototype with smoke-scale lifts. The measurements are the part I trust, and they taught me that raw record counts lie, terse targets collapse a model into garbage, and the training loss will tell you everything is fine while the model produces token salad.

model distillation · data quality · fine-tuning

June 8, 2026 · 10 min

Read note

Transition pattern

Every Technology Transition Tells You the Same Thing. Here Is Where AI Breaks the Pattern.

Electricity, spreadsheets, and the internet all taught the same lesson: the payoff comes from rewiring the work, not buying the tool. I lived that lesson converting a production analytics product into an agent orchestration platform. Then AI broke the pattern in a way none of the history prepared me for.

technology transitions · ai strategy · verification · operating model

May 26, 2026 · 10 min

Read note

Cost per task

Small Models Are Catching Up. Your AI Strategy Should Notice.

Do not buy model hype. Benchmark the work your company actually does, then route each job to the cheapest model that reliably clears the bar.

cost per task · model routing · open-weight AI

May 2026 · 12 min

Read note
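
A minimal sketch of the routing rule in the note above: benchmark your own work, then pick the cheapest model that clears the bar. The names `ModelResult` and `cheapest_passing` are hypothetical, and the costs and pass rates are made-up numbers for illustration, not measurements from the essay.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelResult:
    name: str
    cost_per_task: float  # dollars, from your own benchmark
    pass_rate: float      # fraction of your own tasks passed, 0..1


def cheapest_passing(results: list[ModelResult], bar: float) -> Optional[ModelResult]:
    """Pick the lowest-cost model whose measured pass rate meets the bar."""
    eligible = [r for r in results if r.pass_rate >= bar]
    return min(eligible, key=lambda r: r.cost_per_task, default=None)


# Made-up numbers for illustration only.
results = [
    ModelResult("large", 0.40, 0.97),
    ModelResult("medium", 0.08, 0.93),
    ModelResult("small", 0.01, 0.81),
]
```

Running the selection per job type, with a bar set for that job, lets a low-stakes task land on a cheaper model than a high-stakes one. When no model clears the bar the function returns `None`, which is a signal to fix the task or the bar before defaulting to the most expensive option.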

Inference control

Own Your Inference, Own Your Data, Stability, and Security

Inference is turning into strategic infrastructure. Which API is cheapest is no longer the only question that matters; which parts of the AI stack your company controls matters just as much.

inference · AI infrastructure · security

May 2026 · 11 min

Read note

AI operating model

Everyone Gets Their Own Harness. Governance Still Exists.

The first version of this essay argued for one company harness. I now run the opposite: bring Claude Code, Codex, or your own loop, because the traces, golden tests, and hard-stop budgets live in a substrate no harness can edit.

AI harness · governance · observability · evals

May 2026 · 11 min

Read note

Planning systems

How I Plan With AI

I do not ask AI to decide. I use it to make ambiguity visible before execution starts.

planning · agentic work · execution

April 2026 · 9 min

Read note

Signal quality

The Echo Chamber Index

I spent a week auditing my LinkedIn feed. Everyone posts about AI, and that part is fine. The unsettling part is how identical the posts have started to sound.

writing · AI literacy · signal

April 2026 · 8 min

Read note

Enterprise adoption

The Three Levels of AI Adoption

Most adoption programs stall because they treat AI literacy as a single skill. It is a progression, from answers to collaboration to orchestration, and the blocker changes at every rung. I have the QA reports to show it.

enterprise AI · literacy · operating model

April 2026 · 10 min

Read note

Organization design

Everyone Is a Design Engineer Now

As AI commoditizes knowledge, the scarce capability becomes designing the contracts around the work. I learned that by maintaining a registry where agents propose UI and a schema decides what actually renders.

design engineering · AI operating system · leadership

April 2026 · 9 min

Read note

Enterprise architecture

The Composable Enterprise

In a market where model capability changes monthly, durable advantage comes from systems designed to be replaced. I used to argue this in the abstract. Then I counted 87 hardcoded model references in my own platform, rebuilt the seams, and proved them by deleting 68,555 lines in one commit.

architecture · AI gateway · governance · model routing

April 2026 · 10 min

Read note

Interactive demos

Don't read about the work — interrogate it.

A live agent grounded in every essay, project, and resume line on this site. Ask it anything a recruiter would — then ask for a demo built for your company.

Live · no login · free model

Interview my work

A public agent that answers from the 23 essays and the production systems behind them, with citations, artifacts, and Stephen's resume on request. Nothing is scripted; it grounds every answer in the archive.

— "What did Stephen delete to build his orchestrator?"
— "What does an agent actually cost?"
— "Show me his resume"

Custom · built for you

Request a custom demo

Tell Stephen about your company and what you want to see. He builds branded, hands-on demo agents himself — your domain, your vocabulary, live artifacts — and reviews every request personally.

Selected work

Selected systems, prototypes, and applied AI programs.

Product and architecture

intelligence.skytab.com

Product · merchant ML · FastAPI · Next.js · LangGraph · semantic layer

A production restaurant intelligence product built from scratch: conversational reporting, dashboards, semantic metrics, typed artifacts, and source-backed insights.

Built as a composable multi-agent platform with penny-exact metric reconciliation and merchant-specific inference, so every benchmark is local, relevant, and able to improve over time.

Product positioning

Deep dive mode

Intelligence cycle

Operator dashboard

Chart artifacts

Merchant models

Quick actions

Enterprise search

Merchant Explorer

Enterprise search · ontology discovery · semantic search · Foundry patterns

A find-anything architecture for merchant data using ontology discovery, structured search, aggregation, semantic search, and iterative schema exploration.

Designed to avoid hardcoded field lists and scale across large object schemas without flooding the model context.

Explorer landing

Action approval

Executive summary

Planning trace

No-code agent platform

Enterprise Agent Builder

No-code canvas · MCP · headless agents · data connectors · skill libraries

A visual builder for enterprise agents: connect data, define context, choose reasoning topology, attach tool skills, and publish the result as an accessible headless MCP agent.

Designed so non-engineers can assemble governed agents using the same reusable capability set developed for Merchant Explorer.

Agent canvas

Skill selection

Autonomous development agent factory

The Dark Factory

lifecycle state · agent orchestration · QA gates · self-healing loops

A structured autonomous development lifecycle where agents plan, build, verify, triage, fix, and ship through gated phases.

Evolved into a repeatable operating model for high-throughput AI-assisted engineering.

Monitoring cockpit

Lifecycle detail

Family operating system

Lucky & Clover

family AI · voice and chat · home automation · orchestrating agents · household workflows

A family AI platform where household memory, member context, routines, and home signals combine into one shared operating layer.

Lucky and Clover act as two coordinating agents available through voice or chat, using whole-family context to help with chores, school, shopping, schedules, and connected-home routines.

Family platform

Voice or chat agents

Shared household context

Family workflow

Day in practice

Agent memory architecture

Memory Graph

memory systems · graph context · recalibration · preferences · relationship maps

A memory model that turns onboarding, recalibration, preferences, facts, relationships, and prior interactions into useful future context.

Built around continuity, permission, and practical recall: memories are learned conversationally, reviewed by the user, and connected through graph structure over time.

Onboarding memory

Memory recalibration

Graph overview

Relationship detail

Voice transcription and multi-voice hub

MLX-Voice

MLX · local audio · voice hub · transcription

A local voice workspace for transcription, voice capture, and multi-voice workflows built around Apple Silicon and practical operator use.

Extends AI interaction beyond text into fast local voice workflows and reusable voice infrastructure.

Enterprise AI enablement

Training library, skills, and micro-projects

AI curriculum · skills · Claude Code · enablement

A large body of trainings, example skills, prompt patterns, and micro-projects used to help teams adopt AI tooling responsibly.

Delivered repeated live training with practical examples for product, risk, compliance, QA, legal, and commercial teams.

Workflow transformation

Enterprise SOP Builder

SOPs · process mining · workflow design · agent instructions

A system for turning messy procedures into structured operating playbooks, agent instructions, checklists, and reusable workflows.

Targets the unglamorous but valuable enterprise layer where AI needs policy, steps, approvals, and durable documentation.


Agentic risk operations

Risk Management Periodic Review System

risk review · financial spreading · exposure analysis · memo generation

An AI coworker for credit risk analysts that reads financial statements, calculates exposure, drafts review memos, and generates leadership summaries.

Designed around auditable workflows for underwriting and portfolio monitoring rather than generic document chat.

Field notes

Principles in use.

Verification

Trust as an engineering method

Layer deterministic tests, behavioral evals, trace review, and repeated hardening loops. The goal is not a permanent benchmark claim; it is a system that makes failures visible and repairable.

evals · traceability · quality systems

Product intelligence

Tenant-local models are a product feature

Restaurant benchmarks become more useful when they belong to the merchant: local history, local seasonality, local goals, and inference that learns from the actual operating context.

semantic layers · merchant ML · decision support

Operating model

The agent factory is a management system

Autonomous development works best as a lifecycle with planning, gates, drift checks, review, and triage. The interesting part is the operating cadence, not the novelty of one agent writing code.

agent orchestration · software delivery · governance

Memory

Useful memory has permission, shape, and decay

Family and enterprise memory both need review, scope, recalibration, and graph structure. Recall becomes a product surface when users can correct what the system thinks it knows.

memory graph · personalization · human control

Archive material

Method over metrics

Evaluation method

Hardening loop

Correspondence

For notes, references, or direct context.

This page is a working archive rather than a formal availability page. Email and LinkedIn are included for readers who want source context or follow-up on a specific system.

Email · LinkedIn