Most companies have a harness problem
Most companies do not have an AI model problem. They have an AI harness problem. They give employees access to a powerful chat interface, usage grows, experimentation spreads, and then the bill arrives. At that point, the company realizes that access is not the same thing as architecture.
Signing up for ChatGPT, Claude, Cursor, Gemini, or any other vendor-provided surface can be an excellent way to start. It is fast, useful, and gets people moving. But if every user and every workflow hits premium inference by default, the company has not built an AI strategy. It has created a spending pattern.
The missing layer is an internal harness: a company-controlled chat and workflow interface that routes tasks based on job type, risk, benchmark performance, and cost per accepted task.
End users should not manage model economics
Most employees are not going to know which model should handle which task. They are not going to think about input-token cost, output-token cost, tokenizer differences, context length, tool-call overhead, prompt caching, reasoning effort, hidden system prompts, retry rates, schema adherence, model regressions, latency ceilings, review burden, or escalation thresholds.
And they should not have to. An employee should be able to ask for help. The system should decide whether the task belongs with a small model, a medium model, an open-weight model, a frontier model, a deterministic tool, or a human reviewer.
That is the core argument for the internal AI harness. The user chooses the job. The system chooses the model.
Usage scales faster than governance
AI usage can become material very quickly once employees get access to powerful tools, especially when agentic workflows and coding assistants enter the picture. The bill can change even if the pricing page does not.
It can change because the default model changes. It can change because users delegate more work. It can change because the model writes longer. It can change because the tokenizer changes. It can change because tools add context. It can change because agent loops become common. It can change because the harness changes.
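To make that concrete, here is a minimal Python sketch of the arithmetic behind a usage-based bill. The per-million-token prices and traffic numbers are made-up assumptions, not any vendor's rates. The point is only that the bill moves when output length or context size moves, with the price list held fixed.

```python
# Hypothetical prices in USD per 1M tokens; not any vendor's actual rates.
PRICE_IN_PER_M = 3.00
PRICE_OUT_PER_M = 15.00


def monthly_bill(requests: int, input_tokens: int, output_tokens: int) -> float:
    """Bill in USD for `requests` calls, each with the given input/output token counts."""
    # tokens * (USD per 1M tokens) gives micro-USD; divide by 1M once at the end.
    per_request_micro_usd = input_tokens * PRICE_IN_PER_M + output_tokens * PRICE_OUT_PER_M
    return requests * per_request_micro_usd / 1_000_000


# Same prices, same request count, different usage patterns.
scenarios = {
    "baseline": monthly_bill(10_000, input_tokens=2_000, output_tokens=500),
    "longer outputs": monthly_bill(10_000, input_tokens=2_000, output_tokens=1_500),
    "longer outputs + more context": monthly_bill(10_000, input_tokens=8_000, output_tokens=1_500),
}
for name, usd in scenarios.items():
    print(f"{name}: ${usd:,.2f}")
```

With these invented numbers, longer outputs alone move the bill from $135.00 to $285.00, and a larger context on top of that pushes it to $465.00, with no change to the pricing page.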
Governance has to move upstream. The organization needs to understand task economics before premium inference becomes the default path for every request.
Cost per accepted task is the real metric
A company does not buy tokens. It buys completed work. Cost per accepted task is the right economic unit: model cost plus tool cost plus infrastructure cost plus retry cost plus review cost plus failure cost, divided by the number of accepted outputs.
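Written as a formula, with symbols introduced here only for illustration, where each cost term is summed over every attempt in the measurement window (rejected ones included) and the denominator counts only accepted outputs:

```latex
\mathrm{CPAT} = \frac{C_{\text{model}} + C_{\text{tool}} + C_{\text{infra}} + C_{\text{retry}} + C_{\text{review}} + C_{\text{failure}}}{N_{\text{accepted}}}
```

Because rejected work adds to the numerator but not the denominator, every failed attempt raises the unit cost of the work that does get accepted.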
A model can be cheap per token but expensive per task if it needs more retries, produces verbose outputs, fails formatting, misses intent, or requires human correction. A model can be expensive per token but justified if it solves a high-value task correctly on the first pass and reduces review burden.
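A minimal Python sketch of that comparison, using the cost components listed above. The two batches of 100 attempts and every dollar figure are invented for illustration, not benchmark results; they show how review and failure costs can make the cheaper model the more expensive one per accepted task.

```python
from dataclasses import dataclass


@dataclass
class TaskBatch:
    """Total costs (USD) for a batch of attempts at one task type."""
    model: float
    tool: float
    infra: float
    retry: float
    review: float   # human time spent checking and correcting outputs
    failure: float  # downstream cost of outputs that were wrong or unusable
    accepted: int   # outputs that passed acceptance

    def cost_per_accepted_task(self) -> float:
        if self.accepted == 0:
            return float("inf")  # spend with no accepted work
        total = (self.model + self.tool + self.infra
                 + self.retry + self.review + self.failure)
        return total / self.accepted


# Invented numbers for 100 attempts each; not measurements.
cheap = TaskBatch(model=2.0, tool=0.0, infra=1.0, retry=3.0,
                  review=60.0, failure=14.0, accepted=80)
frontier = TaskBatch(model=20.0, tool=0.0, infra=1.0, retry=1.0,
                     review=14.0, failure=2.0, accepted=95)

print(f"cheap model:    ${cheap.cost_per_accepted_task():.2f} per accepted task")
print(f"frontier model: ${frontier.cost_per_accepted_task():.2f} per accepted task")
```

In this made-up case the cheap model spends one tenth as much on inference yet costs $1.00 per accepted task against $0.40, because review and failure costs dominate.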
Tool use adds another layer. Tool definitions, retrieval results, prior context, retries, and verbose outputs can all turn a simple-looking agent workflow into a much larger economic event. A mature harness measures the whole task, not only the model call.
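One reason agent workflows grow: with stateless chat-style APIs, each step resends the accumulated conversation, so input tokens compound across steps. This Python sketch assumes no prompt caching and that every step appends its model output and tool result to the context; the token counts are hypothetical.

```python
def agent_loop_input_tokens(system_and_tools: int, user_prompt: int,
                            per_step_output: int, per_step_tool_result: int,
                            steps: int) -> int:
    """Total input tokens billed across an agent loop that resends full history.

    Assumes no prompt caching, and that each step's model output and tool
    result are appended to the context read by the next call.
    """
    total = 0
    context = system_and_tools + user_prompt
    for _ in range(steps):
        total += context  # this call reads the whole context so far
        context += per_step_output + per_step_tool_result  # history grows
    return total


single = agent_loop_input_tokens(3_000, 200, 300, 1_500, steps=1)
ten = agent_loop_input_tokens(3_000, 200, 300, 1_500, steps=10)
print(single, ten, round(ten / single, 1))  # 3200 113000 35.3
```

Under these assumptions, ten steps bill roughly 35 times the input tokens of a single call, not 10 times, which is why the harness has to measure the whole task rather than one model call.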
Classify the job before choosing the model
The first job of an internal AI harness is task classification. A prompt like "help with this" is ambiguous. The harness needs to infer the actual job: drafting, rewriting, summarizing, extracting, classifying, calculating, coding, researching, analyzing, planning, tool execution, or high-risk decision support.
Then it should classify risk: low-risk internal productivity, sensitive internal data, customer-facing output, financial analysis, legal or compliance-adjacent work, regulated data, external publication, code execution, production-system access, or high-impact decision support.
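A minimal Python sketch of the two classification passes. The `Job` and `Risk` enums cover only a subset of the categories above, and the keyword rules are hypothetical; a real harness would more likely use a small classifier model, but keywords keep the sketch self-contained. Job rules use first match wins; risk takes the highest matching level.

```python
from enum import Enum


class Job(Enum):
    REWRITE = "rewrite"
    EXTRACT = "extract"
    CALCULATE = "calculate"
    CODE = "code"
    RESEARCH = "research"
    DECISION_SUPPORT = "decision_support"
    UNKNOWN = "unknown"


class Risk(Enum):
    LOW = 1
    SENSITIVE_INTERNAL = 2
    CUSTOMER_FACING = 3
    FINANCIAL = 4
    REGULATED = 5


# Hypothetical keyword rules; for jobs, the first matching rule wins.
JOB_RULES = [
    (("invoice", "extract", "pull the fields"), Job.EXTRACT),
    (("rewrite", "rephrase", "polish"), Job.REWRITE),
    (("calculate", "reconcile", "check the math"), Job.CALCULATE),
    (("bug", "function", "refactor"), Job.CODE),
    (("research", "find sources"), Job.RESEARCH),
    (("recommend", "board", "should we"), Job.DECISION_SUPPORT),
]

# Hypothetical risk rules; the highest matching risk level wins.
RISK_RULES = [
    (("patient", "ssn", "hipaa"), Risk.REGULATED),
    (("revenue", "forecast", "invoice", "p&l"), Risk.FINANCIAL),
    (("customer", "press release", "publish"), Risk.CUSTOMER_FACING),
    (("salary", "performance review"), Risk.SENSITIVE_INTERNAL),
]


def classify(prompt: str) -> tuple[Job, Risk]:
    text = prompt.lower()
    job = next((j for kws, j in JOB_RULES if any(k in text for k in kws)), Job.UNKNOWN)
    risk = max((r for kws, r in RISK_RULES if any(k in text for k in kws)),
               key=lambda r: r.value, default=Risk.LOW)
    return job, risk


for prompt in ["Extract the totals from this invoice",
               "Draft a board recommendation on Q3 revenue",
               "help with this"]:
    job, risk = classify(prompt)
    print(f"{prompt!r}: {job.name}, {risk.name}")
```

Here "help with this" comes back as `UNKNOWN` with `LOW` risk; one reasonable response is to ask a clarifying question before routing anywhere.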
Only then should it route. A routine email rewrite can go to a small model. Invoice extraction can go to a small model plus schema validation. A financial calculation check needs a benchmark-cleared model plus deterministic validation. A board recommendation needs frontier capability and human review. Agentic research needs tool and token budgets.
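Routing can start as a lookup table. This Python sketch encodes the examples above; the tier names, validator names, and budget numbers are hypothetical placeholders rather than recommendations, and a production table would be keyed on the job and risk classification described above rather than on a single job string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    model_tier: str                   # hypothetical tiers: "small", "benchmark_cleared", "frontier"
    validators: tuple[str, ...] = ()  # deterministic checks run on the output
    human_review: bool = False
    max_tool_calls: int = 0
    max_total_tokens: int = 8_000


# Hypothetical routing table mirroring the examples in the text.
ROUTES = {
    "email_rewrite": Route("small"),
    "invoice_extraction": Route("small", validators=("json_schema",)),
    "financial_calculation_check": Route("benchmark_cleared", validators=("recompute_totals",)),
    "board_recommendation": Route("frontier", human_review=True, max_total_tokens=60_000),
    "agentic_research": Route("frontier", max_tool_calls=20, max_total_tokens=200_000),
}

# Jobs the table does not cover fall back to a reviewed route.
DEFAULT_ROUTE = Route("frontier", human_review=True)


def route(job: str) -> Route:
    return ROUTES.get(job, DEFAULT_ROUTE)


print(route("invoice_extraction"))
print(route("something unmapped"))
```

The fallback here trades spend for safety by keeping a human in the loop; a harness could just as reasonably ask the user to clarify the job instead.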
The harness is the economic control plane
A good internal AI harness needs task classification, risk classification, model routing, token budgeting, validation, escalation, observability, continuous evals, and governance. These are not engineering nice-to-haves. They are the mechanisms that keep enterprise AI economically sane.
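Validation and escalation work together: run the cheapest eligible tier, check the output deterministically, and escalate only on failure. A minimal Python sketch, assuming models are plain callables (stand-ins for real vendor SDK calls) and using a hypothetical invoice schema with `invoice_id` and `total` fields:

```python
import json
from typing import Callable, Optional

# A "model" is any callable prompt -> text; real harnesses would wrap vendor SDK calls.
Model = Callable[[str], str]


def valid_invoice_json(text: str) -> bool:
    """Deterministic validator: output must be a JSON object with the required fields."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and {"invoice_id", "total"}.issubset(data)


def run_with_escalation(prompt: str, ladder: list[tuple[str, Model]],
                        validate: Callable[[str], bool]) -> tuple[str, Optional[str]]:
    """Try cheaper tiers first; escalate only when validation fails.

    Returns (tier_name, output), or ("human_review", None) if every tier fails.
    """
    for tier, model in ladder:
        output = model(prompt)
        if validate(output):
            return tier, output
    return "human_review", None


def small_model(prompt: str) -> str:
    return "The total is 42."  # stub: free text, fails the JSON check


def frontier_model(prompt: str) -> str:
    return '{"invoice_id": "INV-001", "total": 42}'  # stub: valid JSON


tier, output = run_with_escalation(
    "Extract the invoice fields",
    [("small", small_model), ("frontier", frontier_model)],
    valid_invoice_json,
)
print(tier, output)  # frontier {"invoice_id": "INV-001", "total": 42}
```

Each escalation still pays for the failed attempt, which is exactly the retry cost that belongs in cost per accepted task.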
Without a harness, every user interaction becomes a direct path to vendor defaults. With a harness, the organization can decide which model handles which job, which tasks can use frontier models, which tasks require review, which tools are available, how much context is allowed, when a workflow should stop, and which outputs count as accepted.
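Several of those decisions can be enforced on every step of a workflow as a policy check. This Python sketch covers frontier access, tool allow-lists, context limits, and stop conditions; the `Policy` and `StepRequest` fields, the limits, and the tool names are hypothetical, and review and acceptance rules would sit alongside it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Policy:
    """Hypothetical per-task governance policy enforced by the harness."""
    frontier_allowed: bool
    allowed_tools: frozenset[str]
    max_context_tokens: int
    max_steps: int
    max_spend_usd: float


@dataclass
class StepRequest:
    model_tier: str
    tool: Optional[str]
    context_tokens: int
    step: int                # 1-based step number in the workflow
    spend_so_far_usd: float


def check_step(policy: Policy, req: StepRequest) -> str:
    """Return "allow", or a "deny:"/"stop:" reason the harness acts on."""
    if req.model_tier == "frontier" and not policy.frontier_allowed:
        return "deny: frontier model not permitted for this task"
    if req.tool is not None and req.tool not in policy.allowed_tools:
        return f"deny: tool {req.tool!r} not allowed"
    if req.context_tokens > policy.max_context_tokens:
        return "stop: context budget exceeded"
    if req.step > policy.max_steps:
        return "stop: step limit reached"
    if req.spend_so_far_usd >= policy.max_spend_usd:
        return "stop: spend limit reached"
    return "allow"


research = Policy(frontier_allowed=True, allowed_tools=frozenset({"web_search"}),
                  max_context_tokens=100_000, max_steps=15, max_spend_usd=2.00)
print(check_step(research, StepRequest("frontier", "web_search", 40_000, 3, 0.35)))  # allow
print(check_step(research, StepRequest("frontier", "shell", 40_000, 4, 0.40)))
```

Returning a reason string rather than a bare boolean lets the harness log why a workflow stopped, which feeds the observability and governance layers.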
The winning companies will not be the ones that blindly standardize on the newest model. They will be the ones that understand the work, measure the outcomes, and route each task to the right model at the right cost. Access is not architecture.