The Token Budget Problem

AI compute costs, workslop, subsidized tokens, and the false simplicity of replacing payroll with compute

This post follows my earlier essay on the AI substitution trap. That argument, drawn from a work-in-progress article, is that organizations may adopt AI to reduce dependence on labor and end up dependent on vendors, models, cloud systems, token pricing, data pipelines, and technical infrastructures they do not control. The risk is not simply that AI replaces workers. The organizational risk is that firms eliminate the human capacity they need to evaluate, repair, or exit the systems that replace workers.

A related problem is becoming more visible. AI substitution is often discussed as if it were a straightforward comparison between human payroll and machine output. Workers are expensive. AI is fast. If AI produces comparable work at lower cost, replacing labor appears financially rational. This is the basic substitution story. It treats AI as a cheaper input that can be swapped in for a more expensive one.

That framing is too simple. Organizations do not simply remove labor costs. They convert one cost structure into another. Payroll becomes compute. Salaries become subscriptions, model calls, cloud bills, API charges, context windows, inference costs, vendor contracts, integration costs, compliance systems, and validation labor. A firm may reduce headcount and still increase the total cost of producing reliable work.

Figure 1. Screenshot of the Harvard Business Review article “AI-Generated Workslop Is Destroying Productivity.” The article captures one part of the cost-conversion problem: AI can produce polished artifacts that shift verification and repair work onto others rather than reducing total work.

The payroll-to-compute trade depends on assumptions that are rarely made explicit. AI output has to be usable. Errors have to remain limited. Human validation has to stay manageable. Token prices have to remain low. Usage cannot expand faster than efficiency improves. Vendor terms cannot change in ways that destabilize the business case. The organization also has to retain enough internal competence to judge whether the system is working after workers have been removed. If those assumptions fail, the apparent cost advantage becomes much less clear.

The token story is especially misleading because the cost of individual tokens has been falling. In one sense, that is real. Models have become cheaper to run, lower-cost providers have entered the market, and inference prices have declined sharply for many uses. But the price of a token is only one part of the organizational cost of AI. The total bill depends on how many tokens are used, how many workflows come to depend on them, how much output has to be checked, and how deeply AI becomes embedded in organizational routines.

A token can become cheaper while the total AI bill goes up. A short prompt becomes a long prompt. A single answer becomes a multi-step workflow. A writing aid becomes a research assistant, coding partner, meeting summarizer, compliance checker, customer-service agent, sales tool, document reviewer, and workflow coordinator. The unit price falls, but the organization builds more of its activity around token consumption.

Figure 2. Cost per token falls while tokens consumed per query rise sharply. Cheaper units do not guarantee a cheaper system.
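The arithmetic behind a falling unit price and a rising bill is simple enough to sketch. The example below is a minimal illustration, not a measurement: the function name `monthly_token_bill` and every number in it (prices, prompt lengths, query volumes) are hypothetical values chosen only to show the direction of the effect.

```python
def monthly_token_bill(price_per_million, tokens_per_query, queries_per_month):
    """Monthly spend in dollars: unit price times total tokens consumed."""
    total_tokens = tokens_per_query * queries_per_month
    return price_per_million * total_tokens / 1_000_000


# Hypothetical "before": short prompts, one narrow use case.
bill_before = monthly_token_bill(
    price_per_million=10.0, tokens_per_query=1_000, queries_per_month=50_000
)

# Hypothetical "after": the unit price is down 80%, but queries carry 20x
# more tokens (multi-step workflows) and usage has spread to 4x as many queries.
bill_after = monthly_token_bill(
    price_per_million=2.0, tokens_per_query=20_000, queries_per_month=200_000
)

print(f"before: ${bill_before:,.0f}/month")   # before: $500/month
print(f"after:  ${bill_after:,.0f}/month")    # after:  $8,000/month
print(f"bill grew {bill_after / bill_before:.0f}x while the unit price fell 80%")
```

Under these assumed numbers the price per token drops by four fifths and the bill still grows sixteenfold. The point is not the specific figures but that the bill is a product of three terms, and only one of them has been falling.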

The current token price should also be treated as a market-formation price, not simply as the natural price of a mature technology. Token costs are shaped by technical efficiency, but also by subsidization, strategic pricing, cloud partnerships, venture capital, hyperscaler investment, and competition for market share. Model providers have reasons to keep prices low while organizations build habits around AI and reorganize workflows around vendor systems. Later, once dependence has been established and firms need revenue rather than adoption, the price structure can change.

This is where the AI cost story begins to resemble the gig economy. Uber and Lyft did not initially expand only because they had discovered a permanently cheaper way to move people through cities. They subsidized both sides of the market. Riders received artificially cheap rides. Drivers received incentives. Venture capital funded price competition, consumer habituation, driver recruitment, and market expansion. The goal was to make ride-hailing feel normal, convenient, and inevitable while weakening traditional taxi systems and competing against one another for market share.

Those prices were not the mature economics of the service. They were subsidized prices during a period of market formation. Uber had deeper access to capital than Lyft and could sustain losses, expand aggressively, and compete through price. But the sector could not burn cash indefinitely. Once ride-hailing firms faced stronger pressure for profitability, prices rose, fees increased, driver incentives changed, and the consumer experience shifted. The cheap ride trained people to treat a subsidized price as the normal price.

AI may be entering a similar phase. Cheap tokens should not be interpreted too quickly as proof that AI labor substitution is durably cheaper than human labor. Low token prices may show that model providers, cloud firms, and investors are absorbing costs while the market is being built. If an organization replaces workers because current token costs look cheap, it may be making a long-term dependency decision based on a temporary pricing regime.

The analogy should not be pushed too far. AI is not ride-hailing, and model markets are not urban transportation markets. Inference costs can fall because of hardware improvements, model optimization, open-source competition, smaller specialized models, and more efficient architectures. What the gig-economy analogy clarifies is timing. Early low prices can conceal mature costs. A subsidized input can become the basis for organizational dependence before its long-term economics are clear.
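One way to make the pricing-regime risk concrete is to ask how much token prices would have to rise before a substitution decision stops saving money. The sketch below is a simplified model under stated assumptions: the function names, the single `other_ai_costs` bucket (validation, integration, vendor management), and all dollar amounts are hypothetical, and it holds usage constant, which the argument above suggests is optimistic.

```python
def net_annual_savings(payroll_saved, token_spend, other_ai_costs, price_multiplier=1.0):
    """Savings left after AI costs, if token prices change by price_multiplier."""
    return payroll_saved - other_ai_costs - token_spend * price_multiplier


def breakeven_price_multiplier(payroll_saved, token_spend, other_ai_costs):
    """Token price multiplier at which net savings fall to zero."""
    if token_spend <= 0:
        raise ValueError("token_spend must be positive")
    return (payroll_saved - other_ai_costs) / token_spend


# Hypothetical business case: all figures are illustrative annual dollars.
case = dict(payroll_saved=600_000, token_spend=100_000, other_ai_costs=300_000)

print(net_annual_savings(**case))                        # 200000.0 at today's prices
print(breakeven_price_multiplier(**case))                # 3.0
print(net_annual_savings(**case, price_multiplier=4.0))  # -100000.0
```

In this made-up case, the savings disappear if token prices triple. A firm that has already removed the workers cannot simply reverse the decision when that threshold is crossed.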

The danger grows when the organization removes the workers who know how to perform, evaluate, or reconstruct the work. If token prices rise, vendor terms change, model performance degrades, or AI budgets expand beyond expectations, the firm may discover that payroll was not only a cost. It was also retained competence.

This is the part that gets lost when AI budgets are compared directly to salaries. Payroll is visible. Salaries and benefits appear as recurring obligations. Compute looks more flexible, more scalable, and more technical. But compute is not labor. It produces outputs. Those outputs still have to be evaluated, integrated, explained, defended, and repaired. The marginal cost of generating a response is not the same as the system cost of using that response inside an organization.

That brings in workslop. The term refers to AI-generated work that looks polished but lacks the substance required to be useful. It might be a memo that sounds professional but says little, a summary that misses the key issue, a report with plausible but unsupported claims, a code fragment that works only superficially, or a slide deck that creates the appearance of analysis without doing the work. Workslop does not eliminate labor. It moves labor downstream.

One person saves time by generating an artifact quickly. Someone else spends time figuring out whether the artifact is accurate, relevant, complete, legally usable, technically functional, or worth salvaging. The cost has not disappeared. It has been moved into review, correction, coordination, and repair. In some cases, AI increases individual output while reducing organizational productivity.
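The shift from individual time saved to organizational time lost can be written as a small accounting exercise. This is a rough sketch, assuming that every artifact gets a brief review and that some fraction turns out to be workslop needing repair. The function name `net_hours_saved`, its parameters, and the example rates and hours are all hypothetical.

```python
def net_hours_saved(artifacts, hours_saved_each, review_hours_each,
                    workslop_rate, repair_hours_each):
    """Hours saved by authors minus hours spent downstream on review and repair."""
    saved_upstream = artifacts * hours_saved_each
    review = artifacts * review_hours_each
    repair = artifacts * workslop_rate * repair_hours_each
    return saved_upstream - (review + repair)


# Hypothetical month: 100 AI-generated artifacts, each saving its author 1 hour.
# Each takes 15 minutes to review; a quarter need 4 hours of repair by someone else.
result = net_hours_saved(
    artifacts=100,
    hours_saved_each=1.0,
    review_hours_each=0.25,
    workslop_rate=0.25,
    repair_hours_each=4.0,
)
print(result)  # -25.0: authors saved 100 hours, colleagues spent 125
```

With these assumed inputs, every author reports a time saving and the organization as a whole is 25 hours worse off.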

This is one of the central errors in the substitution narrative. It treats production as the expensive part and verification as secondary. In many kinds of knowledge work, verification is the work. Producing words, code, summaries, plans, emails, rankings, recommendations, or synthetic analyses is easier than knowing whether they are correct, useful, appropriate, defensible, and adapted to context. If AI increases the volume of plausible output faster than organizations increase their capacity to evaluate it, then AI has not solved the labor problem. It has created a validation problem.

The validation problem becomes more serious when substitution eliminates the people best positioned to validate. Existing workers often know the customers, routines, exceptions, histories, informal standards, edge cases, and failure modes. They know what counts as an acceptable answer in practice. They know when a document is merely polished and when it is actually useful. If those workers are removed, the organization may still generate more output, but it may lose the competence needed to distinguish output from work.

AI productivity gains should therefore be interpreted carefully. A firm may initially see gains because experienced workers use AI to complete tasks faster. Managers may then interpret those gains as evidence that fewer workers are needed. But the gains may have depended on the workers’ expertise. The human side of the human-AI system made the AI useful. If that human capacity is later cut, the original productivity gain may not survive.

Substitution assumption	Likely failure point	Better organizational question
AI output is cheaper than human labor.	Unit token prices fall while total usage rises.	What is the total system cost after usage growth, validation, integration, and vendor dependence?
Current token prices reflect stable economics.	Prices may be subsidized by investors, cloud providers, or strategic competition for market share.	What happens when providers shift from adoption to profitability?
AI output replaces work.	Work moves into review, correction, coordination, and repair.	How much labor is required to make AI output reliable?
AI reduces dependence on workers.	Dependence shifts to vendors, models, cloud systems, and data infrastructure.	Does the organization retain credible exit capacity?
AI productivity gains justify headcount cuts.	Gains may depend on the workers targeted for removal.	Would performance survive if internal competence were reduced?
AI adoption is necessary to stay competitive.	Organizations may over-adopt because non-adoption looks risky before efficiency is proven.	Is adoption driven by demonstrated value or fear of being left behind?
Low-cost models solve the budget problem.	Cheap output still requires governance, validation, and task matching.	Which model fits which task, and where is human judgment still required?

The fear of being left behind is a major part of the current adoption cycle. Organizations may be adopting AI not only because they have proven durable efficiency gains, but because they fear the technology will work and they will be punished for waiting. This is a familiar organizational pattern. When a new technology becomes associated with competitiveness, modernization, and managerial competence, non-adoption starts to look risky. Executives do not want to be the ones who failed to adopt the next major technology. Firms imitate one another under uncertainty because being wrong with everyone else feels safer than being wrong alone.

This is the risk of ecological over-substitution. Adoption spreads before the efficiency case is settled because adoption itself becomes a signal. Investors expect it. Consultants recommend it. Competitors announce it. Managers use it to demonstrate decisiveness. Firms then reorganize around AI even when the evidence for substitution is incomplete. If the market matures and AI costs rise, many organizations may have to reconsider how much substitution made sense.

There is also a cheaper-model complication. The U.S. AI strategy has often been organized around expensive frontier models, large infrastructure buildouts, and the assumption that the most powerful systems will justify high costs. But the model market is not uniform. Chinese models, especially DeepSeek, have put pressure on the assumption that higher cost automatically means a better strategic path. Cheaper models may not be as powerful across all dimensions, but they may be good enough for many bounded, low-risk, easily verified tasks.

Figure 3. Output price comparison across nine mainstream AI models. Lower-cost models complicate the assumption that expensive frontier substitution is the only available AI strategy.

Lower-cost AI could be used to augment human workers rather than replace them. Organizations do not need to use the most expensive model for every task. They can match model cost to task risk. A cheaper model might be appropriate for routine summarization, classification, drafting, or internal search. A more expensive model might be reserved for harder or higher-risk tasks. Human expertise remains the organizing center. AI becomes a tool inside a human system of judgment rather than the basis for eliminating that system.

This third path, AI as augmentation inside a human system of judgment, is more plausible than either extreme. Refusing AI altogether is not a serious organizational strategy in many domains. Replacing workers wholesale because tokens are currently cheap is also not serious. The more durable strategy is to use AI where it produces real gains, preserve internal expertise, maintain fallback capacity, protect junior training pipelines, and avoid building core operations around a pricing structure that may not last.
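Matching model cost to task risk can be expressed as an explicit routing rule rather than left to habit. The sketch below is one possible shape for such a rule, not a recommended policy: the `Task` and `Route` types, the three-level risk scale, the tier names `low_cost` and `frontier`, and the review levels are all hypothetical. The assumption that medium-risk or hard-to-verify work goes to the cheaper model with full human review is mine and would need to be set by people who know the work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    risk: str              # "low", "medium", or "high" (hypothetical scale)
    easily_verified: bool  # can a person cheaply check the output?


@dataclass(frozen=True)
class Route:
    model_tier: str        # "low_cost" or "frontier" (hypothetical tiers)
    human_review: str      # "spot_check" or "full_review"


def route(task: Task) -> Route:
    """Pick a model tier and a review level from task risk and verifiability."""
    if task.risk not in {"low", "medium", "high"}:
        raise ValueError(f"unknown risk level: {task.risk!r}")
    if task.risk == "high":
        # Harder or higher-risk work: stronger model, and a person reviews everything.
        return Route("frontier", "full_review")
    if task.risk == "low" and task.easily_verified:
        # Routine, bounded, checkable work: cheap model, sampled human checks.
        return Route("low_cost", "spot_check")
    # Assumed default: cheap model is acceptable, but a person reviews every output.
    return Route("low_cost", "full_review")


for t in [
    Task("summarize internal meeting notes", "low", True),
    Task("draft customer refund reply", "medium", True),
    Task("review supplier contract clause", "high", False),
]:
    print(t.name, "->", route(t))
```

Every branch returns a human review level. In this sketch there is no route that removes human judgment entirely, which is the design choice the augmentation argument implies.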

For workers, the lesson is similar. Using AI well is increasingly important, but independent competence matters more, not less. If many people can generate text, summaries, slides, code snippets, and synthetic analysis, the scarcer skill becomes judgment. Can the output be evaluated? Can it be repaired? Can its assumptions be identified? Can the user tell when the system misunderstood the problem? Can the person using AI still perform or explain the underlying work?

A worker who understands the task can use AI as leverage. A worker who only knows how to prompt becomes dependent on systems they cannot evaluate. Treating AI as the whole future of employability can become self-defeating if it comes at the expense of domain knowledge, writing ability, technical competence, organizational judgment, or practical experience.

For organizations, the same point applies at a larger scale. AI strategy should not be measured by adoption rates, token volume, or headcount reduction alone. A firm can use more AI and become less capable. It can cut payroll and increase total system cost. It can generate more output and reduce productivity. It can appear technologically advanced while becoming dependent on vendors it cannot leave. It can make work look faster while shifting the labor of judgment elsewhere.

The better questions are less glamorous. Has AI reduced total work or increased downstream cleanup? Has it preserved or eroded internal expertise? Are junior workers still learning the tasks that produce future judgment? Can the organization compare vendors intelligently? Can it switch systems without operational collapse? Are token costs predictable? Is model choice matched to task risk? Are humans empowered to intervene, or are they only nominally in the loop?

This returns to the AI substitution trap. The danger is not that organizations use AI. The danger is that they mistake short-term cost conversion for long-term efficiency. They move money from payroll to compute, treat the shift as savings, and later discover validation labor, coordination labor, compliance costs, vendor management, technical repair, rising usage, and unstable pricing. The worst version arrives when that discovery comes after the organization has cut the human capacity needed to recover.

The gig-economy analogy clarifies the risk. Uber and Lyft trained customers to experience subsidized prices as normal prices. AI firms may be training organizations to experience subsidized token costs as normal compute costs. In both cases, early prices can conceal mature economics. Once providers need profitability rather than adoption, the price of dependence rises.

The relevant comparison is not AI versus workers in the abstract. It is governable AI-augmented work versus non-governable substitution. AI-augmented work can improve productivity when human competence remains intact. Non-governable substitution can produce workslop, budget overruns, vendor dependence, coordination burdens, and capability loss.

AI may become cheaper per token. Organizations do not operate on tokens alone. They require reliable work, accountable judgment, and the capacity to change course when a system fails. When firms replace payroll with tokens, they may discover that compute is not a substitute for competence. When they build operations around vendor-controlled systems, the largest cost may not be the monthly bill. It may be the loss of the capacity to leave.


Sources