What is tokenomics and why it might be your biggest monthly bill

Walk through a colliery town in Yorkshire or South Wales in the 1850s and coal is everywhere, stacked at the pit-head, loaded onto barges, feeding the steam engines that had turned Britain into the workshop of the world. For most of the century, coal was simply what powered things, mined at home, in such steady supply that nobody had reason to look for an alternative.

Then, in 1859, a man named Edwin Drake drilled the first commercial oil well in Titusville, Pennsylvania. He wasn’t looking for a fuel to replace coal, he was after kerosene, to light lamps, because whale oil, the main lamp fuel at the time, was becoming scarce and expensive as sperm whales were hunted in large numbers.

The crude that came up out of that well also contained a thinner, more volatile fraction called gasoline, and early refiners had so little use for it that they often burned it off or dumped it straight into the nearest river. It was a waste.

Then came the combustion engine, and that waste product turned into the most valuable thing to come out of a barrel of crude. By the time Henry Ford’s Model T rolled off the line in 1908, gasoline wasn’t a byproduct anymore, it was the thing cars, factories, and eventually entire militaries couldn’t run without, and oil became the commodity that powered the global economy.

In October 1973, when the Arab members of OPEC cut off oil exports to countries supporting Israel in the Yom Kippur War, the price of crude nearly quadrupled within months, and economies on the other side of the planet slid into recession over a decision made in a handful of desert capitals.

Half a century later, the calculus hasn’t changed much. The Strait of Hormuz, a stretch of water you could cross by boat in twenty minutes, carries nearly a fifth of the world’s oil supply, and every time a tanker gets delayed there, markets move within hours.

What I find interesting about that story is how gasoline went from something nobody needed to something nobody could do without, once cars made it essential. I think about that whenever I read about AI today. If one of the major AI labs shut its servers down for a week today, most of us would notice little more than an inconvenience: a chatbot goes silent, a coding assistant stops responding.

And that’s not as hypothetical as it sounds. Just this week, the US government ordered Anthropic to restrict access to its most powerful model, Mythos, and the public version built on it, Fable, for non-US users, citing national security concerns. Five or ten years from now, with far more of the economy running on models like this, would a disruption like that still feel like a brief inconvenience, or more like October 1973?

The answer depends on how deeply computing has woven itself into the economy by then. And compute, broken down to its smallest tradable unit, is measured in tokens. If oil runs on barrels and electricity runs on kilowatt-hours, AI runs on tokens, and understanding how that economy works is becoming more relevant than ever before.

This week, that’s the world I want to walk through with you.

In this edition:

What a token actually is, and why every AI exchange has an invisible bill attached
How tokenomics works, and the brief, strange culture of tokenmaxxing that took over corporate AI budgets
Why falling token prices haven’t stopped AI bills from climbing
Can you estimate if your AI is actually paying off?
A few habits worth borrowing if you want your own AI spend to behave sensibly

Let’s begin.

What is a token, really?

AI models don’t read the way we do. We see whole words, sometimes whole sentences, in one glance. A language model breaks everything into smaller fragments first, somewhere between a syllable and a short word. “Understanding” might split into “under” and “standing.” A number like 2026 might break into two or three pieces. Punctuation gets counted too.

Think of it the way a tailor thinks about fabric. You don’t buy a shirt, you buy metres of cloth, and the shirt gets made from that. A token is the AI version of that metre of cloth. Every prompt you type gets cut into these small pieces before the model can read it, and every word it writes back gets assembled the same way before it reaches your screen.

There are two kinds you’re paying for. Input tokens are everything you feed the model: your question, your documents, and the whole back-and-forth of a long conversation if it gets re-sent each time. Output tokens are everything it generates in return. Both get counted, and both get billed.

That’s what makes AI economics different from any software you’ve bought before. A Netflix subscription charges the same whether you watch one show a month or fifty. A token meter does not. The bill depends on how much AI you use, not how many licences you’ve bought, and that changes everything.
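To make the meter concrete, here is a minimal sketch of how input tokens pile up when a whole conversation is re-sent on every turn. The per-turn token counts and both prices are illustrative assumptions, not any provider's actual rates.

```python
# Hypothetical prices in dollars per million tokens (assumptions for illustration).
INPUT_PRICE_PER_M = 1.00
OUTPUT_PRICE_PER_M = 4.00


def conversation_cost(turns, user_tokens=100, reply_tokens=300):
    """Bill for a chat where the full history is re-sent as input on every turn."""
    history = 0          # tokens already in the conversation
    input_total = 0
    output_total = 0
    for _ in range(turns):
        # The model reads everything so far plus the new message.
        input_total += history + user_tokens
        output_total += reply_tokens
        # Both the message and the reply join the history for the next turn.
        history += user_tokens + reply_tokens
    cost = (input_total * INPUT_PRICE_PER_M + output_total * OUTPUT_PRICE_PER_M) / 1_000_000
    return input_total, output_total, cost


for turns in (1, 10, 50):
    inp, out, cost = conversation_cost(turns)
    print(f"{turns:>2} turns: {inp:>7,} input tokens, {out:>6,} output tokens, ${cost:.4f}")
```

Under these assumptions, output grows in a straight line with the number of turns, while input grows much faster, because each new turn pays again for everything that came before it.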

Tokenomics, and the rise of Tokenmaxxing

Once tokens became the unit of account, AI providers started pricing them the way utilities price electricity. A token from a basic model costs a fraction of a cent. A token from an advanced reasoning model, the kind that works through several steps before it answers, costs meaningfully more, because it’s doing meaningfully more work per token. It’s also extraordinarily lucrative for the people selling it. One of the leading labs is reported to have seen its revenue jump from around $9 billion to somewhere near $35-40 billion within a single year, almost entirely on the back of this kind of usage-based pricing.

Underneath that pricing sit four pillars of tokenomics. The first is value. A token’s value depends on intelligence (model quality and context size) and speed (tokens generated per second).

The second is demand: how much of it a business will actually need. The starting estimate is Users × Sessions × Tokens per Session, but actual usage runs much higher because of hidden reasoning tokens, agentic workflows, and context management.

The third is supply. There are three ways companies buy AI: paying for tokens as they’re consumed through APIs, committing to a certain level of usage in exchange for lower costs, and running models on their own hardware when usage becomes large enough to justify it.

And the fourth is monetisation. In practice, companies make money from AI in four ways: selling AI access through APIs, building AI-powered products, adding AI to existing products, or using AI internally to improve productivity.
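The demand estimate from the second pillar can be sketched in a few lines. The company size, the usage figures, and above all the 5x overhead multiplier are hypothetical; the source only says that real usage runs well above the baseline, not by how much.

```python
def monthly_token_demand(users, sessions_per_user, tokens_per_session, overhead_multiplier=1.0):
    """Baseline demand (Users x Sessions x Tokens per Session) scaled by an overhead factor."""
    baseline = users * sessions_per_user * tokens_per_session
    return baseline * overhead_multiplier


# Hypothetical company: 500 users, 40 sessions each per month, 3,000 tokens per session.
baseline = monthly_token_demand(500, 40, 3_000)
# Assumed 5x overhead for hidden reasoning tokens, agent steps, and context management.
with_overhead = monthly_token_demand(500, 40, 3_000, overhead_multiplier=5)

print(f"Baseline:      {baseline:,.0f} tokens per month")
print(f"With overhead: {with_overhead:,.0f} tokens per month")
```

The useful habit here is to keep the overhead multiplier as its own number, so it can be measured and challenged separately from headcount and session volume.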

For a while, almost everyone obsessed over the second pillar, demand, and ignored the first one, value, entirely. A strange culture grew up around all this. As coding agents and AI assistants spread inside large companies, several of them started treating token consumption itself as a metric of progress.

Amazon’s token leaderboards ranked employees by how many tokens their agents had burned through, and the heaviest spenders got celebrated as innovators, regardless of what they’d actually built. The logic was that more tokens meant more value. People started calling it Tokenmaxxing, half as a joke, before realising it wasn’t one.

The scale this reached is hard to overstate. Google alone now processes something like 1.3 quadrillion tokens a month, a number that’s jumped 130-fold in just the last year. In some companies, generative AI has turned into the fastest-growing line on the technology budget, swallowing up to half of total IT spend, with cloud bills, mostly riding on AI workloads, climbing close to 20% a year. Most of that AI bill comes from a handful of teams, especially engineering, which accounts for more than 60% of total AI spending, with per-person costs often more than 10x higher than those in sales.

However, some companies are now waking up to this. Uber’s leadership admitted publicly that the company burned through its entire year’s token budget for one major AI coding tool well before the year was half over, and that tying AI usage to anything actually shipped had become hard to prove. Amazon took its internal AI leaderboard down, and Microsoft pulled back on Claude Code subscriptions once the token bills came in far higher than expected. Even Sam Altman acknowledged that many of his own customers were spending heavily on AI while unsure how much of it was waste.

Every technology cycle picks an early vanity metric and mistakes it for value, whether that was miles of railroad track in the 1800s or daily active users during the dot-com years. This cycle picked tokens. The correction tends to arrive once someone finally asks the only question that matters: what did all that spend actually buy you?

Why cheaper tokens still mean bigger bills

Token prices have been falling fast. Deloitte’s research projects the average cost of inference dropping from around four cents per million tokens in 2025 to roughly one cent by 2030. By most logic, that should shrink AI’s place on the budget.

It hasn’t worked that way, and the reason has a name that goes back more than a century and a half: the Jevons paradox. When something gets cheaper, people don’t simply buy the same amount and pocket the savings. They find far more ways to use it, and total spending climbs even as the price per unit falls. As coal-fired engines got cheaper to run and more efficient in nineteenth-century England, total coal consumption rose rather than fell, because cheap power unlocked uses nobody had bothered with before.

AI is following the same script. A simple chatbot query in 2024 might have used around 2,000 tokens for a couple of cents. The agentic workflows running through enterprise systems today look nothing like that single exchange. One task can now pass between a primary agent, several sub-agents, tool calls, retrieval steps, validation checks, and retries, easily consuming 500,000 tokens to finish work that once took a single round trip. That is 250 times as many tokens for one task, which is why a 75% drop in price per token still adds up to a far heavier bill.
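The arithmetic behind that escalation is worth working through once. The token counts and the 75% price drop come from the paragraph above; treating "a couple of cents" as exactly $0.02 is an assumption made only so the sums can be done.

```python
# Figures from the text; "a couple of cents" is taken as $0.02 (an assumption).
OLD_TOKENS = 2_000
OLD_COST = 0.02
NEW_TOKENS = 500_000
PRICE_DROP = 0.75

old_price_per_m = OLD_COST / OLD_TOKENS * 1_000_000        # implied price per million tokens
new_price_per_m = old_price_per_m * (1 - PRICE_DROP)       # after the 75% drop
new_cost = NEW_TOKENS * new_price_per_m / 1_000_000

print(f"Tokens per task: {NEW_TOKENS / OLD_TOKENS:.0f}x more")
print(f"Price per million tokens: ${old_price_per_m:.2f} -> ${new_price_per_m:.2f}")
print(f"Cost per task: ${OLD_COST:.2f} -> ${new_cost:.2f} ({new_cost / OLD_COST:.1f}x)")
```

On these numbers, the task goes from about two cents to about $1.25. The price per token fell to a quarter of what it was, but the volume rose 250-fold, so the cost per task ends up roughly 60 times higher.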

The same thing is playing out across the entire market, not just inside individual tasks. Weekly token consumption across the most used AI models has climbed from under a trillion tokens a week at the start of last year to well over twelve trillion by this May, and a growing share of that is shifting toward cheaper Chinese open-source models, as companies start routing simpler work away from the priciest frontier options.

As a result, after a massive boom in spring 2026, the market has hit a sudden cooling-off period, and money spent on LLM tokens is actively shrinking for the first time in months.

You see, there’s a simple economic logic underneath all that routing: when something becomes scarce, higher prices push people toward cheaper alternatives and reserve the limited supply for the uses that justify it most.

That’s what is happening with AI compute today. Companies that can afford frontier models still use them for high-value tasks where the extra capability matters. Everyone else is shifting to cheaper models for routine work, because an interesting pattern appears when you compare leading AI models on intelligence, speed, and cost. Claude Opus 4.8 and GPT-5.5 are among the smartest models, but they cost more than $4 per million tokens. DeepSeek’s latest model scores almost as well, within about 10 points, but costs just $0.18 per million tokens, roughly 20 times cheaper.

Sending every query to the most expensive model is like hiring a surgeon to apply a band-aid. Beyond a certain point, paying more doesn’t always buy much more intelligence. Some models are extremely fast but less capable. Others are expensive despite only average performance. At the top end, intelligence and price generally rise together. After that, the relationship starts to break down.
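A rough sketch of what routing does to a bill, using the two per-million-token prices quoted above. The monthly workload and the shares routed to the cheaper model are hypothetical, and "more than $4" is rounded down to $4 for simplicity.

```python
# Prices per million tokens quoted in the text ("more than $4" is taken as exactly $4).
FRONTIER_PRICE_PER_M = 4.00
BUDGET_PRICE_PER_M = 0.18


def blended_cost(total_tokens_m, share_routed_to_budget):
    """Monthly cost in dollars when a share of traffic goes to the cheaper model."""
    budget_tokens = total_tokens_m * share_routed_to_budget
    frontier_tokens = total_tokens_m - budget_tokens
    return frontier_tokens * FRONTIER_PRICE_PER_M + budget_tokens * BUDGET_PRICE_PER_M


# Hypothetical workload of 1,000 million tokens a month.
for share in (0.0, 0.5, 0.8):
    print(f"{share:.0%} routed to the cheaper model: ${blended_cost(1_000, share):,.2f}")
```

The sketch leaves out the hard part, which is deciding which queries are simple enough to route. It only shows why the decision is worth making: the bill falls almost in step with the share of traffic moved.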

Return on tokens: Is your AI spend actually paying off?

As AI spending grows, companies are asking a simple question: What return are we getting on all these tokens? The answer is straightforward:

Return on Tokens = (Value created − Token cost) ÷ Token cost
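As a small illustration, the formula translates directly into code. The dollar figures below are hypothetical, and in practice the hard input is "value created", which the formula takes as given.

```python
def return_on_tokens(value_created, token_cost):
    """(Value created - Token cost) / Token cost, as defined in the text."""
    if token_cost <= 0:
        raise ValueError("token_cost must be positive")
    return (value_created - token_cost) / token_cost


# Hypothetical deployments, in dollars per month.
print(return_on_tokens(value_created=30_000, token_cost=10_000))   # positive return
print(return_on_tokens(value_created=6_000, token_cost=10_000))    # negative return
```

A result of 2.0 means every dollar of tokens returned two dollars on top of its own cost. Anything below zero means the tokens cost more than the value they produced.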

The gap between successful and unsuccessful AI deployments often comes down to one thing: how tokens are used.

According to Deloitte, companies are increasingly falling into two groups: value creators and value eroders. Some turn AI spending into measurable business outcomes. Others accumulate costs without creating enough value to justify them.

AI and generative AI already account for about 36% of digital transformation budgets, and that share continues to grow. What’s getting squeezed to make room for it is the less glamorous but essential spending on cybersecurity and data infrastructure.

Even business leaders remain cautious about the returns. Nearly 60% expect it will take up to three years to see meaningful value from basic AI automation, while six in ten think more advanced AI systems will take even longer. Only about one in four finance leaders say their AI investments are delivering clear, measurable value today.

Many companies use AI to solve the same problem from scratch every time. That works for prototypes and first drafts, but it becomes expensive and unreliable for business-critical tasks like fraud detection, compliance checks, or claims processing. The AI keeps consuming tokens, but very little of that work is reusable.

The better approach is to let AI do the thinking once and then turn that knowledge into a repeatable workflow. Instead of re-solving the same problem every time, the system follows established rules and only calls AI when something changes.
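One way to picture the idea is a workflow that pays for the model only the first time it meets a kind of problem, then reuses the stored answer. This is a deliberately simplified sketch: the function `expensive_model_call`, the class `ReusableWorkflow`, and the claim types are all hypothetical stand-ins, and a real system would also validate and update its stored rules.

```python
def expensive_model_call(claim_type):
    """Stand-in for a real model call; hypothetical, returns a canned decision rule."""
    return f"standard checklist for {claim_type}"


class ReusableWorkflow:
    """Ask the model once per kind of problem, then reuse the stored answer."""

    def __init__(self):
        self.rules = {}
        self.model_calls = 0

    def handle(self, claim_type):
        if claim_type not in self.rules:
            # Something new: pay for the model once and keep the result.
            self.rules[claim_type] = expensive_model_call(claim_type)
            self.model_calls += 1
        return self.rules[claim_type]


workflow = ReusableWorkflow()
claims = ["windscreen", "windscreen", "theft", "windscreen", "theft"]
for claim in claims:
    workflow.handle(claim)

print(f"{len(claims)} claims handled with {workflow.model_calls} model calls")
```

Here five claims trigger only two model calls, one per distinct claim type. The saving grows with volume, since repeat cases cost no tokens at all.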

Companies using this approach have reported accuracy above 99% in claims processing while reducing token usage by as much as 100x. There is also a simpler way to improve returns: use the right model and infrastructure for the job. Many organizations default to the largest models even when smaller, specialized models can deliver similar results at a fraction of the cost.

Deloitte’s research highlights another important factor: infrastructure utilization. Companies running AI systems at around 85% utilization generate significantly more value per token than those that overprovision capacity and leave resources idle.

The lesson is simple: the highest returns come not from using more AI, but from using the right AI, in the right way, at the right scale.

Becoming a savvy token economist

Most leaders running AI programmes right now are still measuring success the way Tokenmaxxing taught them to: how many tokens got used, how many agents got deployed. Almost none of them are measuring what actually came out the other end.

Fixing that doesn’t require a research lab or a six-month transformation programme. It mostly comes down to a handful of habits the companies converting spend into real return have already picked up.

1. Use the right model for the job

Not every task needs the most powerful model. Simple requests can often be handled with rules or smaller, cheaper models. Reserve expensive frontier models for tasks that genuinely require advanced reasoning.

2. Reduce unnecessary tokens

A lot of AI spending comes from sending more context than necessary and generating longer responses than needed. Summarizing older conversations, limiting context windows, and keeping outputs concise can significantly reduce token usage without hurting performance.
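Limiting the context window can be as simple as keeping only the most recent messages that fit a budget. The sketch below assumes roughly four characters per token, which is a common rule of thumb for English text rather than an exact count, and the function names and budget are hypothetical.

```python
def estimate_tokens(text):
    """Rough estimate assuming about 4 characters per token (a rule of thumb, not exact)."""
    return max(1, len(text) // 4)


def trim_history(messages, token_budget):
    """Keep the most recent messages that fit inside the token budget."""
    kept = []
    used = 0
    for message in reversed(messages):      # walk from newest to oldest
        cost = estimate_tokens(message)
        if used + cost > token_budget:
            break
        kept.append(message)
        used += cost
    return list(reversed(kept))             # restore chronological order


# Four messages, oldest first, of 400, 200, 100, and 40 characters.
history = ["a" * 400, "b" * 200, "c" * 100, "d" * 40]
print([len(m) for m in trim_history(history, token_budget=90)])
```

With a budget of 90 estimated tokens, the oldest and longest message is dropped and the three most recent are kept. A production setup would more likely summarize the dropped material than discard it outright.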

3. Manage AI spend like any other expense

The same rule that applies to any subscription or recurring cost applies here too: know what you’re paying, know what it’s actually giving you back, and check in on it once in a while instead of letting it run in the background. The moment a cost stops getting questioned is the moment it stops being managed.

Disclaimer – The information provided herein is intended solely for educational purposes. In this material, Dezerv has utilized information through publicly available sources, and other data deemed to be reliable. All trademarks, logos, and brand names mentioned are used for identification purposes only and do not imply endorsement or recommendation.
