
Tokenflation Is a Symptom. The Cure Is Context-Aware AI Architecture
TL;DR: Banks pushed AI adoption hard and it worked, but token costs are now outpacing the economics underneath them, a gap the industry calls tokenflation. Routing, FinOps tracking, and owned compute all manage the symptom. The real fix is architectural: agents that do not pay to relearn the data environment on every run.
For two years, the message inside large enterprises was simple: use AI everywhere, for everything. Then it worked. And working, it turns out, carries a price tag few organizations modeled for.
The numbers coming out of banking this quarter are a useful early warning for anyone running AI at scale. On its most recent earnings call, Royal Bank of Canada disclosed that its AI token usage, the meter that runs with every model call, had climbed 500% year over year since 2025. At JPMorgan, the chief data and analytics officer of the Payments division told Semafor that some employees are now spending more on tokens than they earn in salary. Commonwealth Bank of Australia CEO Matt Comyn put the mechanism plainly at an industry conference, noting that once a workload adds reasoning, tool use, and large context windows, token costs stop scaling on a linear basis.
The industry has started calling this tokenflation: the gap that opens up when AI adoption succeeds faster than the economics underneath it.
Why Agentic Work Breaks the Budget
The reason the bills are unpredictable is structural. In early rollouts, costs stayed modest because tasks were simple: a summary here, a draft there. But as tools become agentic, a single prompt no longer maps to a single answer. It can quietly trigger hours of autonomous work.
Deloitte makes the same point to technology buyers directly: AI spend is structurally volatile and nonlinear by design, and a workload's token footprint is the cumulative result of decisions made across the stack, including model selection, context length, and orchestration. A deployment that looks cheap at pilot volume produces a very different bill once it hits production complexity.
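To see why a workload's token footprint can grow faster than its step count, here is a minimal sketch in Python. It assumes, purely for illustration, that an agent resends a fixed block of context plus the full transcript so far on every model call. The `Workload` fields and all of the numbers are hypothetical, not figures from Deloitte or any bank.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of one task handed to a model or agent."""
    steps: int            # model calls made to complete one prompt
    context_tokens: int   # fixed context resent on every call
    tokens_per_step: int  # new tokens (reasoning, tool output) added per call


def task_tokens(w: Workload) -> int:
    """Total input tokens for one task when each call resends the full history."""
    total = 0
    history = 0
    for _ in range(w.steps):
        total += w.context_tokens + history  # fixed context plus transcript so far
        history += w.tokens_per_step         # the transcript grows after each call
    return total


# Illustrative workloads; every number here is an assumption.
single_shot = Workload(steps=1, context_tokens=1_000, tokens_per_step=500)
agentic = Workload(steps=20, context_tokens=8_000, tokens_per_step=1_500)

print(task_tokens(single_shot))
print(task_tokens(agentic))
```

Under these assumptions the single-shot task uses 1,000 input tokens and the 20-step agentic task uses 445,000. Doubling the step count more than doubles the total, because the transcript term grows with the square of the number of steps.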
The Fixes Banks Are Reaching For, and Their Ceiling
Faced with a runaway meter, the instinct is to manage behavior, and banks are getting inventive about it, as reported by Evident Insights' Banking Brief:
- CIBC is taking model choice out of users' hands in a new, more agentic version of its internal AI platform. The system classifies each prompt by task type and auto-routes it to whichever model clears the bar most economically, rather than reaching for the most expensive tool when a cheaper one will do the job.
- TD Bank has stood up an AI FinOps function to track usage trends, and is coaching managers to treat tokens like any other line-item expense: using AI because the task calls for it, not just because the option exists.
- PNC is going furthest, building out its own compute so it is not renting every token from an outside lab, with the added benefit that external models never get near its data.
Every one of these moves is rational. But notice what they share: each one bolts a new cost onto the architecture to manage a cost the architecture keeps generating by default. Rationing, routing, monitoring, and re-platforming are all overhead. Someone has to build the router, staff the FinOps team, stand up the GPUs. All of these moves treat the symptom.
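The routing idea in the first bullet can be sketched in a few lines of Python. This is a toy illustration, not a description of CIBC's platform: the model catalog, the prices, the task types, and the keyword classifier are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str              # hypothetical model label
    tier: int              # capability tier: higher handles harder tasks
    price_per_mtok: float  # made-up price per million tokens


# Illustrative catalog; names and prices are assumptions, not real offerings.
CATALOG = [
    Model("small", tier=1, price_per_mtok=0.5),
    Model("medium", tier=2, price_per_mtok=3.0),
    Model("large", tier=3, price_per_mtok=15.0),
]

# Minimum tier each task type is assumed to need.
REQUIRED_TIER = {"summarize": 1, "draft": 1, "analyze": 2, "plan": 3}


def classify(prompt: str) -> str:
    """Toy classifier: first matching keyword wins, default to the hardest type."""
    lowered = prompt.lower()
    for task_type in REQUIRED_TIER:
        if task_type in lowered:
            return task_type
    return "plan"


def route(prompt: str) -> Model:
    """Return the cheapest model whose tier clears the bar for this prompt."""
    needed = REQUIRED_TIER[classify(prompt)]
    eligible = [m for m in CATALOG if m.tier >= needed]
    return min(eligible, key=lambda m: m.price_per_mtok)


print(route("Summarize this memo").name)
print(route("Analyze Q3 churn").name)
```

Even in this toy form, the overhead is visible: someone has to maintain the catalog, the tier table, and the classifier as models and prices change.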
It does not help that the labs keep changing the math. Per-token pricing has been drifting upward, and the newest top-tier models cost meaningfully more per token than the generation before them. Renting every token, for every task, indefinitely, is not a strategy. It is a liability with a subscription fee attached.
In Data Engineering, the Hidden Cost Is Context
Here is the part the headline numbers leave out: in data and analytics work, much of what an agent burns tokens on is not the task itself but relearning the environment in order to do the task. On every run, the agent re-reads schemas, re-traces lineage, re-checks governance rules, and reloads business context into its window before it can take a single useful action. The data estate gets paid for again and again, as context. Matt Comyn, Commonwealth Bank of Australia's CEO, named context as one of the three things that break linear scaling. In data engineering specifically, context is not an occasional input; it is the largest and most repetitive one.
This sits on top of a problem most enterprises already have. Gartner's Data Engineering 2.0 research found that 74% of data and analytics leaders say their current practices cannot effectively support AI use cases, and only 10% believe they can meet AI project timelines. We covered Genesis's recognition in that same Gartner research here. Throwing token-hungry agents at a data estate no one has mapped does not fix that gap. It just puts a meter on it.
No router or spending cap touches this particular cost. A team can route a task to a cheaper model and still pay the context tax on every single invocation.
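A rough piece of arithmetic makes the point concrete. The sketch below assumes a run that reloads 40,000 tokens of context to do 10,000 tokens of actual work, priced at two made-up per-token rates. The function names and every number are illustrative assumptions.

```python
def run_cost(context_tokens: int, task_tokens: int, price_per_mtok: float) -> float:
    """Cost in dollars of one invocation that reloads context before working."""
    return (context_tokens + task_tokens) * price_per_mtok / 1_000_000


def context_share(context_tokens: int, task_tokens: int) -> float:
    """Fraction of the bill spent on reloading context, independent of price."""
    return context_tokens / (context_tokens + task_tokens)


CONTEXT = 40_000  # assumed tokens of schemas, lineage, and rules reloaded per run
TASK = 10_000     # assumed tokens spent on the actual work

expensive = run_cost(CONTEXT, TASK, price_per_mtok=15.0)  # pricier model
cheap = run_cost(CONTEXT, TASK, price_per_mtok=3.0)       # routed to a cheaper model
share = context_share(CONTEXT, TASK)

print(expensive, cheap, share)
```

Routing lowers the price of each run, but the share of the bill that goes to reloading context does not move, because the price cancels out of the ratio.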
What Changes When the Knowledge Layer Is Persistent
Genesis was built on a different premise: an enterprise's institutional knowledge, meaning how its data is structured, how it connects, how it is governed, and what it means to the business, should be modeled once and made durable, not reconstructed on every run. That premise is what the Genesis Context Graph does in practice. It gives agents contextual awareness of enterprise systems, workflows, governance policies, and business semantics, autonomously extracted and then refined by human experts. It turns institutional knowledge that was previously scattered across the data estate into a persistent operational asset. Agents do not rediscover the environment each time they wake up. They operate against a standing map of it. We go deeper on how this works in our post on context-first agent design.
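The general pattern of modeling context once and reusing it can be sketched as follows. This is not the Genesis Context Graph or its API; `PersistentContext`, `discover_environment`, and the JSON file format are hypothetical stand-ins chosen only to show the build-once, reuse-after idea.

```python
import json
import tempfile
from pathlib import Path


class PersistentContext:
    """Hypothetical store: build environment context once, then reuse it."""

    def __init__(self, path: Path, builder):
        self.path = path
        self.builder = builder  # expensive discovery step, e.g. reading schemas
        self.builds = 0         # how many times discovery actually ran

    def load(self) -> dict:
        if self.path.exists():
            return json.loads(self.path.read_text())  # reuse the standing map
        context = self.builder()                       # first run only
        self.builds += 1
        self.path.write_text(json.dumps(context))
        return context


def discover_environment() -> dict:
    """Stand-in for schema, lineage, and governance discovery."""
    return {
        "tables": {"orders": ["id", "customer_id", "total"]},
        "rules": ["mask customer_id outside finance"],
    }


with tempfile.TemporaryDirectory() as tmp:
    store = PersistentContext(Path(tmp) / "context.json", discover_environment)
    runs = [store.load() for _ in range(5)]  # five simulated agent runs
    build_count = store.builds

print(build_count)
```

In this sketch, five runs trigger a single discovery pass; the other four read the stored map. A production system would also need invalidation when schemas or policies change, which this example leaves out.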
Two design choices compound that advantage.
- Native deployment. Genesis agents run inside an organization's own cloud environment, whether that is Snowflake, Databricks, AWS, Google BigQuery, Azure, or on-prem Kubernetes, wherever the data already lives. Nothing is shipped out to an external orchestration layer to be processed and shipped back. This is precisely the instinct PNC is now building by hand: keep the work close to the data, and keep the data in-house. Genesis ships that as the default, not as a retrofit.
- Pretrained, purpose-built agents. Because the agents are built specifically for data engineering, teams are not paying frontier-model premiums to brute-force a specialized task through a general assistant with heavy prompting and heavy token usage. It is the same right-size-the-tool logic CIBC is now applying manually through its task classifier, except here it lives in the architecture itself rather than in a routing rule someone has to build and maintain.
The proof point here is, fittingly, a lean data team rather than a bank with an unlimited budget. When a major acquisition tripled the migration backlog at GrowthZone, the company's Director of Data Services shelved a $400,000 hiring plan and deployed Genesis instead, taking the same four-person team from roughly 10 migrations a year to 30 to 50. Read the full GrowthZone case study here. The story is not about token rationing. It is about what happens when a system stops paying to relearn what it already knew.
Be Honest About What This Does and Does Not Do
Tokens are not free, and there is no point pretending otherwise. Agentic data engineering consumes compute, and anyone who says it does not is selling something. The point of naming tokenflation is not that spend should be zero; it’s that spend should be proportional to the value created, predictable enough to plan around, and free of the waste that comes from re-deriving the same context over and over.
Every bank in this story is, one workaround at a time, reinventing what a native, context-aware, purpose-built agent platform already provides. The faster an organization grows its AI footprint, the sooner that math catches up with it.
If tokenflation is showing up in a data estate, the real question is not how to ration the way out of it; it’s whether the agents doing the work are paying to learn the environment every single time they run.
Frequently Asked Questions
What is tokenflation?
Tokenflation refers to AI token costs rising faster than expected as adoption scales, because agentic workflows consume far more tokens per task than simple chat-style use cases. Banks including RBC and JPMorgan have reported sharp token cost increases as they moved from basic AI tools to more autonomous, multi-step agents.
Why do AI token costs rise faster than usage?
Costs rise nonlinearly because agentic tasks involve reasoning steps, tool calls, and large context windows, none of which scale one-to-one with the number of requests. A single agentic task can consume tens or hundreds of thousands of tokens compared to roughly a thousand for a simple summarization.
Can model routing alone fix rising AI token costs?
Model routing reduces the cost of using an expensive model for a simple task, but it does not address the cost of an agent re-learning an enterprise's data environment, including schemas, lineage, and governance rules, on every single run. That context-reloading cost persists regardless of which model handles the request.
What is the Genesis Context Graph?
The Genesis Context Graph is a persistent map of an enterprise's data systems, workflows, governance policies, and business semantics. It is built once, refined by human experts, and referenced by agents on every run, so agents do not need to rediscover the environment from scratch each time.
How did GrowthZone scale its data migrations without new hires?
GrowthZone's four-person data engineering team faced a tripling of migration volume after an acquisition. Rather than hiring two to three additional engineers at an estimated $300,000 to $450,000 a year, the company deployed Genesis and increased annual migration capacity from roughly 10 to 30 to 50, using the same headcount.