Justin Langseth

Chief Technology Officer
At Snowflake, Justin helped launch the data marketplace and worked on the AI strategy. Before that, he co-founded and led several companies, including Zoomdata and Clarabridge. He holds 51 technology patents related to data sharing, protection, and analysis. He graduated from MIT with a degree in Management of Information Technology.
October 20, 2025

Context Management: The Hardest Problem in Long-Running Agents


If you’ve ever built an agent that runs beyond a few simple steps, you’ve seen what happens when it starts to lose the thread. It repeats itself, makes contradictory decisions, or begins solving problems it already closed earlier in the workflow. That isn’t the model being creative or unpredictable — it’s the model running out of working memory.

In Part 1 — Blueprints, I wrote about how we give agents a method to follow, a sequence of reasoning steps that mirror how real engineers work. Blueprints solve the process problem. But once an agent starts executing those blueprints for hours at a time — building pipelines, transforming data, or mapping fields — it hits the next bottleneck: memory.

Every modern LLM, no matter how large, still has a sharply limited working window — typically around 200,000 tokens. That might sound like a lot, but it vanishes fast when every step involves long SQL traces, tool outputs, or multi-line logs. Once that window fills up, the model starts to blur together what’s current and what’s old. It begins to improvise, not because it’s wrong, but because we’ve overloaded it with context it can no longer distinguish. Most of what engineers call “hallucination” in long agent runs is really just this: context overflow disguised as reasoning failure.

Each agent call is an isolated event. The model itself doesn’t remember what happened before; it only knows what we include in that single payload. So the challenge isn’t about making the model smarter — it’s about deciding what portion of the project’s history to reload for each step so it has just enough information to continue correctly. We call this balance the Goldilocks context: the minimal set of facts the agent needs to do its job, no more and no less.

At Genesis, every agent call starts with a reconstructed context built dynamically from the blueprint’s current state. If the next task is to validate a Snowpark function, the agent doesn’t replay the entire project history. Instead, it receives the function code, the relevant inputs and expected outputs, a compact summary of recent activity, and links to any supporting Markdown notes or Git files. This way, the model’s working memory is clean, targeted, and immediately relevant — just as a human engineer keeps only a few key ideas in mind when focusing on one piece of code.
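To make the idea concrete, here is a minimal sketch of per-step context assembly. The `BlueprintState` fields and the chars-per-token estimate are illustrative assumptions, not the actual Genesis data model; the point is that sections are ranked by priority and lower-priority material is dropped once a token budget is exhausted.

```python
from dataclasses import dataclass, field

@dataclass
class BlueprintState:
    """Hypothetical snapshot of where a blueprint run currently stands."""
    current_task: str
    function_code: str
    inputs: dict
    expected_outputs: dict
    recent_summary: str
    note_links: list = field(default_factory=list)

def build_context(state: BlueprintState, token_budget: int = 8000) -> str:
    """Assemble only the facts the next step needs, highest priority first,
    dropping later sections once the budget is exhausted."""
    sections = [
        f"## Task\n{state.current_task}",
        f"## Function under test\n{state.function_code}",
        f"## Inputs\n{state.inputs}",
        f"## Expected outputs\n{state.expected_outputs}",
        f"## Recent activity\n{state.recent_summary}",
        "## Supporting notes\n" + "\n".join(state.note_links),
    ]
    kept, used = [], 0
    for section in sections:
        cost = len(section) // 4  # rough chars-per-token estimate
        if used + cost > token_budget:
            break  # everything after this point is lower priority
        kept.append(section)
        used += cost
    return "\n\n".join(kept)
```

With a generous budget the agent sees everything it needs; with a tight one, the task description and code survive while supporting material is cut first.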

When a blueprint runs for hundreds of steps, carrying every prior message forward becomes impossible. Rather than feeding the full conversation back into the model, we compress it into structured summaries and persist the full raw data in Markdown. Those summaries capture the essential decisions, the state of variables, and any unresolved issues.

Later, when the agent needs to revisit something it did earlier, it doesn’t rely on recall from memory. It simply reopens its own notes, just as an engineer would scroll through an old notebook before resuming a project. That simple act — documenting as it goes — turns the agent’s workflow into something modular, recoverable, and inspectable by humans.

Not every step benefits from shared context. Some problems require isolation and a smaller mental space. When the agent reaches a deep, high-concentration task — such as writing a transformation or debugging a specific function — we spawn a subthread, a temporary working environment with only the essential tools and inputs. It’s the “locked room” concept: a quiet space where the agent can reason clearly without distractions from previous steps. Once the task is complete, the subthread merges its results and metadata back into the main thread, preserving continuity without carrying unnecessary weight.
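The locked-room pattern can be sketched in a few lines: hand the subtask only the slice of context it needs, run it in isolation, then merge back just the result and a metadata record. The function and key names here are illustrative, not the actual Genesis API.

```python
def run_in_subthread(main_context: dict, task_name: str, task_fn, essential_keys: list) -> dict:
    """'Locked room' sketch: the subtask sees only the essential slice of the
    main context; afterwards, only its result and a small history record are
    merged back into the main thread."""
    sub_context = {k: main_context[k] for k in essential_keys}  # isolated view
    result = task_fn(sub_context)                               # no access to the rest
    main_context.setdefault("results", {})[task_name] = result  # merge result back
    main_context.setdefault("history", []).append(f"{task_name}: done")
    return main_context
```

Because `sub_context` is built by explicit allowlist, nothing from earlier steps can leak into the subtask and distract it.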

Compression alone keeps the system efficient, but it doesn’t solve relevance. Agents must also know how to retrieve what matters. Each one maintains awareness of the files, tables, and summaries it has produced, and when a future step refers to them, it can re-load the specific resource from storage. This keeps working memory small but restores continuity at the exact point needed. The model doesn’t need to remember everything — it only needs to know where to look.
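One way to picture "knowing where to look" is a small registry that maps resource names to loader callables, so content is fetched from storage only at the moment a step needs it. This is a sketch of the pattern, not the actual Genesis implementation.

```python
class ResourceIndex:
    """Maps resource names to loaders so the agent can re-load a file, table
    summary, or note on demand instead of carrying it in working memory."""

    def __init__(self):
        self._loaders = {}

    def register(self, name, loader):
        """Record where a resource lives; the loader is only called on access."""
        self._loaders[name] = loader

    def load(self, name):
        """Restore the resource into working memory at the exact point needed."""
        if name not in self._loaders:
            raise KeyError(f"unknown resource: {name}")
        return self._loaders[name]()
```

The index itself stays tiny (names only), while the heavy content lives in storage until a step actually asks for it.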

Every model handles memory differently. Some can sustain large windows but lose precision as they fill up; others reason more accurately within smaller slices. We continuously adjust compression and recall strategies for each model family to maintain a consistent quality of reasoning. Context management is therefore not a fixed algorithm — it’s an evolving discipline that changes with each generation of LLMs.
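Per-model tuning can be expressed as a small policy table. The profile names and numbers below are hypothetical examples of the kind of knobs involved (context budget, compression cadence), not measured values for any real model family.

```python
# Hypothetical per-model-family tuning knobs.
MODEL_PROFILES = {
    "large-window": {"max_context_tokens": 180_000, "summary_every": 50},
    "small-precise": {"max_context_tokens": 30_000, "summary_every": 10},
}

def context_policy(model_family: str, step: int) -> dict:
    """Pick the context budget and decide whether this step should trigger
    a compression pass, falling back to the conservative profile."""
    profile = MODEL_PROFILES.get(model_family, MODEL_PROFILES["small-precise"])
    return {
        "budget": profile["max_context_tokens"],
        "compress_now": step % profile["summary_every"] == 0,
    }
```

A model that loses precision as its window fills gets a smaller budget and more frequent compression; a model that holds up well gets the opposite.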

All of this happens behind the scenes. Engineers using the system don’t have to think about token budgets or prompt truncation. The platform handles it automatically, keeping the agent focused and coherent over long sessions. From the user’s point of view, the agent simply “remembers what matters.” From ours, it’s a disciplined orchestration of context refresh and recall happening continuously under the hood.

Blueprints define what an agent should do. Context management ensures it can keep doing it long after most systems would have lost coherence. When you get context wrong, the blueprint collapses after a few steps. When you get it right, the agent can run for hours, build complete pipelines, and stay logically consistent from start to finish. That’s the real engineering challenge in agentic systems today — not generating text, but sustaining thought.

Want to learn more? Get in touch!
