January 27, 2026

Using AI Agents to Generate Synthetic Data

Justin Langseth

Chief Technology Officer

TL;DR: Genesis Data Agents automate the process of building realistic test data through a six-phase blueprint. The agent reads your business requirements, designs a schema with realistic data patterns, writes and tests the generation code, documents everything, and produces a structured handoff for the next pipeline phase. In a live asset management example, it completed the full workflow in 137 minutes and delivered a raw schema ready for bronze, silver, and gold medallion processing.

What Is Synthetic Data Generation?

Synthetic data plays a critical role in modern data environments. It enables teams to test pipelines, validate models, and experiment safely without exposing sensitive or regulated information. Genesis uses data agents to make the creation of high-quality synthetic data faster, repeatable, and governed.

Instead of manually crafting sample datasets or relying on brittle scripts, Genesis agents understand the structure and intent of your data. They generate synthetic datasets that preserve schema, relationships, and statistical characteristics, while removing the risk associated with real production data.

Because this work is handled by agents, synthetic data generation becomes part of a structured workflow rather than a one-off task. The agents document what they create, follow predefined standards, and can regenerate data consistently as requirements change.

This approach is especially valuable for testing, development, and validation workflows. Teams can spin up realistic datasets on demand, validate transformations across environments, and move faster without waiting on production access or anonymization processes.

Why It Matters

Faster testing and development without using sensitive data
Consistent synthetic datasets aligned with real schemas
Repeatable workflows that reduce manual effort
Safer experimentation across teams and environments

By using data agents to generate synthetic data, Genesis removes friction from one of the most time-consuming parts of the data lifecycle. Teams get realistic data when they need it, without compromising security, governance, or delivery speed.

A Real Example: Asset Management Data, Built for a Specific Dashboard

To see how this works concretely, consider a data engineering team that needs to build a dashboard for an asset management firm. The business questions driving that dashboard are specific: which funds are performing best, which are consistent across market cycles, how assets under management have grown over time, and how individual funds rank within their peer groups.

Before any of that analysis is possible, the team needs raw data that looks and behaves like real asset management data. Building that from scratch, with realistic distributions and meaningful patterns, is not trivial. If the data is too flat or too random, the downstream dashboard will be useless for development and testing.

Using Genesis, the engineer kicks off the synthetic data generation blueprint with two inputs: a list of the business questions the data needs to eventually answer, and a rough sketch of the target dashboard layout. That is the entire setup, and the agent takes it from there.

How the Synthetic Data Generation Blueprint Works

Genesis agents do not generate synthetic data in a single pass. The synthetic data generation blueprint follows six sequential phases, each with defined actions, context documents passed forward from the previous phase, and exit criteria the agent must satisfy before moving on. This is what keeps long-running autonomous work on track: the agent cannot skip ahead.

In the asset management example, the agent worked for 137 minutes, progressing through all six phases autonomously:

1. Context understanding. The agent reads the business questions and dashboard sketch, identifies the industry domain, and proposes an initial data model. It also documents the patterns the data should exhibit, so the resulting dashboard shows meaningful variation rather than flat or unrealistic outputs.
2. Schema design. The agent proposes the tables, columns, and relationships needed to support the raw schema, with the downstream bronze, silver, and gold medallion structure already in mind.
3. Data generation logic. The agent writes Python programs (for example, generateDataClean.py) to produce the synthetic records, accounting for special cases and edge conditions identified during planning.
4. Testing and validation. Generated data is tested against the schema design and the original business requirements before the agent proceeds.
5. Documentation. The agent produces written documentation and diagrams describing what was built, why, and how it connects to the broader pipeline. This is useful for engineers, downstream agents, and anyone auditing the work later.
6. Handoff. A structured handoff document is produced for the next phase, in this case source-to-target mapping for the medallion layers.
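
The gating idea behind the phases can be sketched in a few lines of Python. This is a minimal illustration, not the Genesis implementation: the Phase class, the run_blueprint function, and the example phase contents are all hypothetical names chosen for this sketch. It shows only the control flow described above, where each phase adds to a shared context and the next phase cannot start until the exit criteria pass.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical types for illustration only; this is not the Genesis API.
Context = Dict[str, object]


@dataclass
class Phase:
    name: str
    action: Callable[[Context], Context]      # does the work, returns new context entries
    exit_criteria: Callable[[Context], bool]  # must hold before the next phase starts


def run_blueprint(phases: List[Phase], context: Context) -> Context:
    """Run phases in order, passing context forward and refusing to skip ahead."""
    completed = []
    for phase in phases:
        # Merge this phase's output into the context handed to later phases.
        context = {**context, **phase.action(context)}
        if not phase.exit_criteria(context):
            raise RuntimeError(f"Exit criteria not met for phase: {phase.name}")
        completed.append(phase.name)
    return {**context, "completed": completed}


# Two of the six phases, with placeholder actions.
phases = [
    Phase(
        name="context_understanding",
        action=lambda ctx: {"domain": "asset management"},
        exit_criteria=lambda ctx: "domain" in ctx,
    ),
    Phase(
        name="schema_design",
        action=lambda ctx: {"tables": ["trades", "products", "positions"]},
        exit_criteria=lambda ctx: len(ctx["tables"]) > 0,
    ),
]

result = run_blueprint(phases, {"business_questions": ["Which funds are performing best?"]})
print(result["completed"])
```

Raising an error on a failed exit check is one simple way to model "cannot skip ahead"; a real agent would more likely retry or revise the phase before giving up.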

Genesis agents have access to over 100 tools throughout this process, including tools for writing files, executing code, running tests, and creating diagrams. Engineers can also use the built-in replay capability to review exactly what the agent did, step by step, at any point during or after the mission.

What the Agent Actually Produces

At the end of the process, the team has a complete raw schema in Snowflake. In the asset management example, that included tables for trade activity, product data, and portfolio positions, structured and patterned to behave like real data from that type of company in that industry.
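
To make "structured and patterned" concrete, here is a small sketch of what generation logic for three related tables might look like. It is an assumption-laden illustration rather than the code the agent wrote: the column names, the drift and volatility ranges, the start date, and the generate_raw_tables function are all invented for this example. It uses only the Python standard library and a fixed seed so the same data can be regenerated on demand.

```python
import random
from datetime import date, timedelta


def generate_raw_tables(seed: int = 42, n_funds: int = 5, n_days: int = 30):
    """Build three small related tables as lists of dicts (hypothetical columns)."""
    rng = random.Random(seed)  # fixed seed so the data can be regenerated identically

    # Product table: each fund gets its own drift and volatility so funds differ.
    products = [
        {
            "fund_id": f"F{i:03d}",
            "peer_group": rng.choice(["equity", "fixed_income", "multi_asset"]),
            "daily_drift": rng.uniform(-0.0002, 0.0008),
            "daily_vol": rng.uniform(0.002, 0.015),
        }
        for i in range(1, n_funds + 1)
    ]

    start = date(2025, 1, 1)  # arbitrary start date for the sketch
    positions, trades = [], []
    for fund in products:
        nav = 100.0
        for d in range(n_days):
            day = start + timedelta(days=d)
            # Random-walk NAV: varies day to day instead of staying flat.
            nav *= 1 + rng.gauss(fund["daily_drift"], fund["daily_vol"])
            positions.append({"fund_id": fund["fund_id"], "as_of": day, "nav": round(nav, 4)})
            # Roughly one trade every other day, always referencing a known fund.
            if rng.random() < 0.5:
                trades.append({
                    "trade_id": len(trades) + 1,
                    "fund_id": fund["fund_id"],
                    "trade_date": day,
                    "amount": round(rng.uniform(-50_000, 150_000), 2),
                })
    return products, positions, trades


products, positions, trades = generate_raw_tables()
print(len(products), len(positions), len(trades))
```

Giving each fund its own drift and volatility is what lets a downstream ranking or consistency chart show differences between funds; with identical parameters, every fund would look statistically the same.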

The final dashboard, built on top of the bronze, silver, and gold layers derived from this raw data, showed realistic fund performance patterns, rankings, and assets under management trends. Synthetic data that does not behave realistically is not useful for testing; it just produces false confidence. The point of the blueprint's planning phases is to prevent exactly that.
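
Two of the simplest checks that guard against unrealistic data can be sketched as follows. Both functions, the sample rows, and the 1% spread threshold are assumptions made for illustration; the article does not specify which tests the agent runs. The first check finds child rows that reference a parent that does not exist, and the second flags a numeric series that barely moves.

```python
from statistics import pstdev


def check_referential_integrity(child_rows, parent_rows, key):
    """Return child rows whose key value has no matching parent row."""
    parent_keys = {row[key] for row in parent_rows}
    return [row for row in child_rows if row[key] not in parent_keys]


def is_too_flat(values, min_relative_spread=0.01):
    """Flag a series whose spread is tiny relative to its mean (threshold is an assumption)."""
    mean = sum(values) / len(values)
    if mean == 0:
        return pstdev(values) == 0
    return pstdev(values) / abs(mean) < min_relative_spread


# Tiny hand-written sample rows, for illustration only.
products = [{"fund_id": "F001"}, {"fund_id": "F002"}]
trades = [
    {"trade_id": 1, "fund_id": "F001"},
    {"trade_id": 2, "fund_id": "F999"},  # orphan: no such fund
]

orphans = check_referential_integrity(trades, products, "fund_id")
print([row["trade_id"] for row in orphans])   # trade ids that reference a missing fund
print(is_too_flat([100.0, 100.0, 100.01]))    # nearly constant series
print(is_too_flat([100.0, 104.2, 97.5]))      # series with visible variation
```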

Where Synthetic Data Generation Fits in the Pipeline

Synthetic data generated by Genesis hands off directly to the next phase of a data engineering workflow. In a typical sequence:

1. Genesis generates the raw synthetic schema from natural language inputs and any reference documents or sketches provided.
2. A source-to-target mapping blueprint maps the raw schema to bronze, silver, and gold medallion layers.
3. A data engineering phase uses dbt, Snowpark, or Databricks to render the data into those layers within Snowflake. For a closer look at the dbt side of this, see AI Agent Builds dbt Analytics Schema in 30 Minutes.
4. Dashboards or query agents are built on top of the gold layer to answer the original business questions.
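
As a rough picture of what a source-to-target mapping carries, the sketch below represents it as plain Python data and checks it. Everything here is hypothetical: the table names, the SOURCE_TO_TARGET structure, and the rule that each mapping moves data exactly one layer forward are assumptions made for the example, not the format Genesis produces.

```python
# Hypothetical table names; a real mapping would come from the handoff document.
SOURCE_TO_TARGET = [
    {"source": "raw.trades", "target": "bronze.trades"},
    {"source": "raw.positions", "target": "bronze.positions"},
    {"source": "bronze.positions", "target": "silver.fund_daily_nav"},
    {"source": "silver.fund_daily_nav", "target": "gold.fund_performance_rank"},
]

LAYER_ORDER = ["raw", "bronze", "silver", "gold"]


def layer_of(table: str) -> str:
    """Return the schema prefix of a 'layer.table' name."""
    return table.split(".", 1)[0]


def validate_mapping(mapping):
    """Check that every mapping moves data exactly one layer forward (an assumed rule)."""
    problems = []
    for m in mapping:
        src = LAYER_ORDER.index(layer_of(m["source"]))
        tgt = LAYER_ORDER.index(layer_of(m["target"]))
        if tgt - src != 1:
            problems.append(f"{m['source']} -> {m['target']} skips or reverses a layer")
    return problems


def lineage(target: str, mapping):
    """Walk the mapping backwards from a target table to its raw source.

    Simplification: assumes each target has a single source table.
    """
    by_target = {m["target"]: m["source"] for m in mapping}
    chain = [target]
    while chain[-1] in by_target:
        chain.append(by_target[chain[-1]])
    return list(reversed(chain))


print(validate_mapping(SOURCE_TO_TARGET))
print(lineage("gold.fund_performance_rank", SOURCE_TO_TARGET))
```

Keeping the mapping as data rather than burying it in transformation code makes it easy to validate and to trace any gold table back to the raw table it came from.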

The synthetic data generation step is not a workaround or a placeholder. It is the foundation that makes the rest of the pipeline possible before production data is available, and because the agent documents everything it creates, each handoff is clean.

Genesis supports this same workflow on Databricks. For a look at how it runs in that environment, see How Genesis Automates Synthetic Data Generation for Databricks Dev Environments in Under 34 Minutes.

Why Traditional Approaches Fall Short

Manual synthetic data creation tends to produce one of two outcomes: data that is too simple to surface real problems, or data that took so long to build that the team cut corners somewhere else to compensate.

Scripted approaches help with repeatability but require maintenance as schemas evolve. Anonymized production data reduces risk but introduces compliance overhead and often requires a security review before it can be used in development or shared across environments.

None of these approaches generates the documentation that makes synthetic data useful beyond a single engineer's local environment. The context problem in long-running data work compounds every time a dataset changes hands without proper documentation.

Genesis treats synthetic data generation as a first-class data engineering task: documented, versioned, reproducible, and tied to the business requirements it is meant to serve.

Frequently Asked Questions

What is synthetic data generation in Genesis? An automated, six-phase workflow in which a Genesis Data Agent creates a complete schema of realistic test data inside Snowflake or Databricks. The engineer provides requirements and any reference documents; the agent handles design, generation, validation, documentation, and handoff.

How long does it take? It depends on schema complexity. The asset management example completed in 137 minutes. The Databricks version of this workflow has run in under 34 minutes for simpler schemas.

How does it connect to the rest of the pipeline? The blueprint produces a raw schema and a structured handoff document that feeds directly into source-to-target mapping, then into bronze, silver, and gold medallion processing, and finally into dashboards or query agents.

What is a Genesis blueprint? A structured methodology that defines how a Genesis agent approaches a specific category of work, broken into sequential phases with defined actions, context, and exit criteria at each step. For a deeper look, see Blueprints: How We Teach Agents to Work the Way Data Engineers Do.


