Justin Langseth

Chief Technology Officer
At Snowflake, Justin helped launch the data marketplace and worked on the AI strategy. Before that, he co-founded and led several companies, including Zoomdata and Clarabridge. He holds 51 technology patents related to data sharing, protection, and analysis. He graduated from MIT with a degree in Management of Information Technology.
January 27, 2026

Using AI Agents to Generate Synthetic Data


TL;DR: Genesis Data Agents automate the process of building realistic test data through a six-phase blueprint. The agent reads your business requirements, designs a schema with realistic data patterns, writes and tests the generation code, documents everything, and produces a structured handoff for the next pipeline phase. In a live asset management example, it completed the full workflow in 137 minutes and delivered a raw schema ready for bronze, silver, and gold medallion processing.

What Is Synthetic Data Generation?

Synthetic data plays a critical role in modern data environments. It enables teams to test pipelines, validate models, and experiment safely without exposing sensitive or regulated information. Genesis uses data agents to make the creation of high-quality synthetic data faster, repeatable, and governed.

Instead of manually crafting sample datasets or relying on brittle scripts, Genesis agents understand the structure and intent of your data. They generate synthetic datasets that preserve schema, relationships, and statistical characteristics, while removing the risk associated with real production data.

Because this work is handled by agents, synthetic data generation becomes part of a structured workflow rather than a one-off task. The agents document what they create, follow predefined standards, and can regenerate data consistently as requirements change.

This approach is especially valuable for testing, development, and validation workflows. Teams can spin up realistic datasets on demand, validate transformations across environments, and move faster without waiting on production access or anonymization processes.

Why It Matters

  • Faster testing and development without using sensitive data
  • Consistent synthetic datasets aligned with real schemas
  • Repeatable workflows that reduce manual effort
  • Safer experimentation across teams and environments

By using data agents to generate synthetic data, Genesis removes friction from one of the most time-consuming parts of the data lifecycle. Teams get realistic data when they need it, without compromising security, governance, or delivery speed.

A Real Example: Asset Management Data, Built for a Specific Dashboard

To see how this works concretely, consider a data engineering team that needs to build a dashboard for an asset management firm. The business questions driving that dashboard are specific: which funds are performing best, which are consistent across market cycles, how assets under management have grown over time, and how individual funds rank within their peer groups.

Before any of that analysis is possible, the team needs raw data that looks and behaves like real asset management data. Building that from scratch, with realistic distributions and meaningful patterns, is not trivial. If the data is too flat or too random, the downstream dashboard will be useless for development and testing.

Using Genesis, the engineer kicks off the synthetic data generation blueprint with two inputs: a list of the business questions the data needs to eventually answer, and a rough sketch of the target dashboard layout. That is the entire setup, and the agent takes it from there.

How the Synthetic Data Generation Blueprint Works

Genesis agents do not generate synthetic data in a single pass. The synthetic data generation blueprint follows six sequential phases, each with defined actions, context documents passed forward from the previous phase, and exit criteria the agent must satisfy before moving on. This is what keeps long-running autonomous work on track: the agent cannot skip ahead.

In the asset management example, the agent worked for 137 minutes, progressing through all six phases autonomously:

  1. Context understanding. The agent reads the business questions and dashboard sketch, identifies the industry domain, and proposes an initial data model. It also documents the patterns the data should exhibit, so the resulting dashboard shows meaningful variation rather than flat or unrealistic outputs.
  2. Schema design. The agent proposes the tables, columns, and relationships needed to support the raw schema, with the downstream bronze, silver, and gold medallion structure already in mind.
  3. Data generation logic. The agent writes Python programs (for example, generateDataClean.py) to produce the synthetic records, accounting for special cases and edge conditions identified during planning.
  4. Testing and validation. Generated data is tested against the schema design and the original business requirements before the agent proceeds.
  5. Documentation. The agent produces written documentation and diagrams describing what was built, why, and how it connects to the broader pipeline. This is useful for engineers, downstream agents, and anyone auditing the work later.
  6. Handoff. A structured handoff document is produced for the next phase, in this case source-to-target mapping for the medallion layers.
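
The phase-gating idea above can be sketched in a few lines of Python. This is an illustration only, not Genesis's implementation: the `Phase` class, `run_blueprint`, the `produces` helper, and the context keys are all hypothetical names. It shows the two properties the list describes: each phase receives the documents produced before it, and a phase that fails its exit criterion stops the run.

```python
from dataclasses import dataclass
from typing import Callable

# Every name below (Phase, run_blueprint, the context keys) is hypothetical
# and chosen for illustration; none of it is a Genesis API.
Context = dict[str, str]


@dataclass
class Phase:
    name: str
    action: Callable[[Context], Context]   # does the work, returns new context documents
    exit_check: Callable[[Context], bool]  # exit criterion that must hold before moving on


def run_blueprint(phases: list[Phase], inputs: Context) -> tuple[list[str], Context]:
    """Run phases strictly in order and stop at the first unmet exit criterion."""
    context = dict(inputs)  # copy so the caller's inputs are not mutated
    completed: list[str] = []
    for phase in phases:
        # Each phase sees everything that earlier phases passed forward.
        context.update(phase.action(dict(context)))
        if not phase.exit_check(context):
            raise RuntimeError(f"exit criteria not met for phase: {phase.name}")
        completed.append(phase.name)
    return completed, context


def produces(key: str, value: str) -> Phase:
    """Build a toy phase that writes one document and requires it to be non-empty."""
    return Phase(
        name=key,
        action=lambda ctx: {key: value},
        exit_check=lambda ctx: bool(ctx.get(key)),
    )


# Toy stand-ins for the six phases; real phases would do actual work.
PHASES = [
    produces("context_understanding", "domain: asset management"),
    produces("schema_design", "tables: products, trades, positions"),
    produces("generation_logic", "generateDataClean.py"),
    produces("validation_report", "all checks passed"),
    produces("documentation", "schema notes and diagrams"),
    produces("handoff", "source-to-target mapping inputs"),
]
```

Calling `run_blueprint(PHASES, {"business_questions": "..."})` returns the six phase names in order along with the accumulated context. Replacing any phase with one that produces an empty document raises `RuntimeError` at that phase, so later phases never run.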

Genesis agents have access to over 100 tools throughout this process, including tools for writing files, executing code, running tests, and creating diagrams. Engineers can also use the built-in replay capability to review exactly what the agent did, step by step, at any point during or after the mission.

What the Agent Actually Produces

At the end of the process, the team has a complete raw schema in Snowflake. In the asset management example, that included tables for trade activity, product data, and portfolio positions, structured and patterned to behave like real data from that type of company in that industry.
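
To make "structured like real data" concrete, here is a minimal sketch of a raw schema with the three kinds of tables mentioned above, plus two consistency checks. It is an assumption-heavy illustration: it uses an in-memory SQLite database as a stand-in for Snowflake, and the table names, column names, and value ranges are invented for the example rather than taken from the actual mission output.

```python
import random
import sqlite3


def build_raw_schema(seed: int = 7, n_products: int = 5, n_trades: int = 200) -> sqlite3.Connection:
    """Create hypothetical products, trades, and positions tables in memory."""
    rng = random.Random(seed)  # seeded so the same data can be regenerated
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE trades (
            trade_id   INTEGER PRIMARY KEY,
            product_id INTEGER NOT NULL REFERENCES products(product_id),
            quantity   INTEGER NOT NULL,  -- signed: positive buys, negative sells
            price      REAL NOT NULL
        );
        CREATE TABLE positions (
            product_id INTEGER PRIMARY KEY REFERENCES products(product_id),
            quantity   INTEGER NOT NULL   -- net quantity across all trades
        );
    """)
    conn.executemany(
        "INSERT INTO products VALUES (?, ?)",
        [(i, f"Fund {i}") for i in range(1, n_products + 1)],
    )
    trades = []
    for trade_id in range(1, n_trades + 1):
        product_id = rng.randint(1, n_products)
        quantity = rng.choice([-1, 1]) * rng.randint(1, 500)
        price = round(rng.uniform(10.0, 150.0), 2)
        trades.append((trade_id, product_id, quantity, price))
    conn.executemany("INSERT INTO trades VALUES (?, ?, ?, ?)", trades)
    # Positions are derived from trades so the two tables always reconcile.
    conn.execute(
        "INSERT INTO positions SELECT product_id, SUM(quantity) FROM trades GROUP BY product_id"
    )
    conn.commit()
    return conn


def orphan_trades(conn: sqlite3.Connection) -> int:
    """Count trades that point at a product that does not exist."""
    # SQLite only enforces foreign keys when PRAGMA foreign_keys is on,
    # so the relationship is checked explicitly here.
    return conn.execute("""
        SELECT COUNT(*) FROM trades t
        LEFT JOIN products p ON p.product_id = t.product_id
        WHERE p.product_id IS NULL
    """).fetchone()[0]


def positions_reconcile(conn: sqlite3.Connection) -> bool:
    """Check that each position equals the sum of its product's trades."""
    mismatches = conn.execute("""
        SELECT COUNT(*) FROM positions p
        WHERE p.quantity != (
            SELECT SUM(t.quantity) FROM trades t WHERE t.product_id = p.product_id
        )
    """).fetchone()[0]
    return mismatches == 0
```

Deriving positions from trades, rather than generating both independently, is the design choice that keeps the tables mutually consistent. Checks like `orphan_trades` and `positions_reconcile` are the kind of test a validation phase could run before handoff.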

The final dashboard, built on top of the bronze, silver, and gold layers derived from this raw data, showed realistic fund performance patterns, rankings, and assets under management trends. Synthetic data that does not behave realistically is not useful for testing; it just produces false confidence. The point of the blueprint's planning phases is to prevent exactly that.
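
The difference between flat and realistic data is easy to show in code. The sketch below assumes three made-up fund profiles and a simple model (a shared market factor plus fund-specific noise); neither the profiles nor the model come from the Genesis mission. It generates monthly returns that differ meaningfully between funds and includes a small check that flags a series with almost no variation.

```python
import random
import statistics

# Hypothetical fund profiles: (mean monthly return, fund-specific volatility,
# sensitivity to the shared market factor). Values are illustrative only.
FUND_PROFILES = {
    "Steady Income": (0.004, 0.010, 0.3),
    "Balanced Growth": (0.006, 0.025, 0.8),
    "Aggressive Equity": (0.009, 0.055, 1.4),
}


def generate_returns(months: int = 120, seed: int = 42) -> dict[str, list[float]]:
    """Generate monthly returns per fund from a shared market factor plus noise."""
    rng = random.Random(seed)  # fixed seed makes every run reproducible
    market = [rng.gauss(0.0, 0.03) for _ in range(months)]
    returns: dict[str, list[float]] = {}
    for name, (drift, vol, beta) in FUND_PROFILES.items():
        # Each fund follows the market to a different degree and adds its own noise.
        returns[name] = [drift + beta * m + rng.gauss(0.0, vol) for m in market]
    return returns


def looks_flat(series: list[float], min_stdev: float = 0.001) -> bool:
    """Flag a series whose variation is too small to be useful on a dashboard."""
    return statistics.pstdev(series) < min_stdev
```

Because the funds share a market factor, they move together in good and bad months while still ranking differently, which is what a peer-group or consistency view needs in order to show anything. A constant series such as `[0.01] * 12` is flagged by `looks_flat`, while the generated series are not.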

Where Synthetic Data Generation Fits in the Pipeline

Synthetic data generated by Genesis hands off directly to the next phase of a data engineering workflow. In a typical sequence:

  1. Genesis generates the raw synthetic schema from natural language inputs and any reference documents or sketches provided.
  2. A source-to-target mapping blueprint maps the raw schema to bronze, silver, and gold medallion layers.
  3. A data engineering phase uses dbt, Snowpark, or Databricks to render the data into those layers within Snowflake. For a closer look at the dbt side of this, see AI Agent Builds dbt Analytics Schema in 30 Minutes.
  4. Dashboards or query agents are built on top of the gold layer to answer the original business questions.
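
Step 2 in the sequence above can be pictured as a declarative mapping from raw columns to target columns. The following is a rough sketch under stated assumptions: the raw column names (`TRADE_ID`, `PRODUCT_CD`, `AMT`), the bronze column names, and the `apply_mapping` helper are all hypothetical, and a real mapping document would carry far more detail.

```python
from typing import Any, Callable

# Hypothetical mapping: bronze column -> (raw source column, cast function).
RAW_TO_BRONZE: dict[str, tuple[str, Callable[[Any], Any]]] = {
    "trade_id": ("TRADE_ID", int),
    "fund_code": ("PRODUCT_CD", str.strip),
    "trade_amount": ("AMT", float),
}


def apply_mapping(raw_row: dict[str, Any], mapping: dict[str, tuple[str, Callable[[Any], Any]]]) -> dict[str, Any]:
    """Rename and cast one raw row according to a source-to-target mapping."""
    missing = [src for src, _ in mapping.values() if src not in raw_row]
    if missing:
        # Fail loudly instead of silently producing partial rows.
        raise KeyError(f"raw row is missing mapped columns: {missing}")
    return {target: cast(raw_row[src]) for target, (src, cast) in mapping.items()}
```

For example, `apply_mapping({"TRADE_ID": "17", "PRODUCT_CD": " EQ01 ", "AMT": "2500.50"}, RAW_TO_BRONZE)` yields a row with an integer `trade_id`, a trimmed `fund_code`, and a float `trade_amount`. Keeping the mapping as data rather than code is one reason a structured handoff document is useful: the next phase can read it directly.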

The synthetic data generation step is not a workaround or a placeholder. It is the foundation that makes the rest of the pipeline possible before production data is available, and because the agent documents everything it creates, each handoff is clean.

Genesis supports this same workflow on Databricks. For a look at how it runs in that environment, see How Genesis Automates Synthetic Data Generation for Databricks Dev Environments in Under 34 Minutes.

Why Traditional Approaches Fall Short

Manual synthetic data creation tends to produce one of two outcomes: data that is too simple to surface real problems, or data that took so long to build that the team cut corners somewhere else to compensate.

Scripted approaches help with repeatability but require maintenance as schemas evolve. Anonymized production data reduces risk but introduces compliance overhead and often requires a security review before it can be used in development or shared across environments.

None of these approaches generates the documentation that makes synthetic data useful beyond a single engineer's local environment. The context problem in long-running data work compounds every time a dataset changes hands without proper documentation.

Genesis treats synthetic data generation as a first-class data engineering task: documented, versioned, reproducible, and tied to the business requirements it is meant to serve.
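
One simple way to make "versioned and reproducible" checkable is to fingerprint each generated dataset together with the configuration that produced it. This is a generic technique, not something the article attributes to Genesis, and the `dataset_fingerprint` helper and the config keys are hypothetical.

```python
import hashlib
import json


def dataset_fingerprint(config: dict, rows: list[dict]) -> str:
    """Return a SHA-256 hex digest over the generation config and the rows."""
    # sort_keys makes the digest independent of dictionary key order.
    payload = json.dumps({"config": config, "rows": rows}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

If two runs with the same seed and settings produce the same fingerprint, the dataset was regenerated faithfully. A different fingerprint signals that the config or the data changed and the documentation may need updating.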

Frequently Asked Questions

What is synthetic data generation in Genesis? An automated, six-phase workflow in which a Genesis Data Agent creates a complete schema of realistic test data inside Snowflake or Databricks. The engineer provides requirements and any reference documents; the agent handles design, generation, validation, documentation, and handoff.

How long does it take? It depends on schema complexity. The asset management example completed in 137 minutes. The Databricks version of this workflow has run in under 34 minutes for simpler schemas.

How does it connect to the rest of the pipeline? The blueprint produces a raw schema and a structured handoff document that feeds directly into source-to-target mapping, then into bronze, silver, and gold medallion processing, and finally into dashboards or query agents.

What is a Genesis blueprint? A structured methodology that defines how a Genesis agent approaches a specific category of work, broken into sequential phases with defined actions, context, and exit criteria at each step. For a deeper look, see Blueprints: How We Teach Agents to Work the Way Data Engineers Do.

Want to learn more? Get in touch!
