Testing the Unknowable: Navigating Non-Determinism in AI-Driven Development

Explore how AI and non-determinism reshape software testing, focusing on MCP servers, constraint-based approaches, and the rising value of data locality and construction.

Xtcworld · 2026-05-04 05:55:32 · AI & Machine Learning

Introduction

In a recent conversation between Ryan and SmartBear's VP of AI and Architecture, Fitz Nowlan, the duo unpacked a pressing question for modern developers: How do you test code when you have no idea what it contains? As artificial intelligence (AI) and large language models (LLMs) increasingly generate and drive software, traditional testing methodologies break down. The heart of the issue lies in non-determinism—the inability to predict a system's output given the same input—which challenges the very foundation of software quality assurance. This article explores the shift from old assumptions about software development, the unique hurdles of testing MCP (Model Context Protocol) servers, and why data locality and data construction are becoming more valuable than ever when source code is trivial to produce.

[Image: Testing the Unknowable: Navigating Non-Determinism in AI-Driven Development. Source: stackoverflow.blog]

The Changing Landscape of Software Development

For decades, software development relied on deterministic systems: the same input and code always produced the same output. Testing was predicated on this reliability—unit tests, integration tests, and end-to-end tests all assumed you could predict what a piece of code would do. But LLM-driven agents have shattered that assumption. These agents generate behavior on the fly, leveraging vast probabilistic models to decide actions, responses, and even the code itself. Now, the source code may be a black box or generated dynamically, making it impossible to know its exact contents at test time.

We are moving away from the notion that we need to understand every line of code to ensure quality. Instead, the emphasis is shifting to understanding what the system should do, not how it does it. This paradigm demands a new testing mentality—one that embraces uncertainty and focuses on outcomes and constraints.

The Challenge of Testing MCP Servers with Non-Deterministic Agents

One of the most pressing challenges is testing MCP servers. MCP (Model Context Protocol) servers act as middleware, connecting LLM agents to external tools, data sources, and services. They facilitate real-time interactions where an agent might call a function, query a database, or trigger an API—all based on ambiguous user requests. Because the LLM agent introduces non-determinism, the same query can lead to different actions each time.
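
As a rough mental model (not the actual MCP SDK, whose API is not shown here), an MCP-style server can be pictured as a registry of tools plus a dispatcher that executes whichever call the agent decides to issue. The tool names and payload shapes below are purely illustrative assumptions:

```python
# Simplified, hypothetical stand-in for an MCP-style server: a registry of
# tools and a dispatcher that routes whatever call the agent decides to make.
from typing import Any, Callable, Dict

TOOLS: Dict[str, Callable[..., Any]] = {}

def tool(name: str):
    """Register a function as a callable tool for the agent."""
    def register(fn: Callable[..., Any]) -> Callable[..., Any]:
        TOOLS[name] = fn
        return fn
    return register

@tool("query_orders")
def query_orders(customer_id: str) -> list:
    # In a real server this would hit a database or downstream API.
    return [{"customer_id": customer_id, "order_id": "A-1", "total": 42.0}]

def dispatch(call: dict) -> Any:
    """Route an agent-issued tool call of the form {'tool': ..., 'args': {...}}."""
    fn = TOOLS.get(call["tool"])
    if fn is None:
        raise ValueError(f"Unknown tool: {call['tool']}")
    return fn(**call["args"])

# The same ambiguous user request may produce different calls on different runs.
print(dispatch({"tool": "query_orders", "args": {"customer_id": "c-123"}}))
```

The point of the sketch is that the server cannot know in advance which tool the agent will pick, in what order, or with what arguments; it can only constrain what is callable.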

Why Traditional Testing Fails

Traditional testing frameworks rely on predictable state transitions: given state A and input X, the system should move to state B. But non-deterministic agents don't follow that script. An agent's internal prompt history, token sampling, and built-in randomness mean that a test that passes once may fail the next time, not because of a bug, but because the agent chose a different valid path. This makes conventional regression testing, mocking, and exact-output assertions brittle at best.
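
To make that failure mode concrete, here is a hypothetical pytest-style regression test written under deterministic assumptions; the `agent` and `trace` fixtures are illustrative stand-ins, not a real framework. Because it pins one exact sequence of tool calls, it can fail on a rerun even when the agent's alternative path was equally correct:

```python
# A regression test written the "deterministic" way: it encodes one specific
# path through the tools, so any equally valid alternative path breaks it.
def test_agent_handles_refund_request(agent, trace):
    agent.run("Refund order A-1 for customer c-123")
    # Brittle: asserts an exact call sequence the agent is free to vary.
    assert trace.calls == [
        ("lookup_order", {"order_id": "A-1"}),
        ("issue_refund", {"order_id": "A-1"}),
    ]
```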

Adapting to Non-Determinism

SmartBear's Fitz Nowlan suggests a shift in approach: instead of testing for exact outputs, test for constraints and invariants. For example, verify that any database write is valid, or that the agent never calls unauthorized APIs. This property-based testing logic helps ensure the system remains safe and functional despite the agent's unpredictable behavior.
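
As a rough sketch of that constraint-oriented style (again using hypothetical `agent` and `trace` fixtures), the test below asserts invariants that should hold on any run, rather than one expected output:

```python
# Invariant-style checks: true for every valid path the agent might take.
ALLOWED_TOOLS = {"lookup_order", "issue_refund", "query_orders"}

def test_agent_stays_within_guardrails(agent, trace):
    agent.run("Refund order A-1 for customer c-123")

    # Invariant 1: the agent only calls tools it is authorized to use.
    assert all(name in ALLOWED_TOOLS for name, _ in trace.calls)

    # Invariant 2: every database write it produced is structurally valid.
    for write in trace.db_writes:
        assert write["order_id"]            # no empty keys
        assert write["amount"] >= 0         # refunds are never negative

    # Invariant 3: if a refund happened, the order was looked up first.
    tools_used = [name for name, _ in trace.calls]
    if "issue_refund" in tools_used:
        assert "lookup_order" in tools_used
        assert tools_used.index("lookup_order") < tools_used.index("issue_refund")
```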

Another strategy is to leverage contract testing between the MCP server and the agent. Define a set of permitted operations, data formats, and response types. The test then validates that the agent stays within those boundaries, rather than verifying a specific call flow. This approach acknowledges the agent's freedom while enforcing guardrails.
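
A minimal sketch of that contract-checking idea follows; the contract format, field names, and the `observed_calls` fixture are assumptions for illustration, not a specific contract-testing tool:

```python
# The contract lists permitted operations and the shape of their arguments.
# The test only verifies that observed agent behavior stays inside it.
CONTRACT = {
    "query_orders": {"required": {"customer_id": str}},
    "issue_refund": {"required": {"order_id": str, "amount": float}},
}

def violations(observed_calls):
    """Return contract violations for a recorded agent session."""
    problems = []
    for tool, args in observed_calls:
        spec = CONTRACT.get(tool)
        if spec is None:
            problems.append(f"tool not in contract: {tool}")
            continue
        for field, expected_type in spec["required"].items():
            if field not in args:
                problems.append(f"{tool}: missing field {field}")
            elif not isinstance(args[field], expected_type):
                problems.append(f"{tool}: {field} has wrong type")
    return problems

def test_agent_respects_contract(observed_calls):
    assert violations(observed_calls) == []
```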


The New Value Proposition: Data Locality and Data Construction

When source code is cheap to generate—thanks to AI coding assistants—the bottleneck shifts from writing code to ensuring its quality. But without deterministic expectations, what can we rely on? According to Nowlan, data locality and data construction emerge as critical assets.

Data Locality

Data locality refers to the practice of keeping data close to where it's processed, reducing latency and increasing control. In an AI-driven ecosystem, where agents reach out to various services, data locality ensures that tests can run against representative, real-world data rather than synthetic datasets that may not reflect actual usage patterns. By situating test data within the same environment as the code (or the agent), engineers can catch anomalies early without relying on an exact code path.
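
One way to picture this is a test suite that runs against a local, production-representative data snapshot rather than hand-rolled mocks. The snapshot path and schema below are assumptions for illustration only:

```python
# Tests read from a local snapshot that lives alongside the suite, so the
# agent and server are exercised against realistic data shapes.
import sqlite3
from pathlib import Path

SNAPSHOT = Path(__file__).parent / "fixtures" / "orders_snapshot.sqlite"

def load_snapshot():
    conn = sqlite3.connect(SNAPSHOT)
    conn.row_factory = sqlite3.Row
    return conn

def test_agent_sees_realistic_order_shapes():
    conn = load_snapshot()
    rows = conn.execute("SELECT customer_id, total FROM orders LIMIT 100").fetchall()
    # Whatever path the agent takes, the data it reasons over should match
    # the shapes seen in production, awkward cases included.
    assert rows, "snapshot should not be empty"
    assert all(row["total"] is not None for row in rows)
```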

Data Construction

Equally important is data construction: the deliberate craft of building test data that exercises both typical and edge-case scenarios. Since we cannot predict the exact operations an agent will perform, we must instead design data that tests the system's resilience. For instance, constructing data with missing fields, unexpected values, or boundary conditions helps verify that the MCP server and its agent handle ambiguity gracefully. This becomes a more valuable craft than writing unit tests for each function, because it probes the system's behavior across the range of choices an agent might make.
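
A small sketch of deliberately constructed edge cases might look like the parametrized test below; the `submit_order` fixture is a hypothetical entry point into the server under test:

```python
# Hand-built edge cases (missing fields, boundary values, odd encodings) that
# the system should survive no matter which tools the agent decides to call.
import pytest

EDGE_CASES = [
    {"customer_id": "c-1", "total": 0.0},        # boundary: zero total
    {"customer_id": "c-2"},                      # missing field
    {"customer_id": "", "total": -5.0},          # empty id, negative value
    {"customer_id": "c-✓", "total": 1e12},       # non-ASCII id, huge value
]

@pytest.mark.parametrize("order", EDGE_CASES)
def test_server_degrades_gracefully(order, submit_order):
    result = submit_order(order)
    # The exact behavior may vary, but the system must never crash or
    # silently write malformed data.
    assert result.status in {"accepted", "rejected"}
```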

Together, data locality and data construction provide a foundation for meaningful testing in the age of non-determinism. Rather than fighting the randomness, we embrace it by anchoring our tests in the ground truth of real-world data patterns.

Conclusion

The conversation between Ryan and Fitz Nowlan illuminates a fundamental shift: we can no longer test code by knowing its contents. As LLM-driven agents and MCP servers proliferate, non-determinism becomes a feature, not a bug. Testers must adopt new mindsets—constraint-based verification, contract testing, and an emphasis on data over code. Data locality and data construction offer a robust path forward, turning the focus from predicting outputs to ensuring reliability in unpredictable environments.

This evolution demands that developers, QA engineers, and architects rethink their toolchains and philosophies. The future of testing lies not in understanding every line, but in designing systems and data that thrive amidst uncertainty. By embracing these changes, we can continue to deliver quality software even when we don't know exactly what's inside.
