Run continuous experiments in production
This guide shows how to use LaunchDarkly to experiment at the scale of AI, which can generate more variations than you can manually validate. With LaunchDarkly, you define variations, measure real-world impact against production traffic, and promote the winning variations to give them larger reach. This loop runs continuously, without redeploying.
Prerequisites
To complete this guide, you need the following:
- A LaunchDarkly account.
- LaunchDarkly installed and initialized in your application. To learn more, read SDK overview.
- A feature or agent behavior you want to optimize.
Define success metrics before the experiment runs
Before you run an experiment, take the time to determine which metrics will indicate success. Changing metrics mid-experiment will invalidate the results.
How continuous experimentation works
The pattern is the same for code and for AgentControl configs: create variations, expose them to real users, measure what wins, promote it, and repeat.
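Each product applies to a different kind of change: CodeControl covers code, through feature flags, and AgentControl covers AI features, through AgentControl configs.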
The loop is the same in both cases. The tooling differs slightly.
Step 1: Create variations and define success metrics
For code with CodeControl, create a multivariate feature flag with a variation for each option you want to test. Define the metrics that constitute a win, such as conversion rate, error rate, engagement, latency, or any business metric you can measure.
To learn more, read Creating new flags.
Here is an example of evaluating a multivariate flag:
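The sketch below is one way this can look in Python. It assumes a client whose `variation(key, context, default)` method behaves like the one in LaunchDarkly's server-side Python SDK. The flag key `checkout-layout`, its variation values, and the `StubClient` class are hypothetical and exist only so the example runs without an SDK key.

```python
from typing import Any, Protocol


class FlagClient(Protocol):
    """The single method this sketch needs from a LaunchDarkly-style client."""

    def variation(self, key: str, context: Any, default: Any) -> Any: ...


def checkout_layout(client: FlagClient, context: Any) -> str:
    """Return the layout to render for the given context."""
    # Hypothetical flag key and variation values; "control" is the default
    # served when the flag cannot be evaluated.
    served = client.variation("checkout-layout", context, "control")
    layouts = {
        "control": "single-page",
        "compact": "single-page-compact",
        "wizard": "multi-step",
    }
    # Fall back to the control layout if an unexpected value is served.
    return layouts.get(served, layouts["control"])


class StubClient:
    """Hypothetical stand-in for a real SDK client (assumption, not SDK code)."""

    def __init__(self, served: dict):
        self.served = served

    def variation(self, key, context, default):
        return self.served.get(key, default)


# A plain dict stands in for an SDK context object in this sketch.
client = StubClient({"checkout-layout": "wizard"})
print(checkout_layout(client, {"key": "user-123"}))
```

In a real application you would pass the initialized SDK client and a real context instead of the stub. Keeping the default equal to the control variation means an evaluation failure degrades to existing behavior.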
To experiment with AI features with AgentControl, create an AgentControl config with a variation for each prompt, model, or parameter combination you want to test. Set success metrics that reflect real agent performance, such as task completion rate, output quality scores, latency, or cost per call.
To learn more, read AgentControl.
Here is an example of retrieving an AgentControl config:
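The following Python sketch illustrates the general pattern only and does not use the real AgentControl SDK. The `AgentConfig` fields, the `fetch` callable, the config key `support-agent`, and the model names are all hypothetical. Consult the AgentControl documentation for the actual client calls.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass(frozen=True)
class AgentConfig:
    """Hypothetical shape of one AgentControl config variation."""
    model: str
    prompt: str
    parameters: dict = field(default_factory=dict)


# Served when the config cannot be retrieved, so the agent keeps working.
FALLBACK = AgentConfig(
    model="baseline-model",
    prompt="You are a helpful assistant.",
    parameters={"temperature": 0.2},
)


def get_agent_config(fetch: Callable[[str, Any], dict],
                     key: str, context: Any) -> AgentConfig:
    """Retrieve the variation served to this context, or the fallback."""
    try:
        raw = fetch(key, context)
        return AgentConfig(
            model=raw["model"],
            prompt=raw["prompt"],
            parameters=raw.get("parameters", {}),
        )
    except (KeyError, TypeError, ConnectionError):
        # Malformed payload or unreachable service: use the known-good config.
        return FALLBACK


def stub_fetch(key: str, context: Any) -> dict:
    """Hypothetical stand-in for the SDK call that returns a variation."""
    return {"model": "candidate-model", "prompt": "Answer concisely.",
            "parameters": {"temperature": 0.0}}


config = get_agent_config(stub_fetch, "support-agent", {"key": "user-123"})
print(config.model, config.parameters)
```

Reading the model, prompt, and parameters from the retrieved config on every call, rather than hard-coding them, is what lets a later promotion take effect without a code change.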
Step 2: Expose variations to production traffic
Use LaunchDarkly’s Experimentation feature to split traffic across your variations. LaunchDarkly handles assigning traffic to variations, so each user or context consistently receives the same variation. The experiment tracks results for each variation against your defined metrics.
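To illustrate why a context can consistently receive the same variation, here is a minimal Python sketch of deterministic, hash-based assignment. This is a common general technique and is not LaunchDarkly's actual bucketing algorithm. The function name, experiment key, and weights are assumptions made for the example.

```python
import hashlib
from typing import Mapping


def assign_variation(experiment_key: str, context_key: str,
                     weights: Mapping[str, float]) -> str:
    """Map a context to a variation deterministically, in proportion to weights."""
    if not weights:
        raise ValueError("weights must contain at least one variation")
    # Hash the experiment and context keys together so that the same context
    # can land in different buckets for different experiments.
    seed = f"{experiment_key}:{context_key}".encode("utf-8")
    # Interpret the first 8 hex digits of the hash as a point in [0, 1).
    point = int(hashlib.sha256(seed).hexdigest()[:8], 16) / 16**8
    total = sum(weights.values())
    cumulative = 0.0
    for name, weight in weights.items():
        cumulative += weight / total
        if point < cumulative:
            return name
    # Guard against floating-point rounding at the top of the range.
    return name


weights = {"control": 50, "compact": 25, "wizard": 25}
print(assign_variation("checkout-layout-test", "user-123", weights))
```

Because the assignment depends only on the keys and the weights, no per-user state is needed to keep the experience stable across requests.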
Start with enough traffic to reach statistical significance in a reasonable timeframe. If you want to run many experiments at the same time, prioritize the ones tied to your highest-impact metrics.
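As a rough planning aid, the Python sketch below estimates how many contexts each variation needs in order to detect a given change in a conversion rate. It assumes a standard two-sided, two-proportion z-test with conventional defaults (5% significance, 80% power). LaunchDarkly's own statistical analysis may use different methods, so treat the result as an approximation. The function name and example rates are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variation(baseline: float, target: float,
                              alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate contexts needed per variation to detect baseline -> target."""
    if baseline == target:
        raise ValueError("baseline and target rates must differ")
    # Critical values for the two-sided significance level and the power.
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    # Sum of the Bernoulli variances of the two conversion rates.
    variance = baseline * (1 - baseline) + target * (1 - target)
    return ceil((z_alpha + z_beta) ** 2 * variance / (target - baseline) ** 2)


# Example: detect a lift from a 10% to a 12% conversion rate.
needed = sample_size_per_variation(0.10, 0.12)
print(needed)
```

Dividing the result by the daily traffic each variation receives gives a rough duration. Smaller expected effects need substantially more traffic, which is one reason to prioritize experiments tied to high-impact metrics.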
Run experiments on real production traffic
Run experiments on real production traffic, not synthetic or internal traffic. Behavior in a staging environment rarely matches what real users do.
Step 3: Measure and promote the winner
As results accumulate, performance information appears for each variation. When a variation’s results reach statistical significance, do the following:
- Promote the winner by updating the flag or AgentControl config to serve the winning variation to 100% of traffic.
- Confirm the change takes effect immediately. No redeploy is required.
- Archive the losing variations to keep your flag and config inventory clean.
For AgentControl configs, promotion updates the active model or prompt configuration globally without a code change.
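LaunchDarkly computes experiment results for you. To make the idea of statistical significance concrete, the Python sketch below applies a two-sided, two-proportion z-test to conversion counts from two variations. This is a textbook method chosen for illustration, and it is an assumption that it resembles the analysis LaunchDarkly performs. The function name and the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for a difference between two conversion rates."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that both variations perform equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        # No variance at all (every context converted, or none did).
        return 1.0
    z = (rate_b - rate_a) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical counts: control converts 400 of 4,000, candidate 480 of 4,000.
p_value = two_proportion_p_value(400, 4000, 480, 4000)
print("promote candidate" if p_value < 0.05 else "keep collecting data")
```

Deciding the significance threshold before the experiment starts, alongside the success metrics, avoids the temptation to stop as soon as a favorable result appears.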
Step 4: Keep the loop running
Promotion isn’t the end of the experiment. It’s the start of the next one. After you promote a winner:
- Generate new variations against the new baseline established by your previous results.
- Run the next experiment on the highest-impact question, which may have changed based on previous experiment results.
- Let production data, not assumptions, drive each decision.
The goal is a system where every meaningful change starts as a variation in an experiment, generates signal, and is then either promoted or discarded based on real data.
Next steps
To continue, explore the following topics:
- Experimentation to configure traffic splits, metrics, and statistical analysis.
- AgentControl to manage prompt and model variations for agent experiments.
- Metrics to measure business and system impact for each variation.
- Guarded releases to add safety guardrails to experiments in high-risk areas.