The human-in-the-loop alternative to autonomous remediation
Published May 29, 2026
From triggered alert to root cause in five prompts with Claude Code and the LaunchDarkly observability MCP server
In the previous tutorial, Catch Silent AI Failures using Vega, we saw how alerts can trigger autonomous remediation that uses telemetry data to pinpoint the root cause. In a single pass, Vega detects, diagnoses, and rolls back changes to protect the user experience. This works well for most use cases, but if you would rather investigate incidents yourself with the help of an AI assistant, this tutorial is for you.
Here, you’ll triage a real alert in Claude Code, correlate error groups, traces, and flag evaluations to confirm the root cause, then post the findings to your on-call Slack channel.
To get started, you’ll need:

To follow along with this tutorial, clone the Serenia repo, an AI assistant for an event venue business. It has three skills:
answer_faq
log_inquiry
qualify_lead

Every inquiry is classified and sent to Airtable, which serves as the CRM backend.

You’ll want a way to test how different variations affect traffic. First, create an AgentControl config for the qualify_lead skill, which determines whether a request is hot, warm, or cold.
You’ll also need to create two variations. The first uses the original JSON shape:
The second variation introduces a breaking JSON shape that adds three new variables not recognized by the parser contract:
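To see why three extra fields can break a downstream consumer, here is a minimal sketch of a strict parser contract. It is an illustration under stated assumptions, not Serenia's actual parser: the `reason` field, the snake_case names for the three new fields, and the `parse_lead` helper are all hypothetical. Only `lead_temperature` and the hot/warm/cold values come from this tutorial.

```python
import json

# Assumption: the v1 contract allows exactly these keys. Only
# `lead_temperature` is named in the tutorial; `reason` is hypothetical.
ALLOWED_KEYS = {"lead_temperature", "reason"}
VALID_TEMPERATURES = {"hot", "warm", "cold"}


class ContractError(ValueError):
    """Raised when model output does not match the expected JSON shape."""


def parse_lead(raw: str) -> dict:
    """Parse model output and reject anything outside the v1 contract."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ContractError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ContractError("expected a JSON object")
    unknown = sorted(set(data) - ALLOWED_KEYS)
    if unknown:
        raise ContractError(f"unrecognized keys: {unknown}")
    if data.get("lead_temperature") not in VALID_TEMPERATURES:
        raise ContractError("lead_temperature must be hot, warm, or cold")
    return data


# Hypothetical v1-style and v2-style outputs (field names are assumptions).
v1_output = '{"lead_temperature": "hot", "reason": "wedding in 3 weeks"}'
v2_output = (
    '{"lead_temperature": "hot", "urgency": "high", '
    '"budget_signal": "strong", "decision_authority": "yes"}'
)

print(parse_lead(v1_output)["lead_temperature"])
try:
    parse_lead(v2_output)
except ContractError as err:
    print(f"v2 rejected: {err}")
```

A parser written this way accepts the first shape and raises on the second, which is the kind of failure the rest of this tutorial investigates.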
LaunchDarkly’s AI configs product was renamed to AgentControl this week. The MCP server endpoints, slash commands, and some skill names still use the aiconfigs slug, and the resource itself is still commonly called an “AI config.” This tutorial uses the current product name (AgentControl) and keeps the legacy slugs in code and commands where they still apply.
Use the prompt below in Claude Code with the LaunchDarkly MCP server connected. It scaffolds the project, the AgentControl config, and both variations in a single pass.
Result:
You can now create a percentage rollout in Agents → Configs. Open the AgentControl config you created, click Targeting, and set v1 to 100% and v2 to 0% of traffic.

In the LaunchDarkly UI under Telemetry → Alerts, create an alert that fires when errors cross a given threshold.
Set auto remediation to off. If it were on, Vega would investigate the issue automatically, but for this tutorial you want to investigate it yourself.
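Conceptually, a threshold alert counts errors inside a time window and fires once the count reaches a limit. The sketch below illustrates that idea only. It is not how LaunchDarkly evaluates alerts, and the `ErrorCountAlert` class, the threshold of 3, and the 60-second window are assumptions chosen for the example.

```python
from collections import deque


class ErrorCountAlert:
    """Fires when errors inside a sliding time window reach a threshold."""

    def __init__(self, threshold: int, window_seconds: float) -> None:
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._timestamps = deque()  # error times, oldest first

    def record_error(self, timestamp: float) -> bool:
        """Record one error and return True if the alert should fire."""
        self._timestamps.append(timestamp)
        # Drop errors that have aged out of the window.
        cutoff = timestamp - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps) >= self.threshold


# Hypothetical values: 3 errors within 60 seconds trips the alert.
alert = ErrorCountAlert(threshold=3, window_seconds=60)
fired = [alert.record_error(t) for t in (0, 10, 20, 100)]
print(fired)  # [False, False, True, False]
```

The last error arrives after the earlier three have aged out of the window, so the alert does not fire again.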
Then, generate baseline traffic to make sure observability data is coming in.
Traces View:

Metrics View:

Now, let’s test the breaking change by shifting variation percentages to v1 → 75% and v2 → 25%.

This will shift 25% of traffic to variation 2, which injects a new JSON shape into the system prompt.
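A percentage rollout generally works by hashing a stable key into a bucket and mapping bucket ranges to variations, so the same user keeps seeing the same variation. The sketch below shows that general technique with the 75/25 split from this step. It is not LaunchDarkly's bucketing implementation; the `assign_variation` function, the salt, and the synthetic user keys are assumptions.

```python
import hashlib
from collections import Counter

# Rollout weights from this step: 75% v1, 25% v2.
ROLLOUT = [("v1", 75), ("v2", 25)]


def assign_variation(context_key: str, salt: str = "qualify-lead") -> str:
    """Deterministically map a context key to a variation."""
    digest = hashlib.sha256(f"{salt}:{context_key}".encode()).hexdigest()
    bucket = int(digest[:8], 16) % 100  # integer in 0..99
    cumulative = 0
    for name, weight in ROLLOUT:
        cumulative += weight
        if bucket < cumulative:
            return name
    return ROLLOUT[-1][0]  # fallback if weights sum to less than 100


# Synthetic keys, for illustration only.
counts = Counter(assign_variation(f"user-{i}") for i in range(10_000))
share_v2 = counts["v2"] / sum(counts.values())
print(f"v2 share: {share_v2:.1%}")
```

Because assignment depends only on the key and the salt, replaying the same requests reproduces the same split, which makes a spike easier to tie back to one variation.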
Send requests so the new v2 variation gets exercised:
If you navigate to Telemetry → Errors, you’ll be able to see the error picked up by the alert you created.

This is the step where you could hand the investigation to Vega, but sometimes you may want to drill down yourself and use AI to reach the root cause sooner. For this scenario, now that the alert has fired, you can use the following five prompts in Claude Code to diagnose the issue. You can also run the same investigation in your own AI agent by connecting the LaunchDarkly MCP server; the prompts below work in any agent with the MCP server connected.
Result:
This first prompt does what a human would do clicking through a dashboard, except faster and with the synthesis already written. It establishes when the spike began, how many sessions and users are hit, which skill is responsible, and what the error fingerprint looks like.
Result:
With this output, you’re able to put yourself in the user’s shoes and see exactly what feature might have caused the issue. In this case the error is occurring at the lead_score stage of the qualify_lead agent skill.
Result:
Result:
This is the prompt that turns a hypothesis into a diagnosis. The previous prompt established correlation in time; this one establishes correlation in cohort. Together they’re enough to make a confident decision.
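Correlation in cohort amounts to comparing the error rate of requests served each variation. The sketch below shows that comparison on invented sample records. The counts are made up for illustration and are not results from this tutorial, and `error_rate_by_variation` is a hypothetical helper rather than part of the MCP server.

```python
from collections import defaultdict

# Invented sample data: (variation served, did the request error?).
records = (
    [("v1", False)] * 74 + [("v1", True)] * 1
    + [("v2", False)] * 5 + [("v2", True)] * 20
)


def error_rate_by_variation(rows):
    """Return {variation: errors / total requests} for each variation."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for variation, errored in rows:
        totals[variation] += 1
        errors[variation] += int(errored)
    return {v: errors[v] / totals[v] for v in totals}


rates = error_rate_by_variation(records)
for variation, rate in sorted(rates.items()):
    print(f"{variation}: {rate:.1%} error rate")
```

When one cohort's error rate is far higher than the other's, and the spike began when that cohort started receiving traffic, the two signals together point at the variation rather than at the skill in general.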
Result:
The fifth prompt closes the loop. You started with an alert that said “error count is up.” You end with a one-sentence root cause and two specific remediation paths, with the agent volunteering an opinion on which is faster. This is where the human in the loop comes in, just in time to act on the suggested root cause and remediation steps.
Now that you have the diagnosis, there are three plausible paths forward, and each one is a real option an on-call engineer might pick.
[...] qualify-lead-v2-precise to 0% and serve everyone qualify-lead-v1-stable while the parser is fixed. Fastest, blunt, and stops the bleeding immediately. Downside: any benefit from the new prompt’s richer scoring (urgency, budget signal, decision authority) is paused until the parser catches up.
[...] lead_temperature. The error count stays elevated for ~25% of leads until the patch ships, but you don’t yo-yo the variation in a way that complicates the audit trail.
[...] qualify_lead.py in another PR, this might be cheaper than it looks.

The MCP server gave you the evidence. You give the order. To turn the diagnosis into communication, run one more prompt: