This topic explains the metrics LaunchDarkly automatically generates from SDK events and how you can use them to monitor the health of your applications.
Metric events
An “event” happens when someone takes an action in your app, such as clicking a button, or when a system takes an action, such as loading a page. Your SDKs send these metric events to LaunchDarkly, which can automatically create metrics from certain event kinds. You can use these metrics with experiments and guarded rollouts to track how your flag changes affect your customers’ behavior.
Autogenerated metrics are marked on the Metrics list with an autogenerated tag. You can view the events that autogenerated these metrics from the Metrics list by clicking View, then Events.
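If you are new to metric events, the sketch below shows roughly how an event reaches LaunchDarkly, using the Node.js server-side SDK's track call with an illustrative event key and context. The events that LaunchDarkly autogenerates metrics from are sent automatically by the AI SDKs, telemetry integrations, and OpenTelemetry instrumentation described below, so you do not call track for those yourself.

```typescript
// Minimal sketch using the LaunchDarkly Node.js server-side SDK.
// The event key and context are illustrative. Autogenerated metrics come from
// events with reserved prefixes (for example, $ld:ai) that the AI SDKs,
// telemetry integrations, and OpenTelemetry instrumentation send automatically.
import { init } from '@launchdarkly/node-server-sdk';

const client = init('sdk-key-123abc'); // replace with your SDK key
await client.waitForInitialization({ timeout: 10 });

const context = { kind: 'user', key: 'user-key-123abc' };

// Record a metric event when someone clicks a button in your app
client.track('checkout-button-clicked', context);

// Deliver any queued events before the process exits
await client.flush();
```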
Randomization units for autogenerated metrics
LaunchDarkly sets the randomization unit for autogenerated metrics to your account’s default context kind for experiments. For most accounts, the default context kind for experiments is user. However, you may have updated your default context kind to account, device, or some other context kind you use in experiments most often. To learn how to change the default context kind for experiments, read Map randomization units to context kinds.
All autogenerated metrics are designed to work with a randomization unit of either user or request. Depending on your account’s default context kind for experiments, you may need to manually update the randomization unit for some autogenerated metrics. The recommended randomization unit for each autogenerated metric is listed in the tables below. To learn how to manually update the randomization unit for a metric, read Edit metrics.
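If you prefer to make this change programmatically rather than in the UI, the sketch below shows one way it could look with the LaunchDarkly REST API, assuming the metric accepts a JSON Patch update to a randomizationUnits property. The project key, metric key, and context kind are placeholders; the Edit metrics topic and the REST API reference remain the authoritative guides.

```typescript
// Rough sketch: updating a metric's randomization unit through the REST API.
// The project key, metric key, and context kind below are placeholders, and the
// randomizationUnits property is an assumption based on the metrics API.
const apiToken = process.env.LD_API_TOKEN!; // a LaunchDarkly API access token

await fetch(
  'https://app.launchdarkly.com/api/v2/metrics/my-project/positive-ai-feedback-count',
  {
    method: 'PATCH',
    headers: {
      Authorization: apiToken,
      'Content-Type': 'application/json',
    },
    // JSON Patch: set the randomization unit to your account's default
    // context kind for experiments, for example "account"
    body: JSON.stringify([
      { op: 'replace', path: '/randomizationUnits', value: ['account'] },
    ]),
  },
);
```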
Metrics autogenerated from AI SDK events
An AI config is a resource that you create in LaunchDarkly and then use to customize, test, and roll out new large language models (LLMs) within your generative AI applications. As soon as you start using AI configs in your application, your AI SDKs begin sending events to LaunchDarkly, and you can track how your AI model generation is performing.
AI SDK events are prefixed with $ld:ai and LaunchDarkly automatically generates metrics from these events.
Some events generate multiple metrics that measure different aspects of the same event. For example, the $ld:ai:feedback:user:positive event generates a metric that measures the average number of positive feedback events per user, and a metric that measures the percentage of users that generated positive feedback.
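As a rough illustration of how one event feeds two metrics, the sketch below shows a $ld:ai:feedback:user:positive event being recorded with the Node.js server-side SDK. In practice the AI SDK’s tracker sends these reserved $ld:ai events for you when you record feedback, so treat this only as a picture of what happens under the hood.

```typescript
// Illustrative sketch only: the AI SDK's tracker sends $ld:ai:* events on your
// behalf, so you do not call track with these reserved keys in real code.
import { init } from '@launchdarkly/node-server-sdk';

const client = init('sdk-key-123abc');
await client.waitForInitialization({ timeout: 10 });

const context = { kind: 'user', key: 'user-key-123abc' };

// A "thumbs up" rating surfaces as a $ld:ai:feedback:user:positive event, which
// feeds both the "Positive AI feedback count" and "Positive AI feedback rate" metrics
client.track('$ld:ai:feedback:user:positive', context);
```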
This table explains the metrics that LaunchDarkly autogenerates from AI SDK events:
Name: Positive AI feedback count
Randomization unit: User
Description: Average number of positive feedback events per context
Example usage: Running an experiment to find out which variation causes more users to click “thumbs up”
Metric definition: Measurement method: Count; Unit aggregation method: Sum; Analysis method: Average; Success criterion: Higher is better; Units without events: Include units that did not send any events and set their value to 0

Name: Positive AI feedback rate
Randomization unit: Request
Description: Percentage of contexts that generated positive AI feedback
Example usage: Running a guarded rollout to make sure there is a positive feedback ratio throughout the rollout
Metric definition: Measurement method: Occurrence; Unit aggregation method: Average; Analysis method: Average; Success criterion: Higher is better; Units without events: Include units that did not send any events and set their value to 0

Name: Negative AI feedback count
Randomization unit: User
Description: Average number of negative feedback events per context
Example usage: Running an experiment to find out which variation causes more users to click “thumbs down”
Metric definition: Measurement method: Count; Unit aggregation method: Sum; Analysis method: Average; Success criterion: Lower is better; Units without events: Include units that did not send any events and set their value to 0

Name: Negative AI feedback rate
Randomization unit: User
Description: Percentage of contexts that generated negative AI feedback
Example usage: Running an experiment to find out which variation causes more users to click “thumbs down”
Metric definition: Measurement method: Occurrence; Unit aggregation method: Average; Analysis method: Average; Success criterion: Lower is better; Units without events: Include units that did not send any events and set their value to 0

Name: Average input tokens per AI completion
Randomization unit: Request
Description: For example, for a chatbot, this might indicate user engagement
Example usage: Running an experiment to find out which variation results in fewer input tokens, reducing cost
Metric definition: Measurement method: Value/size; Unit aggregation method: Average; Analysis method: Average; Success criterion: Lower is better; Units without events: Exclude units that did not send any events

Name: Average output tokens per AI completion
Randomization unit: Request
Description: Indicator of cost, when charged by token usage
Example usage: Running an experiment to find out which variation results in fewer output tokens, reducing cost
Metric definition: Measurement method: Value/size; Unit aggregation method: Average; Analysis method: Average; Success criterion: Lower is better; Units without events: Exclude units that did not send any events

Name: Average total tokens per AI completion
Randomization unit: Request
Description: Indicator of cost, when charged by token usage
Example usage: Running an experiment to find out which variation results in fewer total tokens, reducing cost
Metric definition: Measurement method: Value/size; Unit aggregation method: Average; Analysis method: Average; Success criterion: Lower is better; Units without events: Exclude units that did not send any events

Name: Average AI completion time
Randomization unit: Request
Description: Time required for LLM to finish a completion
Example usage: Running an experiment to find out which variation results in faster user completion, improving engagement
Metric definition: Measurement method: Value/size; Unit aggregation method: Average; Analysis method: Average; Success criterion: Lower is better; Units without events: Exclude units that did not send any events

Name: AI completion success count
Randomization unit: User
Description: Counter for successful LLM completion requests
Example usage: Running an experiment to find out which variation results in more user completion requests (“chattiness”), improving engagement
Metric definition: Measurement method: Count; Unit aggregation method: Sum; Analysis method: Average; Success criterion: Higher is better; Units without events: Include units that did not send any events and set their value to 0

Name: AI completion error count
Randomization unit: Request
Description: Counter for erroneous LLM completion requests
Example usage: Running a guarded rollout to make sure the change doesn’t result in a higher error rate
Metric definition: Measurement method: Occurrence; Unit aggregation method: Average; Analysis method: Average; Success criterion: Lower is better; Units without events: Include units that did not send any events and set their value to 0

Name: AI completion error count
Randomization unit: User
Description: Counter for erroneous LLM completion requests
Example usage: Running a guarded rollout to make sure the change doesn’t result in a higher error rate
Metric definition: Measurement method: Occurrence; Unit aggregation method: Average; Analysis method: Average; Success criterion: Lower is better; Units without events: Include units that did not send any events and set their value to 0

Name: AI completion error count
Randomization unit: User
Description: Counter for erroneous LLM completion requests
Example usage: Running a guarded rollout to make sure the change doesn’t result in a higher number of errors
Metric definition: Measurement method: Count; Unit aggregation method: Sum; Analysis method: Average; Success criterion: Lower is better; Units without events: Include units that did not send any events and set their value to 0

Name: Average time to first token for AI requests
Metric kind: Numeric
Event key: $ld:ai:tokens:ttf
Randomization unit: User
Description: Time required for LLM to generate first token
Example usage: Running a guarded rollout to make sure the change doesn’t result in longer token generation times
Metric definition: Measurement method: Value/size; Unit aggregation method: Average; Analysis method: Average; Success criterion: Lower is better; Units without events: Exclude units that did not send any events
Example: Average number of positive feedback ratings per user
The first autogenerated metric listed above tracks the average number of positive feedback ratings per user.
Here is what the metric setup looks like in the LaunchDarkly user interface:
An autogenerated metric.
Metrics autogenerated from telemetry integration events
The LaunchDarkly telemetry integrations provide error monitoring and metric collection. Each telemetry integration is a separate package, which you install in addition to the LaunchDarkly SDK. After you initialize the telemetry integration, you register the LaunchDarkly SDK client with the telemetry instance. The instance collects and sends telemetry data to LaunchDarkly, where you can review metrics, events, and errors from your application.
Telemetry integration events are prefixed with $ld:telemetry and LaunchDarkly automatically generates metrics from these events.
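A minimal sketch of this flow appears below, assuming a browser telemetry package that exposes an initialization function and a register method as described above; the package and function names are illustrative, so check the telemetry integration documentation for the exact API.

```typescript
// Minimal sketch of wiring up a browser telemetry integration.
// The telemetry package and function names below are assumptions; consult the
// telemetry integration docs for the exact import and API.
import { initialize } from 'launchdarkly-js-client-sdk';
import { initializeTelemetry } from '@launchdarkly/browser-telemetry'; // assumed package name

// Initialize the telemetry integration so it can start collecting errors and metrics
const telemetry = initializeTelemetry();

// Initialize the LaunchDarkly client-side SDK as usual
const client = initialize('client-side-id-123abc', {
  kind: 'user',
  key: 'user-key-123abc',
});

// Register the SDK client with the telemetry instance so $ld:telemetry events
// are associated with your contexts and sent to LaunchDarkly
telemetry.register(client);
```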
This table explains the metrics that LaunchDarkly autogenerates from events recorded by the telemetry integration for LaunchDarkly browser SDKs:
Name: Percentage of users with errors (LaunchDarkly)
Randomization unit: User
Description: Measures the percentage of users that encountered an error at least once, as reported by the LaunchDarkly Telemetry SDK. Useful when running a guarded rollout.
Example usage: Running a guarded rollout to make sure the change doesn’t result in a higher error rate
Metric definition: Measurement method: Occurrence; Unit aggregation method: Average; Analysis method: Average; Success criterion: Lower is better; Units without events: Include units that did not send any events and set their value to 0
Metrics autogenerated from server-side SDKs using OpenTelemetry
LaunchDarkly’s SDKs support instrumentation for OpenTelemetry traces. Traces provide an overview of how your application handles requests. For example, traces may show that a particular feature flag was evaluated for a particular context as part of a given HTTP request. To learn more, read OpenTelemetry and Sending OpenTelemetry traces to LaunchDarkly.
OpenTelemetry events are prefixed with otel and LaunchDarkly automatically generates metrics from these events.
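The sketch below shows one way to enable this with the Node.js server-side SDK, assuming the OpenTelemetry integration package provides a tracing hook; the package and class names are illustrative, so check the OpenTelemetry integration documentation for your SDK. Your application must already export OpenTelemetry traces for the resulting spans to reach your trace backend.

```typescript
// Minimal sketch of enabling OpenTelemetry traces for flag evaluations.
// The OTel integration package and TracingHook class are assumptions; consult
// the OpenTelemetry integration docs for your SDK for the exact names.
import { init } from '@launchdarkly/node-server-sdk';
import { TracingHook } from '@launchdarkly/node-server-sdk-otel'; // assumed package name

// Register the tracing hook so each flag evaluation is recorded on the active
// span. Flag evaluations then appear inside the request traces that the
// otel-prefixed autogenerated metrics are built from.
const client = init('sdk-key-123abc', {
  hooks: [new TracingHook()],
});
```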
This table explains the metrics that LaunchDarkly autogenerates from OpenTelemetry traces:
Name: User HTTP error rate (OpenTelemetry)
Randomization unit: User
Description: Measures the percentage of users that encountered an error inside HTTP spans at least once, as reported by OpenTelemetry. Useful when running a guarded rollout.
Metric definition: Measurement method: Occurrence; Unit aggregation method: Average; Analysis method: Average; Success criterion: Lower is better; Units without events: Include units that did not send any events and set their value to 0

Name: User HTTP 5XX response rate (OpenTelemetry)
Randomization unit: User
Description: Measures the percentage of users that encountered an HTTP 5XX response at least once, as reported by OpenTelemetry. Useful when running a guarded rollout.
Metric definition: Measurement method: Occurrence; Unit aggregation method: Average; Analysis method: Average; Success criterion: Lower is better; Units without events: Include units that did not send any events and set their value to 0

Name: User non-HTTP exception rate (OpenTelemetry)
Randomization unit: User
Description: Measures the percentage of users that encountered an exception outside of HTTP spans at least once, as reported by OpenTelemetry. Useful when running a guarded rollout.
Metric definition: Measurement method: Occurrence; Unit aggregation method: Average; Analysis method: Average; Success criterion: Lower is better; Units without events: Include units that did not send any events and set their value to 0

Name: Average request latency (OpenTelemetry)
Randomization unit: Request
Description: Measures the average request latency, as reported by OpenTelemetry. Useful when running a guarded rollout. For best results, use a ‘request’ randomization unit and send ‘request’ contexts.
Metric definition: Measurement method: Value/size; Unit aggregation method: Average; Analysis method: Average; Success criterion: Lower is better; Units without events: Exclude units that did not send any events

Name: P95 request latency (OpenTelemetry)
Randomization unit: Request
Description: Measures the 95th percentile request latency, as reported by OpenTelemetry. For many applications, this represents the experience for most requests. You can adjust the percentile to fit your application’s needs. Useful when running a guarded rollout. For best results, use a ‘request’ randomization unit and send ‘request’ contexts.
Metric definition: Measurement method: Value/size; Unit aggregation method: Average; Analysis method: P95; Success criterion: Lower is better; Units without events: Exclude units that did not send any events

Name: P99 request latency (OpenTelemetry)
Randomization unit: Request
Description: Measures the 99th percentile request latency, as reported by OpenTelemetry. For many applications, this represents the worst-case experiences. You can adjust the percentile to fit your application’s needs. Useful when running a guarded rollout. For best results, use a ‘request’ randomization unit and send ‘request’ contexts.
Metric definition: Measurement method: Value/size; Unit aggregation method: Average; Analysis method: P99; Success criterion: Lower is better; Units without events: Exclude units that did not send any events