Components of a metric

Overview

This topic describes the key components of a LaunchDarkly metric, to help you understand how to create metrics for use with experiments, guarded rollouts, and release policies.

What is a LaunchDarkly metric?

A LaunchDarkly metric consists of three distinct components:

  • Event: A metric event occurs when a user takes an action in your application, or when the application itself takes action. SDKs generate metric events for a context and send them to LaunchDarkly, where you can use the events to create metrics. To learn more, read Metric events.

  • Aggregation method: During an experiment or guarded rollout, LaunchDarkly collects metrics events for contexts that participate in the experiment or rollout. The metric aggregation method specifies how LaunchDarkly aggregates multiple events that are collected for a given context. To learn more, read Aggregation method.

  • Analysis method: During an experiment or rollout, LaunchDarkly associates metrics with the flag variation that was served to each context. The analysis method specifies the context kind to use for collecting metrics, and determines how LaunchDarkly aggregates the collected metric data for all contexts that receive a given flag variation. To learn more, read Analysis method.

This screenshot shows where you configure the different components of a typical metric in the LaunchDarkly user interface (UI):

Configuring metric components.

Metric events

Metric events can track discrete user actions, such as button clicks, page views, or completed transactions. Events that track actions are called conversion events, and are the most common metric events used in experiments. For example, you might create an experiment using a conversion metric to determine which flag variation results in more completed transactions, or more visits to a sign-up page.

Metric events can optionally produce a numerical value to track things like the duration of a transaction or the total amount of a purchase. Events that produce a value are called numeric metrics, and are typically used in guarded rollouts to measure latency or regression performance. For example, you might use a numeric metric with a guarded rollout to ensure that you only release a particular flag variation if it does not negatively impact performance for customers.

When you create a new metric in the LaunchDarkly UI, you begin by selecting one of the following metric event types:

  • Page viewed: These conversion events are generated when a customer visits a specific page in your application. You instrument page viewed events directly in the LaunchDarkly user interface when you create a new Page viewed metric. Page viewed events and metrics require that you use a supported client SDK. To learn more, read Page viewed conversion metrics.

  • Clicked or tapped: These conversion events are generated by customer activities such as clicking an application button or selecting an option from a page. Similar to page viewed events, you instrument clicked or tapped events directly in the LaunchDarkly UI when you create the new metric. You must use a supported client SDK to instrument clicked or tapped metrics events. To learn more, read Clicked or tapped conversion metrics.

  • Custom: Custom events can record either customer activities (conversion events) or system values (numeric metrics). For example, you can use custom events to track page views or click or tap actions with a server-side SDK. You can also instrument custom events to record the total time it takes to complete a transaction to measure regressions, or the total value of a completed transaction.

    You must instrument custom events directly in your application code using the LaunchDarkly SDK track() method. If your application has already generated custom events, you can select the event key when you create a new custom metric. Optionally, you can specify a new event key when you create a custom metric; if you do this, ensure that you later implement the same key in your application with track() so that LaunchDarkly collects the configured metric. You can also instrument custom events to provide optional metadata, which you can use to locate and filter the specific events you want to include in a metric.

    All LaunchDarkly SDKs support creating custom events.

To learn more, read Events.
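As a concrete illustration of the custom-event flow described above, the following Python sketch models the data a custom metric event carries. The event keys, context key, and metadata here are hypothetical examples; in a real application you would pass these values to the LaunchDarkly SDK's track() method rather than constructing event payloads yourself.

```python
# Illustrative model of the data a custom metric event carries.
# The event keys, context key, and metadata below are hypothetical;
# a real application calls the LaunchDarkly SDK's track() method
# instead of constructing event payloads itself.
def build_custom_event(event_key, context_key, metric_value=None, data=None):
    event = {"key": event_key, "contextKey": context_key}
    if metric_value is not None:
        # Numeric metrics aggregate this value; conversion metrics ignore it.
        event["metricValue"] = metric_value
    if data is not None:
        # Optional metadata for locating and filtering events.
        event["data"] = data
    return event

# A conversion-style event (no numeric value):
signup = build_custom_event("signup-completed", "user-123")

# A numeric-style event carrying a cart total and metadata:
checkout = build_custom_event(
    "checkout-completed", "user-123",
    metric_value=87.77,
    data={"plan": "pro"},
)
```

The same event key must appear in both the metric configuration and the application's track() calls, or LaunchDarkly never receives the events the metric expects.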

Aggregation method

The metric aggregation method specifies how LaunchDarkly aggregates multiple events that are observed for a given context during the course of an attached experiment or guarded rollout. You can aggregate events per context using either:

  • the sum of collected event values
  • the average of collected values

Configuring aggregation method for numeric metric.
Funnel metric groups can only use metrics aggregated by average

Funnel metric groups can only include metrics that use the average aggregation method.

Aggregation for conversion metrics

With conversion events, such as page views or button clicks, LaunchDarkly assigns a value of 1 to indicate that an event was measured for a context during an experiment or rollout. Contexts that do not produce an event are assigned a value of 0. For example, consider a conversion event that tracks completed purchases in an online store. During an experiment, LaunchDarkly measures the completion event for two of the three contexts that receive a specific flag variation:

Context | Events measured    | Aggregate by sum (count metric) | Aggregate by average (occurrence/binary metric)
User A  | 3 purchase events  | 3                               | 1
User B  | no purchase events | 0                               | 0
User C  | 2 purchase events  | 2                               | 1

Aggregating conversion events by sum measures the count of events recorded for a given context during the course of an experiment or rollout. This type of metric is referred to as a count metric when creating a new metric or viewing labels in the metrics list. A count metric for the example events above uses sum aggregation to yield the values of 3, 0, and 2 for user purchases during the experiment.

Aggregating conversion events by average results in a metric that measures the occurrence of an event for a context during an experiment or rollout. This type of metric is referred to as an occurrence or binary metric when creating a new metric or viewing labels in the metrics list, because it results in either a 0 or 1 value to indicate whether the event occurred for a context. An occurrence metric for the example measurements above uses average aggregation to yield the values of 3 / 3 = 1, 0, and 2 / 2 = 1. With this metric, a value of 1 indicates that the user made at least one purchase during the experiment.
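The count and occurrence aggregations in the table above can be sketched in a few lines of Python (illustrative only; LaunchDarkly performs this aggregation for you):

```python
# Reproduce the conversion-metric aggregation from the example table.
# Each conversion event counts as 1 for the context that produced it.
events_per_context = {"User A": 3, "User B": 0, "User C": 2}

def aggregate_by_sum(event_count):
    # Count metric: total number of events for the context.
    return event_count

def aggregate_by_average(event_count):
    # Occurrence/binary metric: the average of n events, each valued 1,
    # is n / n = 1 when any event occurred, and 0 otherwise.
    return 1 if event_count > 0 else 0

count_values = {c: aggregate_by_sum(n) for c, n in events_per_context.items()}
binary_values = {c: aggregate_by_average(n) for c, n in events_per_context.items()}
# count_values  -> {"User A": 3, "User B": 0, "User C": 2}
# binary_values -> {"User A": 1, "User B": 0, "User C": 1}
```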

Aggregation for numeric metrics

Numeric metrics also use the sum or average aggregation method to aggregate multiple event values for a given context. However, they use the actual numeric values collected from the metric events, instead of assigning 1 and 0 values to track actions as with conversion metrics. For example, consider a custom numeric event that tracks completed purchases in an online store and provides the total cart amount value with each event. During an experiment, LaunchDarkly measures the event for two of the three contexts that receive a specific flag variation:

Context | Events measured    | Values              | Aggregate by sum (numeric metric) | Aggregate by average (numeric metric)
User A  | 3 purchase events  | 87.77, 29.20, 33.13 | 150.10                            | 50.03
User B  | no purchase events | n/a                 | n/a                               | n/a
User C  | 1 purchase event   | 73.55               | 73.55                             | 73.55

Aggregating numeric events by sum measures the total of all event values recorded for a context during the course of an experiment or rollout. Aggregating by average measures the arithmetic mean of collected values per context.

User B in the above example has no aggregated values, because no events (and therefore no values) were recorded during the course of the experiment. For numeric metrics, the metric analysis method determines how to treat contexts in an experiment or rollout that did not generate metric events.
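The sum and average aggregations for the numeric example above can be sketched as follows (illustrative only; the `None` result marks a context with no events, which the analysis method must then handle):

```python
# Reproduce the numeric-metric aggregation from the example table.
# Each event carries an actual value (here, a cart total).
values_per_context = {
    "User A": [87.77, 29.20, 33.13],
    "User B": [],            # no events recorded during the experiment
    "User C": [73.55],
}

def aggregate(values, method):
    """Aggregate one context's event values; None when no events exist."""
    if not values:
        return None  # the analysis method decides how to treat this context
    if method == "sum":
        return round(sum(values), 2)
    if method == "average":
        return round(sum(values) / len(values), 2)
    raise ValueError(f"unknown method: {method}")

sums = {c: aggregate(v, "sum") for c, v in values_per_context.items()}
avgs = {c: aggregate(v, "average") for c, v in values_per_context.items()}
# sums -> {"User A": 150.1, "User B": None, "User C": 73.55}
# avgs -> {"User A": 50.03, "User B": None, "User C": 73.55}
```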

Analysis method

The analysis method determines how LaunchDarkly aggregates the collected metric values for all contexts that receive a particular flag variation during an experiment or rollout. You can analyze metric values using either the average (arithmetic mean) of collected values, or by using percentiles. To learn more, read Average analysis and Percentile analysis.

The analysis method also specifies which context kind to use for measuring metric events during an experiment or rollout.

For events that produce a numerical value, the analysis method also determines how an experiment or rollout should handle contexts that did not generate metric events. Finally, the analysis method specifies whether higher or lower metric values indicate success for the flag variation during an experiment or rollout. To learn more, read Units without events and Success criteria.

Configuring analysis method for numeric metric.

Average analysis

The average analysis method totals all of the aggregated metric values for all contexts that received a particular flag variation, and divides by the number of participating contexts. “Average” is the default analysis method.

Occurrence or binary metrics always use the average analysis method, which computes the percentage of contexts that produced the conversion event.

For conversion count metrics, average analysis computes the average or arithmetic mean number of conversion events per context.

For value metrics, average analysis computes the average (arithmetic mean) of the aggregated metric values per participating context. You can choose to include contexts that did not generate a metric event during the experiment or rollout, but set their value to zero. To learn more, read Units without events.
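A minimal sketch of average analysis for one flag variation, using the hypothetical per-context aggregates from the earlier numeric example (`None` marks a context that sent no events):

```python
# Sketch of the average analysis method for one flag variation, using
# hypothetical per-context aggregated values (None = no events sent).
aggregated = {"User A": 150.10, "User B": None, "User C": 73.55}

def average_analysis(per_context_values, include_units_without_events=False):
    values = list(per_context_values)
    observed = [v for v in values if v is not None]
    if include_units_without_events:
        # Treat contexts that sent no events as zero.
        observed += [0.0] * values.count(None)
    return sum(observed) / len(observed)

excluding = average_analysis(aggregated.values())        # (150.10 + 73.55) / 2
including = average_analysis(aggregated.values(), True)  # (150.10 + 0 + 73.55) / 3
```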

Funnel metric groups require average analysis

Funnel metric groups can only include metrics that use the average analysis method.

Percentile analysis

The percentile analysis method computes the metric value at a selected percentile across all contexts that receive a flag variation. When you choose percentile analysis, you can select from the following options:

  • P50: the 50th percentile, also the median. Half of the context measurements fall below this value and half above it.
  • P75: the 75th percentile. 75% of the measurements fall below this value and 25% above it.
  • P90: the 90th percentile. 90% of the measurements fall below this value and 10% above it.
  • P95: the 95th percentile. 95% of the measurements fall below this value and 5% above it.
  • P99: the 99th percentile. 99% of the measurements fall below this value and 1% above it.

The closer the percentile is to 50, the closer it is to the median. Percentiles closer to the median, such as P75, are useful for analyzing general trends. For example, you might want to use the P50 or P75 methods for analyzing things like API latency.

Higher percentiles are better for detecting outliers. For example, if you have an endpoint that generally works well, but handles a single customer with large amounts of data poorly, you might want to use the P95 or P99 methods for the related metric.
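As an illustration, the following sketch computes nearest-rank percentiles over hypothetical per-context latency values. This is one common percentile definition; LaunchDarkly's exact interpolation method may differ. Note how P50 reflects the typical request while P95 surfaces the outlier:

```python
import math

# Nearest-rank percentile: the smallest value with at least p% of the
# measurements at or below it.
def percentile(values, p):
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# Hypothetical per-context latency values in milliseconds, one outlier.
latencies = [12, 15, 18, 21, 25, 30, 42, 55, 80, 400]

p50 = percentile(latencies, 50)  # 25: the typical request
p95 = percentile(latencies, 95)  # 400: catches the outlier
```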

Percentile analysis methods for Experimentation are in beta

The default metric analysis method is “average.” The use of percentile analysis methods with LaunchDarkly experiments is in beta. If you use a metric with a percentile analysis method in an experiment with a large audience, the experiment results tab may take longer to load, or the results tab may time out and display an error message. Percentile analysis methods are also not compatible with CUPED adjustments.

Context kind

When you create a metric, you must select a context kind to use for measuring events. The context kind represents the kind of entity, such as a user, device, or request, from which the metric measures events.

Metric context kinds must match experiment or rollout randomization units

The context kind that you choose for the metric analysis must match the randomization unit chosen for connected experiments or rollouts.

Some examples of common metrics and their randomization units include “sign-ups by user,” “clicks by user,” “purchase amount by user,” and “latency by request”:

Metric type                  | Event being measured                    | Context kind
Custom conversion binary     | Sign-ups                                | user
Clicked or tapped conversion | Clicks on a button                      | user
Custom numeric               | Purchase amount                         | user
Custom numeric               | Length of time for a server to respond  | request

Success criteria

For metrics that use a custom event, you must indicate whether higher or lower metric values indicate success for a connected experiment or rollout:

  • Higher is better: choose this option for metrics that measure positive things like cart checkouts or sign-ups.
  • Lower is better: choose this option for metrics that measure negative things like errors or latency.

LaunchDarkly automatically sets the success criterion to “higher is better” for clicked or tapped metrics and page viewed metrics.

Units without events

Some contexts that receive a flag variation during an experiment or rollout will not generate the selected metric event. For metrics that measure conversion events, LaunchDarkly can assign the value of zero to contexts that do not generate an event. The zero value in this case accurately indicates no conversion event, and can be used when analyzing conversion count or binary metrics.

With numeric metrics that use average analysis, assigning a zero value to contexts that produce no events is sometimes not desirable. For example, if a metric event records a user satisfaction score, including zero values for users that did not submit a review would negatively skew the average score.

For this reason, LaunchDarkly allows you to choose whether to include units for numeric metrics that analyze by average. You can choose to either:

  • Exclude units that did not send any events from the analysis: this option is best for latency metrics. If LaunchDarkly never receives an event for a context instance, you do not want to default to zero because LaunchDarkly would interpret this as an extremely fast latency time, which would skew or invalidate the results.
  • Include units that did not send any events and set their value to 0: this option is best for metrics where an incomplete process can be treated the same as zero, such as tracking cart totals for an online store. In this example, customers who put items in their cart but never completed the checkout process are treated as if they purchased $0.
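The difference between the two options can be sketched with the hypothetical satisfaction-score example from above, where zero-filling a context that never submitted a review skews the average downward:

```python
# Hypothetical satisfaction scores (1-5) for one flag variation.
# None marks a context that received the variation but never submitted
# a review, so it generated no metric events.
scores = {"User A": 4.0, "User B": None, "User C": 5.0}

reviewed = [v for v in scores.values() if v is not None]

# Exclude units without events: average over reviewers only.
exclude_avg = sum(reviewed) / len(reviewed)  # (4 + 5) / 2 = 4.5

# Include units without events as zero: skews the average downward.
include_avg = sum(reviewed) / len(scores)    # (4 + 0 + 5) / 3 = 3.0
```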