Manually instrument LLM spans
Overview
This topic explains how to manually instrument LLM spans so LaunchDarkly can observe model activity when automatic instrumentation is not available.
Manual instrumentation applies when an application uses a custom LLM integration, an unsupported provider, or an internal abstraction that prevents the use of LaunchDarkly AI SDKs. By recording spans directly, teams can capture model latency, token usage, prompts, completions, and tool calls, and can analyze them in AI Config monitoring and trends views.
LaunchDarkly recommends automatic instrumentation through AI SDKs for most applications. Manual instrumentation supports advanced scenarios where observability is required without changing how models are integrated or invoked.
Set up LaunchDarkly observability plugin
Before you can manually record LLM spans, you must initialize your LaunchDarkly SDK with the observability plugin in your application. This enables your application to emit spans that LaunchDarkly can ingest and display in monitoring and trends views. All examples in the following sections assume this setup is already complete.
For setup instructions for other SDKs and environments, read the Observability SDKs documentation.
Complete the following steps to install and initialize LaunchDarkly observability for a Python application.
1. Install the required LaunchDarkly SDK and observability package.
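A minimal sketch of the install step. The package names shown are assumptions for the Python server-side SDK and its observability plugin; confirm the exact names against the LaunchDarkly documentation for your SDK version.

```shell
# Assumed package names for the Python server-side SDK and the
# observability plugin; verify against the LaunchDarkly docs.
pip install launchdarkly-server-sdk launchdarkly-observability
```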
2. Initialize the LaunchDarkly SDK with the observability plugin enabled. This configuration is typically done once at application startup and enables span ingestion for your service.
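The following sketch shows one-time startup configuration. The plugin and config class names (`ObservabilityPlugin`, `ObservabilityConfig`) and their options are assumptions; confirm them against the Observability SDKs documentation before use.

```python
# Startup configuration sketch. Class and option names below are
# assumptions -- confirm against the LaunchDarkly observability docs.
import ldclient
from ldclient.config import Config
from ldobserve import ObservabilityPlugin, ObservabilityConfig  # assumed names

ldclient.set_config(
    Config(
        "your-sdk-key",
        plugins=[
            ObservabilityPlugin(
                ObservabilityConfig(service_name="my-llm-service")
            )
        ],
    )
)
client = ldclient.get()
```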
Manually instrument LLM spans
Manual instrumentation involves creating a span around an LLM invocation and attaching structured attributes that describe the request and response. SDKs send these spans to LaunchDarkly observability and associate them with AI Config monitoring and trends views.
The examples below use Python. The attribute conventions shown are language agnostic, but the APIs and helpers used in these examples are specific to the Python SDK.
Record an LLM span
Use this pattern to record a single LLM request as a span with basic model metadata, token usage, and one prompt and completion.
Some attributes use the gen_ai namespace and others use the llm namespace, as supported by LaunchDarkly observability. Token usage attributes are recorded under both the gen_ai and llm namespaces.
Token usage values should be read from the LLM provider response. Because token counts depend on provider-specific tokenization and system-injected content, they are not known ahead of time and should not be calculated by the application. Instead, record the values returned by the provider SDK, such as response.usage.prompt_tokens and response.usage.completion_tokens.
Wrap each model call in a span created with observe.start_span. The span lifecycle should align closely with the actual model invocation.
The following example records a single LLM request using this minimal attribute set. Start the span before invoking the model and populate attributes after the response is received, so that the span duration reflects model latency.
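The pattern can be sketched with a helper that builds the attribute payload for one request. The helper name and the exact attribute keys are assumptions following the gen_ai conventions described above; the commented usage shows how it would wrap a model call with `observe.start_span`.

```python
def llm_span_attributes(model, prompt_text, completion_text,
                        prompt_tokens, completion_tokens):
    """Build the span attribute payload for a single LLM request.

    Key names are assumptions following the gen_ai / llm conventions;
    confirm the supported set against the LaunchDarkly docs.
    """
    return {
        "gen_ai.request.model": model,
        "gen_ai.prompt.0.role": "user",
        "gen_ai.prompt.0.content": prompt_text,
        "gen_ai.completion.0.role": "assistant",
        "gen_ai.completion.0.content": completion_text,
        # Token counts come from the provider response, never from
        # application-side estimation.
        "gen_ai.usage.input_tokens": prompt_tokens,
        "gen_ai.usage.output_tokens": completion_tokens,
        "llm.usage.total_tokens": prompt_tokens + completion_tokens,
    }

# With the observability plugin initialized, wrap the model call:
#
# with observe.start_span("llm.chat_completion") as span:
#     response = provider_client.chat(...)  # actual model invocation
#     attrs = llm_span_attributes(
#         "gpt-4o-mini",
#         prompt_text,
#         response.choices[0].message.content,
#         response.usage.prompt_tokens,
#         response.usage.completion_tokens,
#     )
#     for key, value in attrs.items():
#         span.set_attribute(key, value)
```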
Prompt and completion attributes
Use indexed prompt and completion attributes to record the order of messages sent to and returned from an LLM. Indexes start at 0 and increment for each message to preserve sequence.
Example: single prompt
Use this pattern when your application sends a single message to the model and expects a single response. This example records one user prompt.
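A minimal sketch of this pattern, assuming the indexed gen_ai key names described above:

```python
# One user message recorded at index 0; key names are assumptions
# following the gen_ai conventions.
prompt_attributes = {
    "gen_ai.prompt.0.role": "user",
    "gen_ai.prompt.0.content": "Summarize the attached incident report.",
}

# Inside an open span:
# for key, value in prompt_attributes.items():
#     span.set_attribute(key, value)
```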
Example: multi-turn conversation
Use this pattern for chat-based or multi-turn interactions where message history affects the model response. This example records a system prompt followed by multiple user and assistant messages in order.
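One way to preserve ordering is to flatten the message history into indexed keys with a small helper. The helper name is illustrative, and the key names are assumptions following the gen_ai conventions:

```python
def conversation_attributes(messages):
    """Flatten an ordered message list into gen_ai.prompt.N.* keys.

    Indexes start at 0 and increment per message, preserving sequence.
    """
    attrs = {}
    for i, message in enumerate(messages):
        attrs[f"gen_ai.prompt.{i}.role"] = message["role"]
        attrs[f"gen_ai.prompt.{i}.content"] = message["content"]
    return attrs

history = [
    {"role": "system", "content": "You are a support assistant."},
    {"role": "user", "content": "My deploy failed."},
    {"role": "assistant", "content": "Which environment?"},
    {"role": "user", "content": "Production."},
]
attrs = conversation_attributes(history)
```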
Example: completion message
Use this pattern to record a model response associated with a prompt or conversation. Completion messages use the same indexing pattern as prompts.
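A minimal sketch, again treating the exact key names as assumptions:

```python
# Completion messages use the same 0-based indexing as prompts.
completion_attributes = {
    "gen_ai.completion.0.role": "assistant",
    "gen_ai.completion.0.content": "The deploy failed due to a missing migration.",
}
```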
Record tool and function calls
Use this pattern when a model invokes a tool or function instead of returning a text response. Tool calls are recorded as part of the completion attributes so they can be reviewed alongside other model behavior.
Example: single tool call
Use this example when a model response includes one tool or function invocation. This pattern records the tool name and the arguments provided by the model.
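A sketch of a single tool call recorded on the completion. The `tool_calls` key structure is an assumption modeled on the indexed conventions used elsewhere in this topic; arguments are serialized as JSON so they remain a flat string attribute:

```python
import json

# Tool calls ride on the completion attributes; key names are assumptions.
completion_with_tool_call = {
    "gen_ai.completion.0.role": "assistant",
    "gen_ai.completion.0.tool_calls.0.name": "get_weather",
    "gen_ai.completion.0.tool_calls.0.arguments": json.dumps(
        {"city": "Oakland", "unit": "fahrenheit"}
    ),
}
```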
Example: multiple tool calls
Use this example when a single model response invokes more than one tool. Record each invocation by incrementing the tool call index to preserve order.
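A helper can increment the tool call index for you. The helper name is illustrative and the key structure is an assumption following the same conventions as the single tool call example:

```python
import json

def multi_tool_call_attributes(tool_calls, completion_index=0):
    """Record each tool invocation under an incrementing tool call index."""
    attrs = {f"gen_ai.completion.{completion_index}.role": "assistant"}
    for i, call in enumerate(tool_calls):
        prefix = f"gen_ai.completion.{completion_index}.tool_calls.{i}"
        attrs[f"{prefix}.name"] = call["name"]
        attrs[f"{prefix}.arguments"] = json.dumps(call["arguments"])
    return attrs

attrs = multi_tool_call_attributes([
    {"name": "get_weather", "arguments": {"city": "Oakland"}},
    {"name": "get_forecast", "arguments": {"city": "Oakland", "days": 3}},
])
```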
Record available functions
Use this pattern when your application provides tools or functions to the model as part of the request. Recording available functions captures which actions the model was allowed to choose from at inference time.
Example: record available functions for a request
Use this example when you want to include function metadata as part of an LLM span. This pattern records the names, descriptions, and argument schemas for each function made available to the model.
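A sketch of this pattern, indexing each function offered to the model. The `gen_ai.request.functions.N.*` key structure is an assumption modeled on the indexed conventions in this topic; argument schemas are serialized as JSON:

```python
import json

def available_function_attributes(functions):
    """Index the functions offered to the model; key names are assumptions."""
    attrs = {}
    for i, fn in enumerate(functions):
        prefix = f"gen_ai.request.functions.{i}"
        attrs[f"{prefix}.name"] = fn["name"]
        attrs[f"{prefix}.description"] = fn["description"]
        attrs[f"{prefix}.parameters"] = json.dumps(fn["parameters"])
    return attrs

attrs = available_function_attributes([
    {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
        },
    }
])
```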
Recording available functions provides additional context when reviewing model behavior.
Example: complete LLM interaction
The following example shows how to record prompts, completions, token usage, and optional tool calls together using a single helper function.
Use this example as a reference implementation for manually recording LLM spans in your application.
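A sketch of such a helper, assuming the span exposes `set_attribute(key, value)` as spans started with `observe.start_span` are expected to. The helper name and attribute keys are assumptions consistent with the conventions used throughout this topic:

```python
import json

def record_llm_attributes(span, *, model, prompts, completions,
                          prompt_tokens, completion_tokens, tool_calls=None):
    """Attach prompts, completions, token usage, and optional tool calls
    to an open span. Attribute key names are assumptions following the
    gen_ai / llm conventions; token counts come from the provider response.
    """
    span.set_attribute("gen_ai.request.model", model)
    for i, msg in enumerate(prompts):
        span.set_attribute(f"gen_ai.prompt.{i}.role", msg["role"])
        span.set_attribute(f"gen_ai.prompt.{i}.content", msg["content"])
    for i, msg in enumerate(completions):
        span.set_attribute(f"gen_ai.completion.{i}.role", msg["role"])
        span.set_attribute(f"gen_ai.completion.{i}.content", msg["content"])
    for i, call in enumerate(tool_calls or []):
        prefix = f"gen_ai.completion.0.tool_calls.{i}"
        span.set_attribute(f"{prefix}.name", call["name"])
        span.set_attribute(f"{prefix}.arguments", json.dumps(call["arguments"]))
    span.set_attribute("gen_ai.usage.input_tokens", prompt_tokens)
    span.set_attribute("gen_ai.usage.output_tokens", completion_tokens)
    span.set_attribute("llm.usage.total_tokens",
                       prompt_tokens + completion_tokens)

# Usage sketch, with the observability plugin initialized:
#
# with observe.start_span("llm.chat_completion") as span:
#     response = provider_client.chat(model="gpt-4o-mini", messages=messages)
#     record_llm_attributes(
#         span,
#         model="gpt-4o-mini",
#         prompts=messages,
#         completions=[{"role": "assistant",
#                       "content": response.choices[0].message.content}],
#         prompt_tokens=response.usage.prompt_tokens,
#         completion_tokens=response.usage.completion_tokens,
#     )
```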
How this data appears in AI Configs
Manually instrumented LLM spans appear in the same monitoring and trends workflows as automatically collected AI Config telemetry.
Recorded spans surface in the AI Config Monitoring view, where teams can review latency, token usage, request volume, and request details. Prompt, completion, and tool call attributes are grouped under each LLM request to support debugging and investigation.
Span data also contributes to the AI Config trends explorer. This allows teams to compare model behavior across variations, environments, and time ranges using a single observability schema.
No additional configuration is required in AI Configs. As long as spans use the supported attribute conventions, manually instrumented data integrates with existing monitoring, trends, and evaluation workflows.