Manually instrument LLM spans
Overview
This topic explains how to manually instrument LLM spans so LaunchDarkly can observe model activity when automatic instrumentation is not available.
Manual instrumentation applies when an application uses a custom LLM integration, an unsupported provider, or an internal abstraction that prevents the use of LaunchDarkly AI SDKs. By recording spans directly, teams can capture model latency, token usage, prompts, completions, and tool calls, and can analyze them in AI Config monitoring and trends views.
LaunchDarkly recommends automatic instrumentation through AI SDKs for most applications. Manual instrumentation supports advanced scenarios where observability is required without changing how models are integrated or invoked.
Set up LaunchDarkly observability plugin
Before you can manually record LLM spans, you must initialize your LaunchDarkly SDK with the observability plugin in your application. This enables your application to emit spans that LaunchDarkly can ingest and display in monitoring and trends views. All examples in the following sections assume this setup is already complete.
For setup instructions for other SDKs and environments, read the Observability SDKs documentation.
Complete the following steps to install and initialize LaunchDarkly observability for a Python application.
1. Install the required LaunchDarkly SDK and observability package.
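A minimal sketch of the install step. The package names shown are assumptions for the Python server-side SDK and its observability plugin; confirm the exact names against the LaunchDarkly documentation for your SDK version.

```shell
# Assumed package names for the Python server-side SDK and the
# observability plugin; verify against the LaunchDarkly docs.
pip install launchdarkly-server-sdk launchdarkly-observability
```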
2. Initialize the LaunchDarkly SDK with the observability plugin enabled. This configuration is typically done once at application startup and enables span ingestion for your service.
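The following sketch shows one-time startup configuration. The plugin and config class names (`ObservabilityPlugin`, `ObservabilityConfig`) and their options are assumptions; confirm them against the Observability SDKs documentation before use.

```python
# Startup configuration sketch. Class and option names below are
# assumptions -- confirm against the LaunchDarkly observability docs.
import ldclient
from ldclient.config import Config
from ldobserve import ObservabilityPlugin, ObservabilityConfig  # assumed names

ldclient.set_config(
    Config(
        "your-sdk-key",
        plugins=[
            ObservabilityPlugin(
                ObservabilityConfig(service_name="my-llm-service")
            )
        ],
    )
)
client = ldclient.get()
```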
Manually instrument LLM spans
Manual instrumentation involves creating a span around an LLM invocation and attaching structured attributes that describe the request and response. SDKs send these spans to LaunchDarkly observability and associate them with AI Config monitoring and trends views.
The examples below use Python. The attribute conventions shown are language agnostic, but the APIs and helpers used in these examples are specific to the Python SDK.
Record an LLM span
Use this pattern to record a single LLM request as a span with basic model metadata, token usage, and one prompt and completion.
Some attributes use the gen_ai namespace and others use the llm namespace, as supported by LaunchDarkly observability. Token usage attributes are recorded under both the gen_ai and llm namespaces.
Token usage values should be read from the LLM provider response. Because token counts depend on provider-specific tokenization and system-injected content, they are not known ahead of time and should not be calculated by the application. Instead, record the values returned by the provider SDK, such as response.usage.prompt_tokens and response.usage.completion_tokens.
Wrap each model call in a span created with observe.start_span. The span lifecycle should align closely with the actual model invocation.
The following example records a single LLM request using this minimal attribute set. Start the span before invoking the model and populate attributes after the response is received, so that the span duration reflects model latency.
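The pattern can be sketched with a helper that builds the attribute payload for one request. The helper name and the exact attribute keys are assumptions following the gen_ai conventions described above; the commented usage shows how it would wrap a model call with `observe.start_span`.

```python
def llm_span_attributes(model, prompt_text, completion_text,
                        prompt_tokens, completion_tokens):
    """Build the span attribute payload for a single LLM request.

    Key names are assumptions following the gen_ai / llm conventions;
    confirm the supported set against the LaunchDarkly docs.
    """
    return {
        "gen_ai.request.model": model,
        "gen_ai.prompt.0.role": "user",
        "gen_ai.prompt.0.content": prompt_text,
        "gen_ai.completion.0.role": "assistant",
        "gen_ai.completion.0.content": completion_text,
        # Token counts come from the provider response, never from
        # application-side estimation.
        "gen_ai.usage.input_tokens": prompt_tokens,
        "gen_ai.usage.output_tokens": completion_tokens,
        "llm.usage.total_tokens": prompt_tokens + completion_tokens,
    }

# With the observability plugin initialized, wrap the model call:
#
# with observe.start_span("llm.chat_completion") as span:
#     response = provider_client.chat(...)  # actual model invocation
#     attrs = llm_span_attributes(
#         "gpt-4o-mini",
#         prompt_text,
#         response.choices[0].message.content,
#         response.usage.prompt_tokens,
#         response.usage.completion_tokens,
#     )
#     for key, value in attrs.items():
#         span.set_attribute(key, value)
```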
Prompt and completion attributes
Use indexed prompt and completion attributes to record the order of messages sent to and returned from an LLM. Indexes start at 0 and increment for each message to preserve sequence.
Example: single prompt
Use this pattern when your application sends a single message to the model and expects a single response. This example records one user prompt.
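A minimal sketch of this pattern, assuming the indexed gen_ai key names described above:

```python
# One user message recorded at index 0; key names are assumptions
# following the gen_ai conventions.
prompt_attributes = {
    "gen_ai.prompt.0.role": "user",
    "gen_ai.prompt.0.content": "Summarize the attached incident report.",
}

# Inside an open span:
# for key, value in prompt_attributes.items():
#     span.set_attribute(key, value)
```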
Example: multi-turn conversation
Use this pattern for chat-based or multi-turn interactions where message history affects the model response. This example records a system prompt followed by multiple user and assistant messages in order.
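One way to preserve ordering is to flatten the message history into indexed keys with a small helper. The helper name is illustrative, and the key names are assumptions following the gen_ai conventions:

```python
def conversation_attributes(messages):
    """Flatten an ordered message list into gen_ai.prompt.N.* keys.

    Indexes start at 0 and increment per message, preserving sequence.
    """
    attrs = {}
    for i, message in enumerate(messages):
        attrs[f"gen_ai.prompt.{i}.role"] = message["role"]
        attrs[f"gen_ai.prompt.{i}.content"] = message["content"]
    return attrs

history = [
    {"role": "system", "content": "You are a support assistant."},
    {"role": "user", "content": "My deploy failed."},
    {"role": "assistant", "content": "Which environment?"},
    {"role": "user", "content": "Production."},
]
attrs = conversation_attributes(history)
```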
Example: completion message
Use this pattern to record a model response associated with a prompt or conversation. Completion messages use the same indexing pattern as prompts.
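A minimal sketch, again treating the exact key names as assumptions:

```python
# Completion messages use the same 0-based indexing as prompts.
completion_attributes = {
    "gen_ai.completion.0.role": "assistant",
    "gen_ai.completion.0.content": "The deploy failed due to a missing migration.",
}
```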
Record tool and function calls
Use this pattern when a model invokes a tool or function instead of returning a text response. Tool calls are recorded as part of the completion attributes so they can be reviewed alongside other model behavior.
Example: single tool call
Use this example when a model response includes one tool or function invocation. This pattern records the tool name and the arguments provided by the model.
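A sketch of a single tool call recorded on the completion. The `tool_calls` key structure is an assumption modeled on the indexed conventions used elsewhere in this topic; arguments are serialized as JSON so they remain a flat string attribute:

```python
import json

# Tool calls ride on the completion attributes; key names are assumptions.
completion_with_tool_call = {
    "gen_ai.completion.0.role": "assistant",
    "gen_ai.completion.0.tool_calls.0.name": "get_weather",
    "gen_ai.completion.0.tool_calls.0.arguments": json.dumps(
        {"city": "Oakland", "unit": "fahrenheit"}
    ),
}
```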
Example: multiple tool calls
Use this example when a single model response invokes more than one tool. Record each invocation by incrementing the tool call index to preserve order.
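A helper can increment the tool call index for you. The helper name is illustrative and the key structure is an assumption following the same conventions as the single tool call example:

```python
import json

def multi_tool_call_attributes(tool_calls, completion_index=0):
    """Record each tool invocation under an incrementing tool call index."""
    attrs = {f"gen_ai.completion.{completion_index}.role": "assistant"}
    for i, call in enumerate(tool_calls):
        prefix = f"gen_ai.completion.{completion_index}.tool_calls.{i}"
        attrs[f"{prefix}.name"] = call["name"]
        attrs[f"{prefix}.arguments"] = json.dumps(call["arguments"])
    return attrs

attrs = multi_tool_call_attributes([
    {"name": "get_weather", "arguments": {"city": "Oakland"}},
    {"name": "get_forecast", "arguments": {"city": "Oakland", "days": 3}},
])
```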
Record available functions
Use this pattern when your application provides tools or functions to the model as part of the request. Recording available functions captures which actions the model was allowed to choose from at inference time.
Example: record available functions for a request
Use this example when you want to include function metadata as part of an LLM span. This pattern records the names, descriptions, and argument schemas for each function made available to the model.
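A sketch of this pattern, indexing each function offered to the model. The `gen_ai.request.functions.N.*` key structure is an assumption modeled on the indexed conventions in this topic; argument schemas are serialized as JSON:

```python
import json

def available_function_attributes(functions):
    """Index the functions offered to the model; key names are assumptions."""
    attrs = {}
    for i, fn in enumerate(functions):
        prefix = f"gen_ai.request.functions.{i}"
        attrs[f"{prefix}.name"] = fn["name"]
        attrs[f"{prefix}.description"] = fn["description"]
        attrs[f"{prefix}.parameters"] = json.dumps(fn["parameters"])
    return attrs

attrs = available_function_attributes([
    {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
        },
    }
])
```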
Recording available functions provides additional context when reviewing model behavior.
Example: complete LLM interaction
The following example shows how to record prompts, completions, token usage, and optional tool calls together using a single helper function.
Use this example as a reference implementation for manually recording LLM spans in your application.
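A sketch of such a helper, assuming the span exposes `set_attribute(key, value)` as spans started with `observe.start_span` are expected to. The helper name and attribute keys are assumptions consistent with the conventions used throughout this topic:

```python
import json

def record_llm_attributes(span, *, model, prompts, completions,
                          prompt_tokens, completion_tokens, tool_calls=None):
    """Attach prompts, completions, token usage, and optional tool calls
    to an open span. Attribute key names are assumptions following the
    gen_ai / llm conventions; token counts come from the provider response.
    """
    span.set_attribute("gen_ai.request.model", model)
    for i, msg in enumerate(prompts):
        span.set_attribute(f"gen_ai.prompt.{i}.role", msg["role"])
        span.set_attribute(f"gen_ai.prompt.{i}.content", msg["content"])
    for i, msg in enumerate(completions):
        span.set_attribute(f"gen_ai.completion.{i}.role", msg["role"])
        span.set_attribute(f"gen_ai.completion.{i}.content", msg["content"])
    for i, call in enumerate(tool_calls or []):
        prefix = f"gen_ai.completion.0.tool_calls.{i}"
        span.set_attribute(f"{prefix}.name", call["name"])
        span.set_attribute(f"{prefix}.arguments", json.dumps(call["arguments"]))
    span.set_attribute("gen_ai.usage.input_tokens", prompt_tokens)
    span.set_attribute("gen_ai.usage.output_tokens", completion_tokens)
    span.set_attribute("llm.usage.total_tokens",
                       prompt_tokens + completion_tokens)

# Usage sketch, with the observability plugin initialized:
#
# with observe.start_span("llm.chat_completion") as span:
#     response = provider_client.chat(model="gpt-4o-mini", messages=messages)
#     record_llm_attributes(
#         span,
#         model="gpt-4o-mini",
#         prompts=messages,
#         completions=[{"role": "assistant",
#                       "content": response.choices[0].message.content}],
#         prompt_tokens=response.usage.prompt_tokens,
#         completion_tokens=response.usage.completion_tokens,
#     )
```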
How this data appears in AI Configs
Manually instrumented LLM spans appear in the same monitoring and trends workflows as automatically collected AI Config telemetry.
Recorded spans surface in the AI Config Monitoring view, where teams can review latency, token usage, request volume, and request details. Prompt, completion, and tool call attributes are grouped under each LLM request to support debugging and investigation.
Span data also contributes to the AI Config trends explorer. This allows teams to compare model behavior across variations, environments, and time ranges using a single observability schema.
No additional configuration is required in AI Configs. As long as spans use the supported attribute conventions, manually instrumented data integrates with existing monitoring, trends, and evaluation workflows.