Build a production LLM data extraction pipeline with LaunchDarkly AI Configs and Vercel AI Gateway

Published January 9th, 2025

by Scarlett Attensil

Every conversation contains signals your ML models need. Customer calls reveal buying intent. Support tickets expose product friction. Interview transcripts capture technical depth. The problem? Those signals are buried in thousands of words of unstructured text.

This tutorial shows you how to build a data extraction pipeline that turns messy transcripts into structured JSON - using LaunchDarkly AI Configs to control everything (models, prompts, schemas) without redeploying.

What you’ll build: A pipeline that extracts 40-60+ structured fields from any transcript - sentiment scores, engagement metrics, binary signals, text statistics - all instantly configurable through LaunchDarkly’s UI.

The key insight: When you discover that customer_question_count predicts engagement better than talk time, or that Opus 4.5 handles technical jargon better than GPT-5.2, you can update your extraction logic immediately. No PR, no deploy, no waiting.

Ready to build? Clone the complete example repository to start extracting structured data from your transcripts in minutes.

The problem with unstructured text

Your organization has valuable signals buried in text: customer conversations, support tickets, interview transcripts, product reviews. Tools like Gong, Chorus, and other conversation intelligence platforms are excellent for their designed purpose, but you need something different: extracting specific features for your ML models, with a schema you control completely.

What you typically have:

"Yeah, so we've been looking at different solutions. The other vendor's
pricing was reasonable but their timeline was concerning. We need this
rolled out before Q3..."

What your models need:

{
  "alternatives_mentioned": true,
  "pricing_sentiment": 0.3,
  "timeline_mentioned": true,
  "urgency_score": 0.8,
  "decision_timeframe": "Q3"
}

How the pieces fit together

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”      β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”      β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Your Text  β”‚ ---> β”‚ Vercel AI Gateway β”‚ ---> β”‚     LLM     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜      β”‚   (unified API)   β”‚      β”‚ (GPT/Claude)β”‚
                     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜      β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                              ↑                        β”‚
                              β”‚                        ↓
                     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”      β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                     β”‚   LaunchDarkly    β”‚      β”‚ Structured  β”‚
                     β”‚     AI Config     β”‚      β”‚    JSON     β”‚
                     β”‚ (model, prompts,  β”‚      β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                     β”‚  6 tool schemas)  β”‚
                     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

The AI model automatically selects the most appropriate extraction schema (prospecting, discovery, demo, proposal, technical, or customer success) based on the transcript content.
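To see why this works, it helps to look at plain tool-calling: when a model is handed several named tools, it picks the one whose description best matches the input. Here's a minimal, self-contained sketch using the Vercel AI SDK directly - not the repository's code (the real pipeline pulls its schemas from LaunchDarkly, as shown later), and the two abbreviated schemas and the model slug are illustrative:

// Tool selection via plain tool-calling (AI SDK v4-style API).
// Illustrative only - the real pipeline loads its schemas from LaunchDarkly.
import { generateText, tool } from "ai";
import { createOpenAI } from "@ai-sdk/openai";
import { z } from "zod";

const gateway = createOpenAI({
  baseURL: "https://ai-gateway.vercel.sh/v1",
  apiKey: process.env.AI_GATEWAY_API_KEY,
});

const transcript = "Let me walk you through the dashboard you asked about...";

const result = await generateText({
  model: gateway.chat("openai/gpt-5.2"), // hypothetical model slug
  tools: {
    extract_prospecting_features: tool({
      description: "Extract features from first-contact and cold-outreach calls",
      parameters: z.object({
        gatekeeper_encountered: z.boolean(),
        interest_level: z.number().min(0).max(1),
      }),
    }),
    extract_demo_features: tool({
      description: "Extract features from product demonstration calls",
      parameters: z.object({
        trial_requested: z.boolean(),
        demo_effectiveness_score: z.number().min(0).max(1),
      }),
    }),
  },
  toolChoice: "required", // the model must pick exactly one tool
  prompt: `Extract features from this transcript:\n\n${transcript}`,
});

// For a demo transcript, expect extract_demo_features to be selected.
console.log(result.toolCalls[0]?.toolName, result.toolCalls[0]?.args);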

What stays in LaunchDarkly:

  • Model selection (GPT-5.2, Opus 4.5, Gemini 3)
  • System and user prompts
  • All 6 extraction tool schemas (40-60+ fields each)
  • Temperature and other parameters
  • Intelligent tool selection logic
  • Targeting rules for different use cases

What stays in your code:

  • Reading input files
  • Passing transcript text
  • Writing output CSV/JSON
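That code layer can be just a few lines end to end. A minimal sketch, assuming the LaunchDarklyAIClient wrapper shown later in this post and hypothetical file paths:

// The entire "your code" side: read a file, extract, write the result.
// Paths are hypothetical; LaunchDarklyAIClient is shown in the implementation section.
import { readFile, writeFile } from "node:fs/promises";
import { LaunchDarklyAIClient } from "./lib/launchdarkly-client";

const client = new LaunchDarklyAIClient();
await client.initialize();

const transcript = await readFile("transcripts/call-001.txt", "utf8");
const features = await client.extractStructuredFeatures({
  configKey: "transcript-extraction-unified",
  context: { kind: "user", key: "pipeline-batch-1" },
  transcript,
});

await writeFile("output/call-001.json", JSON.stringify(features, null, 2));

Everything else - which model runs, what prompt it sees, which fields come back - lives in LaunchDarkly.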

Why this approach

Change schemas in 2 minutes, not 2 sprints

This separation matters when you discover issues in production. If the AI model is selecting the wrong tool for certain transcript types, for example, you can adjust the prompt or model instantly in LaunchDarkly - a two-minute change in the UI instead of a hotfix deploy.

Key benefits:

  • Instant schema updates: Add fields to any of the 6 tools when you discover new predictive signals
  • A/B testing models: Test Google Gemini 3 vs Anthropic Claude Opus 4.5 to see which selects tools more accurately
  • Smart tool selection: AI model automatically chooses between prospecting, discovery, demo, proposal, technical, or customer success schemas
  • Privacy compliance: Different configurations for different regions using targeting rules - GDPR-compliant schemas for European customers
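The last bullet is worth making concrete. AI Configs are evaluated against a context, so region-aware routing comes down to what you put in that context; the matching rule itself lives in the LaunchDarkly UI. A sketch, with an illustrative country attribute (client and transcript as in the surrounding examples):

import type { LDContext } from "@launchdarkly/node-server-sdk";

// A targeting rule in the LaunchDarkly UI (e.g., country in [DE, FR, ...]
// serves the GDPR-safe variation) routes this context to a privacy-safe schema.
const context: LDContext = {
  kind: "user",
  key: "account-4821", // stable key for consistent bucketing
  country: "DE",       // the attribute your targeting rule matches on
};

const features = await client.extractStructuredFeatures({
  configKey: "transcript-extraction-unified",
  context,
  transcript,
});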

One gateway, all the models

Vercel brings specific advantages for data extraction workloads:

  • Server-sent events for real-time progress: Processing 1,000 transcripts? Stream progress updates to your UI as each completes using the Vercel AI SDK (sketched after this list)
  • Automatic scaling: From 1 transcript to 10,000 - Vercel functions scale without configuration
  • Built-in reliability: Automatic retries and failover when LLM providers have issues
  • OIDC tokens on deployment: No API keys in production - Vercel handles auth automatically
  • Unified LLM access: One endpoint for OpenAI GPT-5.2, Anthropic Claude Opus 4.5, and Google Gemini 3 models through Vercel AI Gateway
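To make the streaming bullet concrete: a Next.js route handler can emit one server-sent event per completed transcript. A minimal sketch - not the repository's app/api/extract-stream implementation - where extractOne() is a hypothetical stand-in for the real extraction call:

// app/api/extract-stream/route.ts - minimal SSE sketch (illustrative).
async function extractOne(transcript: string): Promise<unknown> {
  // Stand-in for extractStructuredFeatures() from the implementation section.
  return { character_count: transcript.length };
}

export async function POST(req: Request): Promise<Response> {
  const { transcripts } = (await req.json()) as { transcripts: string[] };
  const encoder = new TextEncoder();

  const stream = new ReadableStream({
    async start(controller) {
      for (let i = 0; i < transcripts.length; i++) {
        const features = await extractOne(transcripts[i]);
        // One SSE message per completed transcript: "data: {...}\n\n"
        controller.enqueue(
          encoder.encode(
            `data: ${JSON.stringify({ done: i + 1, total: transcripts.length, features })}\n\n`
          )
        );
      }
      controller.close();
    },
  });

  return new Response(stream, {
    headers: {
      "Content-Type": "text/event-stream",
      "Cache-Control": "no-cache",
    },
  });
}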

Real-time progress dashboard showing extraction status for batch transcript processing

Deploy once, then tune everything through LaunchDarkly while Vercel handles the infrastructure.

Complete setup

Prerequisites

  • Node.js 18+ and basic TypeScript knowledge
  • LaunchDarkly account with AI Configs enabled (quickstart guide)
  • Vercel account (free tier works) or API keys for local development
  • 100+ transcripts or documents to process (any format)

Quick start

Clone the complete example to skip setup and start extracting immediately:

$git clone https://github.com/launchdarkly-labs/scarlett-feature-extraction.git
>cd scarlett-feature-extraction

Install dependencies

$# Node.js dependencies for the extraction pipeline
>npm install @launchdarkly/node-server-sdk @launchdarkly/server-sdk-ai @launchdarkly/server-sdk-ai-vercel
>npm install ai @ai-sdk/openai

SDK Documentation: the LaunchDarkly Node.js AI SDK with Vercel provider docs provide a complete reference for all SDK features.

For ML model training (optional), set up a Python environment:

$python3 -m venv venv && source venv/bin/activate
>pip install catboost scikit-learn pandas numpy joblib requests python-dotenv

Note: For Python-based pipelines, see the LaunchDarkly Python AI SDK (doesn’t include Vercel provider).

Configure environment

$# 1. Configure environment variables
>cp .env.example .env
># Add your keys to .env:
># LAUNCHDARKLY_SDK_KEY=sdk-xxxxx
># LD_API_KEY=api-xxxxx (for bootstrap only)
># LD_PROJECT_KEY=your-project (for bootstrap only)
>
># 2. Get Vercel AI Gateway token (for local development)
>npx vercel env pull # Generates VERCEL_OIDC_TOKEN; the token expires every 12 hours, so re-run as needed
>
># 3. Bootstrap LaunchDarkly AI Config (one-time)
>python bootstrap/create_unified_config.py

Environment variables explained:

  • LAUNCHDARKLY_SDK_KEY: Runtime SDK key for feature flags
  • AI_GATEWAY_API_KEY: Required for Vercel deployments
  • VERCEL_OIDC_TOKEN: Auto-generated for local development via npx vercel env pull
  • LD_API_KEY & LD_PROJECT_KEY: Only for bootstrap script to create configs

Project structure

your-project/
β”œβ”€β”€ bootstrap/
β”‚   └── create_unified_config.py   # Creates LaunchDarkly config (run once)
β”œβ”€β”€ lib/
β”‚   β”œβ”€β”€ pipeline.ts                # Extraction pipeline
β”‚   └── launchdarkly-client.ts     # LaunchDarkly AI SDK integration
β”œβ”€β”€ app/
β”‚   β”œβ”€β”€ page.tsx                   # Upload UI
β”‚   └── api/
β”‚       └── extract-stream/        # API endpoint for extraction
β”œβ”€β”€ LAUNCHDARKLY_TOOLS.json        # Your field schemas (customize this!)
└── .env                           # Your API keys

LaunchDarkly configuration

Where the extraction schemas live

The field definitions and extraction schemas are stored in two key locations:

  1. LAUNCHDARKLY_TOOLS.json - Contains all field schemas:

    • 40 core fields shared across all tools (sentiment scores, engagement metrics, etc.)
    • Tool-specific fields for each document type
    • This file defines what data you’ll extract
  2. bootstrap/create_unified_config.py - Sets up LaunchDarkly:

    • Reads the schemas from LAUNCHDARKLY_TOOLS.json
    • Creates the AI Config in LaunchDarkly named transcript-extraction-unified
    • Attaches all 6 extraction tools to this single config
    • Run once: python bootstrap/create_unified_config.py

The 6 extraction tools

Each tool is defined in LAUNCHDARKLY_TOOLS.json with specific fields for different call types:

Prospecting tool (extract_prospecting_features - 43 fields)

  • Where it’s defined: LAUNCHDARKLY_TOOLS.json β†’ variation_a_prospecting
  • Use case: First contact, cold outreach, gatekeeper conversations
  • Example fields: gatekeeper_encountered, callback_scheduled, interest_level

Discovery tool (extract_discovery_features - 48 fields)

  • Where it’s defined: LAUNCHDARKLY_TOOLS.json β†’ variation_b_discovery
  • Use case: Qualification calls, needs assessment
  • Example fields: budget_confirmed, authority_level, qualification_score

Demo tool (extract_demo_features - 58 fields)

  • Where it’s defined: LAUNCHDARKLY_TOOLS.json β†’ variation_c_demo
  • Use case: Product demonstrations, feature walkthroughs
  • Example fields: customer_wow_moments, demo_effectiveness_score, trial_requested

Proposal tool (extract_proposal_features - 53 fields)

  • Where it’s defined: LAUNCHDARKLY_TOOLS.json β†’ variation_d_proposal
  • Use case: Pricing discussions, contract negotiations
  • Example fields: close_probability, discount_requested, blockers_to_close

Technical tool (extract_technical_features - 63 fields)

  • Where it’s defined: LAUNCHDARKLY_TOOLS.json β†’ variation_e_technical
  • Use case: Architecture reviews, technical deep-dives
  • Example fields: technical_fit_score, technical_risk_score, scalability_concerns

Customer Success tool (extract_customer_success_features - 53 fields)

  • Where it’s defined: LAUNCHDARKLY_TOOLS.json β†’ variation_f_customer_success
  • Use case: QBRs, renewal discussions, expansion opportunities
  • Example fields: account_health_score, renewal_likelihood, churn_risk

ML Preview: These extracted fields become features for predictive models - Part 2 will show you how to train models that predict deal outcomes, customer churn, and more using the data you extract here.

How to customize the schemas

You have two options for customizing your extraction schemas:

Option 1: Edit directly in LaunchDarkly UI

  1. Navigate to LaunchDarkly β†’ AI Configs β†’ transcript-extraction-unified
  2. Click on any tool (e.g., β€œExtract Prospecting Features”)
  3. Edit the JSON schema directly in the UI - add/remove fields instantly
  4. Save changes - they apply immediately, no deployment needed!

This is perfect for quick iterations - discover a new predictive signal? Add it in 30 seconds.

Option 2: Update via code

  1. Edit LAUNCHDARKLY_TOOLS.json:

// Add a new field to prospecting tool:
"variation_a_prospecting": {
  "function": {
    "parameters": {
      "properties": {
        // Add your custom field here:
        "competitor_switching_intent": {
          "type": "boolean",
          "description": "Intent to switch from competitor"
        }
      }
    }
  }
}

  2. Re-run the bootstrap script to update LaunchDarkly:

$python bootstrap/create_unified_config.py

  3. Your extraction automatically uses the new schema - no code changes needed!

Start with 100 transcripts to validate your tools. You’ll likely learn more from the first 100 extractions than weeks of planning. Watch which tools the AI model selects and refine the prompts if it’s choosing incorrectly.

Implementation & usage

Core implementation

Here’s the extraction implementation:

// Source: github.com/launchdarkly-labs/scarlett-feature-extraction
// lib/launchdarkly-client.ts:101-198 (simplified)
import * as ld from "@launchdarkly/node-server-sdk";
import { initAi } from "@launchdarkly/server-sdk-ai";
import { VercelProvider } from "@launchdarkly/server-sdk-ai-vercel";

export class LaunchDarklyAIClient {
  private ldClient!: ld.LDClient;
  private aiClient: any;

  async initialize(): Promise<void> {
    this.ldClient = ld.init(process.env.LAUNCHDARKLY_SDK_KEY!);
    await this.ldClient.waitForInitialization();
    this.aiClient = initAi(this.ldClient);
  }

  async extractStructuredFeatures(params: {
    configKey: string; // "transcript-extraction-unified"
    context: ld.LDContext;
    transcript: string;
  }): Promise<any> {
    // Get the AI Config with all 6 extraction tools
    const aiConfig = await this.aiClient.completionConfig(
      params.configKey,
      params.context,
      { enabled: false }
    );

    // Extract the schema from the first tool (all share core fields)
    const tools = aiConfig.model?.parameters?.tools || [];
    const jsonSchema = tools[0]?.parameters;

    // Create a Vercel AI Gateway client
    const { createOpenAI } = await import("@ai-sdk/openai");
    const vercelGateway = createOpenAI({
      baseURL: "https://ai-gateway.vercel.sh/v1",
      apiKey: process.env.VERCEL_OIDC_TOKEN || process.env.AI_GATEWAY_API_KEY,
    });

    // Use the model from the LaunchDarkly config
    const model = vercelGateway.chat(
      `${aiConfig.provider.name}/${aiConfig.model.name}`
    );

    // The AI model automatically selects the appropriate tool based on content
    const provider = new VercelProvider(model, aiConfig.model.parameters);
    const response = await provider.invokeStructuredModel(
      [
        { role: "system", content: aiConfig.config.messages[0].content },
        { role: "user", content: `Transcript:\n\n${params.transcript}` },
      ],
      jsonSchema
    );

    return response.data;
  }
}
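Calling it is a one-time initialization plus one call per transcript - a usage sketch:

// Usage sketch: initialize once, then call per transcript.
const client = new LaunchDarklyAIClient();
await client.initialize();

const features = await client.extractStructuredFeatures({
  configKey: "transcript-extraction-unified",
  context: { kind: "user", key: "demo-run" },
  transcript: "Yeah, so we've been looking at different solutions...",
});

console.log(features); // e.g. { alternatives_mentioned: true, urgency_score: 0.8, ... }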

How to run the extraction pipeline

Once your schemas are configured, running extraction is straightforward:

$# 1. Start the development server
>npm run dev
>
># 2. Open http://localhost:3000
># 3. Upload transcript files (.txt or .md)
># 4. Click "Extract Features"
># 5. Download the CSV with all extracted fields

The AI model automatically selects the right extraction tool based on the transcript content.

Beyond sales calls - what else you can extract

This architecture works for any unstructured data extraction need. The pipeline code never changes - just the LaunchDarkly configuration.
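In code, switching domains is a one-argument change. A sketch with a hypothetical support-ticket config - you'd bootstrap it with its own tool schemas first, the same way the transcript config was created:

// Same pipeline, different AI Config. The config key is hypothetical.
const ticketText = "Customer reports login failures since the last release...";

const ticketFeatures = await client.extractStructuredFeatures({
  configKey: "support-ticket-extraction", // hypothetical config key
  context: { kind: "user", key: "support-queue" },
  transcript: ticketText, // any unstructured text, not just call transcripts
});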

Support ticket analysis

  • Extract urgency scores, issue categories, product areas, customer effort scores
  • Route urgent tickets to detailed schemas, low-priority to streamlined ones
  • ML applications: Predict escalation likelihood, estimate resolution time, auto-assign to right team, identify product bugs from patterns

Customer review mining

  • Pull out feature mentions with sentiment, competitor comparisons, recommendation likelihood
  • Different product lines can use different LaunchDarkly configs with tailored extraction tools
  • ML applications: Predict NPS scores, identify feature requests, forecast churn from negative patterns, cluster customers by satisfaction drivers

Interview transcript processing

  • Extract technical competency signals, communication clarity, culture fit indicators
  • Different roles need different schemas - handle this through LaunchDarkly targeting rules
  • ML applications: Predict candidate success probability, identify skill gaps, score cultural alignment, reduce hiring bias through standardized signals

Medical consultation transcripts

  • Extract symptoms, treatment discussions, medication mentions, follow-up requirements
  • Ensure HIPAA compliance by redacting PII before extraction
  • ML applications: Predict readmission risk, identify medication adherence issues, flag potential diagnoses for review, optimize appointment scheduling

Legal document analysis

  • Extract contract terms, risk clauses, obligations, and deadlines
  • Route different document types (NDAs, MSAs, employment contracts) to specialized schemas
  • ML applications: Assess contract risk scores, identify non-standard terms, predict negotiation outcomes, flag compliance issues

Earnings call transcripts

  • Extract forward-looking statements, financial metrics, competitive positioning
  • Capture management sentiment and guidance changes
  • ML applications: Predict stock price movements, identify leadership confidence levels, detect financial health indicators, compare guidance to historical accuracy

Privacy and sensitive data

  • Add PII detection step before extraction - scan for emails, phone numbers, SSNs, names
  • Either redact ([REDACTED]) or skip extraction based on compliance requirements
  • ML applications: Route by geography using LaunchDarkly targeting - EU transcripts use privacy-safe schemas, other regions get full extraction
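As a starting point for that redaction step, here's a deliberately naive regex pass - a sketch only; the patterns are illustrative, and real compliance work calls for a dedicated PII detection service:

// Naive PII redaction sketch - run BEFORE text reaches the LLM.
// Patterns are illustrative, not exhaustive.
function redactPII(text: string): string {
  return text
    .replace(/[\w.+-]+@[\w-]+\.[\w.]+/g, "[REDACTED_EMAIL]")
    .replace(/\b\d{3}-\d{2}-\d{4}\b/g, "[REDACTED_SSN]")
    .replace(/\b(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\b/g, "[REDACTED_PHONE]");
}

const rawTranscript = "Reach me at jane@example.com or 555-123-4567.";
const safeTranscript = redactPII(rawTranscript);
// Then pass safeTranscript to extractStructuredFeatures as usual.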

When to use this approach

Use this when:

  • Processing 100-10,000 documents monthly
  • Schema needs frequent iteration
  • Different document types need different treatment
  • You’re bootstrapping training data

Skip this when:

  • Processing millions of documents (use traditional NLP)
  • Schema is fixed and proven
  • You need sub-second latency
  • Documents follow strict templates

What’s next

In Part 2, I’ll show how to use these extracted features for ML models. The challenge: sparse outcomes (most deals don’t close, most candidates aren’t hired). I’ll demonstrate a zero-inflated regression approach that actually works with real-world data.

Ready to build? Start with your messiest transcripts - that’s where you’ll learn what features really matter.

Further reading