Moonshots XXII: Hack to the Future recap

Published August 28th, 2025

Tilde Thurium, LaunchDarkly Developer Relations Manager

Why hackathons? Because where we’re going, we don’t need boring meetings! Hackathons spark creativity, forge connections across time zones and teams, and supercharge learning by doing. Builders are kinesthetic learners who learn by creating, so we foster a place for them to create. After all, the best way to invent the future is to build it.

In keeping with LaunchDarkly’s general space theming and penchant for puns, our hackathons are called Moonshots. This year we celebrated our 22nd official Moonshot — Moonshots XXII: Hack to the Future.

Gif of the Hack to the Future illustrated poster. It's flashing alternating green and purple. — In the future, all fliers are animated.

Great Scott! The projects (and puns) abounded. Over 50 demos were submitted, 14 projects named winners, and ~40 people awarded prizes. I can’t possibly cover them all in enough detail to do the event justice, so here’s a quick recap of five of my favorites, in no particular order:

evals evals evals evals evals

Problem: LaunchDarkly developers want to roll back poorly behaving models before their customers experience problems. AI models in production can fail in subtle or unexpected ways that are hard to catch before users are impacted. Unlike traditional software, generative AI is non-deterministic and environment-sensitive – a prompt or model change that seemed fine in staging can behave very differently in production.

Solution: Eval Configs. In practice, an Eval Config can be attached alongside a primary AI Config (the one generating content) as a sub-agent to automatically score each output on quality dimensions such as factual accuracy, contextual grounding (hallucination detection), and relevance.

With an Eval Config attached, any application can make evaluation-driven decisions live, such as applying a guardrail or otherwise acting on model behavior while a request is still in flight (e.g. before a customer sees the result). Did someone say live hallucination detection in production?
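To make the idea concrete, here is a minimal sketch of evaluation-driven gating in plain Python. It does not use the LaunchDarkly SDK or the team's actual implementation: the `EvalScores` shape, the threshold values, and the fallback message are all assumptions for illustration, and the scores are presumed to come from whatever evaluator the Eval Config runs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalScores:
    """Hypothetical score shape: one value in [0, 1] per quality dimension."""
    accuracy: float
    grounding: float
    relevance: float


# Hypothetical minimum acceptable score per dimension.
DEFAULT_THRESHOLDS = {"accuracy": 0.7, "grounding": 0.8, "relevance": 0.6}

# Hypothetical safe response shown when an output is blocked.
FALLBACK_MESSAGE = "Sorry, I can't answer that reliably right now."


def failing_dimensions(scores, thresholds=DEFAULT_THRESHOLDS):
    """Return the sorted names of dimensions that score below their threshold."""
    return sorted(
        name for name, minimum in thresholds.items()
        if getattr(scores, name) < minimum
    )


def gate_response(output, scores, thresholds=DEFAULT_THRESHOLDS):
    """Decide what the customer sees, before they see it.

    Returns (text_to_show, failed_dimensions). If any dimension fails,
    the model output is replaced by the fallback message.
    """
    failed = failing_dimensions(scores, thresholds)
    if failed:
        return FALLBACK_MESSAGE, failed
    return output, failed
```

The list of failed dimensions is returned alongside the text so a caller could also log it or use it as a signal for rolling back a model or prompt change.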

Dark Skies: The SDK Odyssey

Tired: corporate certifications. Wired: joyful, gamified learning. A cross-functional team of engineers, designers, and marketers converged on a wayward pixelated bus with a chiptune soundtrack. The goal of this project is to teach new users about LaunchDarkly SDKs by flying a VW camper van through space, dodging misconfigured SDKs. The prototype was built with Pygame, along with custom sprites and sound assets.

Gif of an 8-bit bus bouncing up and down. — It's giving Twilio Quest.

Roll your bus over to GitHub and see if you can beat our high score: https://github.com/launchdarkly-labs/dark-skies-sdk-odyssey

FeaturePilot

It’s important to consider feature management at all stages of the software development lifecycle. Long term maintenance, such as feature flag cleanup, is less exciting than new launches. Ignoring maintenance risks piling up technical debt that will ultimately slow you down.

FeaturePilot integrates AI agents with LaunchDarkly’s feature management platform. It enables one-click automation of code changes and routine tasks directly from the LaunchDarkly UI.

Your choice of agent is flexible: with Devin, Cursor, or something homegrown, LaunchDarkly users can trigger workflows such as cleaning up deprecated feature flags, scaffolding SDK integration code, and implementing flags in code with minimal effort.

This integration aims to accelerate development, reduce technical debt, and make feature releases even more seamless for all LaunchDarkly users.
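As a rough illustration of what a flag cleanup workflow has to start with, the sketch below finds every line that references a flag key as a quoted string. This is not FeaturePilot's code. The function name, the in-memory `sources` mapping, and the flag key in the usage note are hypothetical, and a real agent would go on to edit the code rather than just list locations.

```python
import re


def find_flag_references(sources, flag_key):
    """Return sorted (path, line_number) pairs where flag_key appears quoted.

    `sources` maps a file path to that file's text (hypothetical input shape).
    Only exact matches inside matching single or double quotes count, so a
    longer key that merely starts with flag_key is not reported.
    """
    pattern = re.compile(r"(['\"])" + re.escape(flag_key) + r"\1")
    hits = []
    for path in sorted(sources):
        for lineno, line in enumerate(sources[path].splitlines(), start=1):
            if pattern.search(line):
                hits.append((path, lineno))
    return hits
```

For example, searching for a made-up key such as `"old-checkout"` would report a line containing `variation("old-checkout", ...)` but skip one that only mentions `'old-checkout-v2'`.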

DeLorean Test Runner

Even though unit and integration tests pass in CI, subtle UI regressions and workflow hiccups frequently slip through after a merge. Manual spot-checking of every pull request doesn’t scale when dozens of PRs land each day, leading to broken user flows in staging or production.

This hacking team built a GitHub App–driven test runner to automatically run each PR’s real-world test scenarios in a headless browser and detect anomalies before merge. Here are the steps it follows:

Scenario Extraction
- Parse test steps from the PR description.
Headless Browser Execution
- Launch a Chromium instance via Puppeteer/Playwright on AWS EC2.
- Replay clicks, form-fills, and navigations exactly as a human tester would.
Anomaly Detection
- Capture console errors, JavaScript exceptions, and HTTP 4xx/5xx failures.
- Take visual snapshots at key steps and compare against baselines to spot occlusions or layout shifts.
Report Posting
- Compose a concise Markdown report of pass/fail statuses, error logs, and diff images.
- Post the report as a GitHub Check run or PR comment, giving instant feedback to the author.
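The first and last steps above can be sketched in a few lines of Python. This is an assumption-heavy illustration, not the team's code: the `## Test steps` heading convention, the `StepResult` type, and the report layout are all hypothetical, and the browser execution and snapshot comparison in between are left out.

```python
import re
from dataclasses import dataclass

# Matches "- step", "* step", "1. step", or "1) step" and captures the step text.
STEP_LINE = re.compile(r"^\s*(?:[-*]|\d+[.)])\s+(.*\S)\s*$")


def extract_steps(pr_description, heading="## Test steps"):
    """Collect list items under `heading`, stopping at the next heading."""
    steps = []
    in_section = False
    for line in pr_description.splitlines():
        stripped = line.strip()
        if stripped.lower() == heading.lower():
            in_section = True
            continue
        if in_section and stripped.startswith("#"):
            break  # a new heading ends the test-steps section
        if in_section:
            match = STEP_LINE.match(line)
            if match:
                steps.append(match.group(1))
    return steps


def is_http_failure(status):
    """True for the 4xx/5xx responses the runner treats as anomalies."""
    return 400 <= status <= 599


@dataclass
class StepResult:
    step: str
    passed: bool
    detail: str = ""


def compose_report(results):
    """Build a short Markdown report suitable for a PR comment."""
    passed_count = sum(1 for r in results if r.passed)
    status = "PASSED" if passed_count == len(results) else "FAILED"
    lines = [
        f"### Browser QA: {status} ({passed_count}/{len(results)} steps passed)",
        "",
    ]
    for r in results:
        label = "PASS" if r.passed else "FAIL"
        line = f"- **{label}** {r.step}"
        if r.detail:
            line += f": {r.detail}"
        lines.append(line)
    return "\n".join(lines)
```

Keeping extraction and reporting as pure functions means they can be unit tested without a browser, which is handy for a tool whose whole job is catching what unit tests miss.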

With the DeLorean Test Runner in place, every PR “time-travels” through a simulated browser QA check at 88 MPH — catching UX bugs long before they reach staging or production.

BrunchDarkly

If avocado toast and $6 lattes are the problem (behind the millennial housing crisis, obviously), then BrunchDarkly may be the solution! Find out for yourself at brunchdarkly.com. Try toggling the flags on the Admin modal to change the website’s UI in real time: filtering by dietary restrictions, modifying dynamic pricing, showing and hiding personalized recommendations, and more. It’s fun to expose flag flipping on the front end.

Screenshot of the admin panel for BrunchDarkly. There are various toggles which can be flipped, such as Allergen Warning and Limited Time Offer. — If only dietary restriction filters existed for all restaurant menus in real life.
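For a feel of how flags like these can reshape a menu, here is a small sketch in Python. It is not BrunchDarkly's code, which presumably runs in the browser. The flag keys `dietary-filter` and `price-multiplier`, the `MenuItem` type, and the menu data in the usage note are invented for illustration, and flag values are passed in as a plain dictionary rather than read from an SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MenuItem:
    name: str
    price: float
    tags: frozenset = frozenset()  # e.g. frozenset({"vegan"})


def visible_menu(items, flags):
    """Return (name, price) pairs for the menu under the given flag values.

    Hypothetical flags:
      "dietary-filter":   a tag such as "vegan", or "none" to show everything
      "price-multiplier": a number applied to every price (dynamic pricing)
    """
    diet = flags.get("dietary-filter", "none")
    multiplier = flags.get("price-multiplier", 1.0)
    menu = []
    for item in items:
        if diet != "none" and diet not in item.tags:
            continue  # hidden by the dietary filter
        menu.append((item.name, round(item.price * multiplier, 2)))
    return menu
```

With no flags set, every item appears at its base price. Setting the filter to `"vegan"` and the multiplier to `1.5` would leave only vegan-tagged items, each at one and a half times its base price.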

Conclusion: “It works! It works!”

If you still can’t get enough of the Moonshots vibes, listen to our collaborative playlist or sign up for our newsletter to get updates on what we’re shipping into production.