In this four-part blog series, we’ll cover how GenAI is transforming software delivery, the new challenges it introduces, and how LaunchDarkly can help teams build and deliver new GenAI features within a matter of hours, not weeks.
In a 2017 interview, Nvidia CEO Jensen Huang predicted that “software is eating the world, but AI is going to eat software.” From the vantage point of 2024, that prediction appears to be coming true.
For businesses, AI is eating software in more ways than one. First, AI pair programmers like GitHub Copilot and Cursor are becoming ubiquitous across engineering teams, helping them be more productive and accelerate the pace of delivery. Second, GenAI is changing the types of software that engineers are building. GenAI features can be differentiators no matter the industry or size of your business, so many product teams are being asked to “build with AI” in their core products. The results so far are mixed: a 2023 survey from Emergence Capital showed that while most releases last year were AI-related, only about half of them had the business impact that these teams had hoped for.
Many companies are proving GenAI value by solving internal use cases first (like document summarization and chatbots trained on internal knowledge bases) before building software for end users. But while the wave of excitement behind GenAI is massive, the reality of taking non-deterministic applications to production at enterprise scale is complicated and risky.
There are two main reasons for this. One is that GenAI fundamentally changes the traditional model of software development (we’ll get into that a bit more in the next blog in this series). The other is that AI is moving very fast. LLMs themselves, and the state of the art in techniques like retrieval-augmented generation (RAG), are improving at a rapid rate.
Newer, Better, Faster vs. Safety, Reliability
Just because a new model, technique, or prompt can produce output of a certain quality doesn’t mean it is easy to make it perform reliably and at scale. Several other conditions compound the difficulty:
- The pace of innovation of LLMs is so fast that software teams building with GenAI are often hard-pressed to determine what model configuration and prompt combination will achieve an optimal outcome before the next model comes along.
- While an older LLM that is optimized for a particular use case may be more desirable for software teams than a new untested model, there is constant pressure to improve given the march of progress month over month and year over year.
- The plethora of model variants, and the fact that new models are not always categorically better, creates additional challenges. New models can be generally more powerful but still contain weaknesses that teams need to learn about and account for.
- Models and their configurations are only one part of the equation. How do you find a way to prompt an LLM to reliably deliver an outcome? Should you instruct your chatbot to be ‘friendly and helpful’ or ‘reliable and helpful’ or ‘reliable and polite’ to get the best, most reproducible outcomes? These are the kinds of questions that AI engineering teams need to take into account.
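To make the prompt-wording question concrete, here is a minimal Python sketch of one way a team might compare prompt variants for reproducibility: run each variant several times and measure how often it gives its most common answer. Everything in it is an assumption for illustration. `stub_model` stands in for a real LLM call and is not any vendor's API, the noise levels inside it are arbitrary numbers rather than measurements, and `consistency` is just one simple metric among many a team could choose.

```python
import random
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical system prompts built from the wording options discussed above.
PROMPT_VARIANTS: Dict[str, str] = {
    "friendly_helpful": "You are a friendly and helpful assistant.",
    "reliable_helpful": "You are a reliable and helpful assistant.",
    "reliable_polite": "You are a reliable and polite assistant.",
}


def stub_model(system_prompt: str, question: str, rng: random.Random) -> str:
    """Stand-in for a real LLM call (an assumption, not a real API).

    The noise levels are arbitrary and exist only to make the output
    non-deterministic; they say nothing about how real models behave.
    """
    noise = 0.3 if "friendly" in system_prompt else 0.1
    return "off-script answer" if rng.random() < noise else "expected answer"


def consistency(outputs: List[str]) -> float:
    """Share of runs that produced the single most common output."""
    if not outputs:
        raise ValueError("need at least one output")
    top_count = Counter(outputs).most_common(1)[0][1]
    return top_count / len(outputs)


def compare_variants(
    variants: Dict[str, str],
    question: str,
    call_model: Callable[[str, str, random.Random], str],
    runs: int = 20,
    seed: int = 0,
) -> Dict[str, float]:
    """Call the model `runs` times per variant and score each variant."""
    scores: Dict[str, float] = {}
    for name, prompt in variants.items():
        rng = random.Random(seed)  # fixed seed so the comparison is repeatable
        outputs = [call_model(prompt, question, rng) for _ in range(runs)]
        scores[name] = consistency(outputs)
    return scores


if __name__ == "__main__":
    results = compare_variants(PROMPT_VARIANTS, "What is your refund policy?", stub_model)
    for name, score in results.items():
        print(f"{name}: {score:.2f}")
```

In practice, `call_model` would wrap a real model client, and exact-match consistency would likely give way to a task-specific quality check, since two differently worded answers can both be correct.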
Conclusion
Constantly optimizing prompt and model configurations is not easy, and engineering teams feel pressure to deliver state-of-the-art experiences to keep up with the competition. However, exposing non-deterministic software experiences to end users requires a heightened level of safety. Finding the strategies and tools to balance these needs is critical as GenAI features move from being considered “exotic” to “expected” for modern software experiences.
Up Next
In the next blog in this series, we’ll cover some of the ways in which GenAI development differs from the traditional SDLC, and the types of challenges and opportunities that it introduces. In the meantime, try out LaunchDarkly’s AI model flags and AI prompt flags to keep up with the pace of AI innovation: introduce new models, prompts, and configurations at runtime, and roll back instantly if there’s an issue.