Uber Uses Feature Stores for LLMs

Uber's ML platform team has repurposed its feature store for large language model (LLM) applications. The system supports prompt engineering and custom retrieval augmentation, which helps keep A/B tests repeatable and model evaluation consistent across teams.
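As a rough illustration of why feature-backed prompts help with repeatability, the sketch below renders a versioned prompt template from a fixed feature snapshot and fingerprints the result. Every name here (`PromptTemplate`, `render_prompt`, `prompt_fingerprint`, the feature keys) is hypothetical and chosen for illustration; none of it reflects Uber's actual toolkit or APIs.

```python
import hashlib
import json
from dataclasses import dataclass
from string import Template


@dataclass(frozen=True)
class PromptTemplate:
    """A versioned prompt template (hypothetical structure, not Uber's)."""
    name: str
    version: int
    text: str


def render_prompt(template: PromptTemplate, features: dict) -> str:
    """Fill the template with feature values; raises KeyError if one is missing."""
    return Template(template.text).substitute(features)


def prompt_fingerprint(template: PromptTemplate, features: dict) -> str:
    """Hash the template identity plus the exact feature values used."""
    payload = json.dumps(
        {"name": template.name, "version": template.version, "features": features},
        sort_keys=True,  # key order must not change the fingerprint
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Stand-in for a feature store lookup (assumed values, for illustration only).
FEATURE_SNAPSHOT = {"rider_tier": "gold", "trips_last_30d": 12}

template = PromptTemplate(
    name="support_reply",
    version=3,
    text="The rider is in the $rider_tier tier with $trips_last_30d trips "
         "in the last 30 days. Draft a reply.",
)

prompt = render_prompt(template, FEATURE_SNAPSHOT)
fingerprint = prompt_fingerprint(template, FEATURE_SNAPSHOT)
```

Logging the fingerprint next to each model response would let two experiment arms confirm they were given the same template version and the same feature values, which is one plausible way to make an A/B comparison reproducible.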

- Uber's application of feature stores for LLMs extends to their internal "Genie" platform, which uses an Enhanced Agentic Retrieval-Augmented Generation (EAg-RAG) system to improve the accuracy of its internal Q&A chatbots, resulting in a 27% increase in acceptable answers. Another internal tool, "QueryGPT," leverages RAG to convert natural language questions into SQL queries, saving the company a reported 140,000 hours in query writing time annually.
- To manage the lifecycle of LLM inputs, Uber developed a centralized "Prompt Engineering Toolkit". This system includes a model catalog, a playground for experimentation, version control for prompt templates, and functionality to enrich prompts with data from their feature store and RAG systems.
- Netflix employs a similar concept with its "Embedding Store," a specialized feature store for managing the vector representations of members and content. This infrastructure is critical for their personalization foundation model, which refreshes embeddings daily and supports near-real-time updates based on user interactions during a session.
- Pinterest's recommendation engine is built on a two-tower model that generates user and item (Pin) embeddings in real-time. These embeddings are stored and served from an in-house Approximate Nearest Neighbor (ANN) system called Manas, which allows for efficient retrieval of relevant content from billions of items for over 500 million users.
- Spotify's recommendation system uses a funnel architecture, starting with a candidate generation model to narrow down millions of tracks before applying a more complex ranking algorithm. Their models are trained on a sample of approximately 700 million user-generated playlists to understand the relationships between songs.
- Meta is also leveraging LLM-inspired architectures for recommendations with its Generative Ads Recommendation Model (GEM). This foundation model, trained on thousands of GPUs, has led to a 5% increase in ad conversions on Instagram and a 3% increase on Facebook Feed.
- The concept of a feature store is a key component of MLOps, as it ensures consistency between the data used for training models and the data used for serving predictions in production, which is critical for reliable A/B testing and preventing performance degradation. Companies like Netflix and Pinterest have extensive MLOps practices for managing hundreds of models, including automated A/B testing frameworks and monitoring for feature drift.
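The training/serving consistency point in the last item can be sketched in a few lines: one feature definition is shared by the offline path that builds training rows and the online path that answers a live request, so the two cannot drift apart. This is a minimal illustration under assumed names (`trip_features`, `build_training_rows`, `serve_features`) and made-up fields, not a description of any company's feature store.

```python
from datetime import datetime, timezone


def trip_features(raw: dict) -> dict:
    """Single feature definition shared by training and serving (hypothetical)."""
    return {
        # Guard against zero-distance records when computing the ratio.
        "fare_per_km": round(raw["fare"] / max(raw["distance_km"], 0.1), 2),
        "is_peak_hour": raw["requested_at"].hour in (7, 8, 9, 17, 18, 19),
    }


def build_training_rows(history: list[dict]) -> list[dict]:
    """Offline path: compute features for every logged event."""
    return [trip_features(event) for event in history]


def serve_features(request: dict) -> dict:
    """Online path: compute features for one live request with the same code."""
    return trip_features(request)


event = {
    "fare": 24.0,
    "distance_km": 8.0,
    "requested_at": datetime(2024, 5, 1, 8, 30, tzinfo=timezone.utc),
}

offline_row = build_training_rows([event])[0]
online_row = serve_features(event)
```

Because both paths call the same function, the same raw event yields identical feature values offline and online. A production feature store adds storage, backfills, and point-in-time lookups on top of this idea.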
