Uber Uses Feature Stores for LLMs
Uber's ML platform team has repurposed its feature store for large language model applications. The system now supports prompt engineering and custom retrieval augmentation, which helps keep A/B tests repeatable and model evaluation consistent across teams.
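To make the idea of repeatable, feature-enriched prompts concrete, here is a minimal sketch in Python. It is not Uber's implementation: the in-memory `FEATURE_STORE`, the `PromptTemplate` class, and `build_prompt` are hypothetical names chosen for illustration. The sketch assumes that a prompt is reproducible when the template version and the looked-up feature values are both pinned, and it records a fingerprint so that two experiment arms can confirm they saw the same input.

```python
import hashlib
from dataclasses import dataclass
from string import Template

# Hypothetical in-memory stand-in for an online feature store lookup.
FEATURE_STORE = {
    "rider:42": {"home_city": "Chicago", "trips_last_30d": 12},
}


@dataclass(frozen=True)
class PromptTemplate:
    """A versioned prompt template (illustrative, not a real toolkit API)."""
    name: str
    version: int
    text: str

    def render(self, values: dict) -> str:
        # Template.substitute raises KeyError if a placeholder has no value,
        # so a missing feature fails loudly instead of silently changing the prompt.
        return Template(self.text).substitute(values)


def build_prompt(template: PromptTemplate, entity_key: str, question: str):
    """Enrich a template with stored features and return (prompt, fingerprint)."""
    features = FEATURE_STORE[entity_key]
    prompt = template.render({**features, "question": question})
    # The fingerprint covers the template identity and the rendered text, so
    # identical inputs always hash to the same value.
    raw = f"{template.name}:{template.version}:{prompt}".encode()
    return prompt, hashlib.sha256(raw).hexdigest()[:12]


template = PromptTemplate(
    name="support",
    version=1,
    text="Rider in $home_city with $trips_last_30d trips asks: $question",
)
prompt, fingerprint = build_prompt(template, "rider:42", "Where is my refund?")
```

Logging the fingerprint alongside each model response is one simple way to check, after the fact, that a control and a treatment arm were evaluated on identical prompts.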
- Uber's application of feature stores for LLMs extends to their internal "Genie" platform, which uses an Enhanced Agentic Retrieval-Augmented Generation (EAg-RAG) system to improve the accuracy of its internal Q&A chatbots, resulting in a 27% increase in acceptable answers. Another internal tool, "QueryGPT," leverages RAG to convert natural language questions into SQL queries, saving the company a reported 140,000 hours in query writing time annually.
- To manage the lifecycle of LLM inputs, Uber developed a centralized "Prompt Engineering Toolkit". This system includes a model catalog, a playground for experimentation, version control for prompt templates, and functionality to enrich prompts with data from their feature store and RAG systems.
- Netflix employs a similar concept with its "Embedding Store," a specialized feature store for managing the vector representations of members and content. This infrastructure is critical for their personalization foundation model, which refreshes embeddings daily and supports near-real-time updates based on user interactions during a session.
- Pinterest's recommendation engine is built on a two-tower model that generates user and item (Pin) embeddings in real-time. These embeddings are stored and served from an in-house Approximate Nearest Neighbor (ANN) system called Manas, which allows for efficient retrieval of relevant content from billions of items for over 500 million users.
- Spotify's recommendation system uses a funnel architecture, starting with a candidate generation model to narrow down millions of tracks before applying a more complex ranking algorithm. Their models are trained on a sample of approximately 700 million user-generated playlists to understand the relationships between songs.
- Meta is also leveraging LLM-inspired architectures for recommendations with its Generative Ads Recommendation Model (GEM). This foundation model, trained on thousands of GPUs, has led to a 5% increase in ad conversions on Instagram and a 3% increase on Facebook Feed.
- The concept of a feature store is a key component of MLOps, as it ensures consistency between the data used for training models and the data used for serving predictions in production, which is critical for reliable A/B testing and preventing performance degradation. Companies like Netflix and Pinterest have extensive MLOps practices for managing hundreds of models, including automated A/B testing frameworks and monitoring for feature drift.
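The training/serving consistency point above can be illustrated with a small Python sketch. It assumes a toy feature, `trips_last_7d`, and a hypothetical registry called `FEATURE_DEFINITIONS`; none of these names come from any company's system. The idea shown is that one feature definition, evaluated "as of" a timestamp, feeds both the offline training path and the online serving path, so the two cannot drift apart in logic.

```python
from datetime import datetime, timedelta


def trips_last_7d(trip_times, as_of):
    """Count trips in the 7 days ending at `as_of` (events after it are ignored)."""
    window_start = as_of - timedelta(days=7)
    return sum(1 for t in trip_times if window_start < t <= as_of)


# Hypothetical registry: the single source of truth for feature logic.
FEATURE_DEFINITIONS = {"trips_last_7d": trips_last_7d}


def compute_features(trip_times, as_of):
    """Evaluate every registered feature at one point in time."""
    return {name: fn(trip_times, as_of) for name, fn in FEATURE_DEFINITIONS.items()}


def build_training_rows(trip_times, label_times):
    """Offline path: one row per label timestamp, using only data known by then."""
    return [(ts, compute_features(trip_times, ts)) for ts in label_times]


def serve_features(trip_times, now):
    """Online path: the same definitions, evaluated at request time."""
    return compute_features(trip_times, now)


trips = [datetime(2024, 1, 1), datetime(2024, 1, 5), datetime(2024, 1, 9)]
training_rows = build_training_rows(trips, [datetime(2024, 1, 2), datetime(2024, 1, 10)])
online_row = serve_features(trips, datetime(2024, 1, 10))
```

Because both paths call `compute_features`, a training row built for a given timestamp matches what serving would have returned at that moment. A production feature store adds storage, backfills, and low-latency lookup on top of this, but the shared-definition principle is the same.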