X's Open-Source Algorithm Rules Analyzed
An analysis of X's open-source recommendation algorithm reveals several content-ranking rules. The algorithm rewards topical authority and high follower-to-following ratios, boosts verified accounts, and penalizes users with high tweet deletion rates.
- The recommendation system operates on a three-stage pipeline to handle the scale of 500 million daily tweets: first sourcing ~1,500 candidate tweets, then ranking them with a machine learning model, and finally applying filters for diversity and safety.
- The "For You" timeline is a mix of about 50% "in-network" tweets from followed accounts and 50% "out-of-network" tweets discovered by the algorithm. In-network content is retrieved from an in-memory cache system named "Thunder" for sub-millisecond latency.
- Out-of-network candidates are primarily found using "SimClusters," a method that uses matrix factorization to identify communities of users with similar interests, and a two-tower neural network that generates embeddings for users and tweets to find content that is mathematically similar to a user's interests.
- The core ranking model, named "Phoenix," is a Grok-based Transformer model that predicts the probability of various engagements, such as likes, replies, and reposts. This replaced an older neural network that had approximately 48 million parameters.
- Unlike Netflix's hybrid filtering or YouTube's focus on watch time, X's system explicitly avoids most hand-engineered features like post length or media type, relying on the Transformer model to learn relevance directly from user interaction sequences.
- After scoring, a series of heuristics are applied, including an "author diversity" filter to prevent too many consecutive tweets from one account and "feedback-based fatigue" which reduces the score of tweets that have received negative user feedback.
- The open-source code is primarily written in Scala, a language often used for large-scale distributed systems, with the main service orchestrating the timeline called Home Mixer.
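
The three stages described above (sourcing, ranking, post-scoring heuristics) can be illustrated with a toy sketch. This is not X's code, which is primarily Scala; it is a minimal Python illustration under stated assumptions. The engagement weights, the penalty multiplier, the consecutive-author cap, and every function and class name here are hypothetical, and the `predicted` field simply stands in for the probabilities a ranking model such as Phoenix would produce.

```python
from dataclasses import dataclass, field

# Every constant below is an illustrative assumption, not a value from X's code.
ENGAGEMENT_WEIGHTS = {"like": 1.0, "reply": 2.0, "repost": 1.5}
NEGATIVE_FEEDBACK_PENALTY = 0.5  # hypothetical "feedback-based fatigue" multiplier
MAX_CONSECUTIVE_PER_AUTHOR = 2   # hypothetical "author diversity" cap


@dataclass
class Candidate:
    tweet_id: str
    author: str
    # Stand-in for model output: predicted probability per engagement type.
    predicted: dict = field(default_factory=dict)
    negative_feedback: bool = False
    score: float = 0.0


def source_candidates(in_network, out_of_network, limit):
    """Stage 1: draw roughly half of the candidates from each source."""
    half = limit // 2
    return in_network[:half] + out_of_network[: limit - half]


def rank(candidates):
    """Stage 2: collapse predicted engagement probabilities into one score."""
    for c in candidates:
        c.score = sum(ENGAGEMENT_WEIGHTS[k] * p for k, p in c.predicted.items())
    return sorted(candidates, key=lambda c: c.score, reverse=True)


def apply_heuristics(ranked):
    """Stage 3: penalize negative feedback, then enforce author diversity."""
    for c in ranked:
        if c.negative_feedback:
            c.score *= NEGATIVE_FEEDBACK_PENALTY
    remaining = sorted(ranked, key=lambda c: c.score, reverse=True)

    timeline = []
    while remaining:
        recent = timeline[-MAX_CONSECUTIVE_PER_AUTHOR:]
        run_is_full = (
            len(recent) == MAX_CONSECUTIVE_PER_AUTHOR
            and len({t.author for t in recent}) == 1
        )
        blocked = recent[0].author if run_is_full else None
        # Take the best-scored candidate that does not extend a full run.
        pick = next((c for c in remaining if c.author != blocked), None)
        if pick is None:
            break  # only the blocked author is left; this sketch drops the rest
        remaining.remove(pick)
        timeline.append(pick)
    return timeline


def build_timeline(in_network, out_of_network, limit):
    return apply_heuristics(rank(source_candidates(in_network, out_of_network, limit)))


in_network = [
    Candidate("a1", "alice", {"like": 0.5, "reply": 0.1}),
    Candidate("a2", "alice", {"like": 0.4}),
    Candidate("a3", "alice", {"like": 0.3}),
]
out_of_network = [
    Candidate("b1", "bob", {"like": 0.2}),
    Candidate("c1", "carol", {"like": 0.5}, negative_feedback=True),
]

timeline = build_timeline(in_network, out_of_network, limit=6)
print([c.tweet_id for c in timeline])  # ['a1', 'a2', 'c1', 'a3', 'b1']
```

In this toy run, carol's tweet would rank second on raw score, but the negative-feedback penalty pushes it below alice's tweets; the diversity step then moves it ahead of alice's third tweet so that no author appears more than twice in a row. Real systems would tune or learn such values rather than hard-code them.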