X's Open-Source Algorithm Rules Analyzed

Published by The Daily Scout

What happened

An analysis of X's open-source recommendation algorithm reveals several content ranking rules. The algorithm is shown to reward topical authority and follower-to-following ratios, provide a boost for verified accounts, and penalize users with high tweet deletion rates.

Why it matters

- The recommendation system operates on a three-stage pipeline to handle the scale of 500 million daily tweets: first sourcing ~1,500 candidate tweets, then ranking them with a machine learning model, and finally applying filters for diversity and safety.
- The "For You" timeline is a mix of about 50% "in-network" tweets from followed accounts and 50% "out-of-network" tweets discovered by the algorithm. In-network content is retrieved from an in-memory cache system named "Thunder" for sub-millisecond latency.
- Out-of-network candidates are primarily found using two methods: "SimClusters," which uses matrix factorization to identify communities of users with similar interests, and a two-tower neural network, which generates embeddings for users and tweets to find content that is mathematically similar to a user's interests.
- The core ranking model, named "Phoenix," is a Grok-based Transformer model that predicts the probability of various engagements, such as likes, replies, and reposts. It replaced an older neural network that had approximately 48 million parameters.
- Unlike Netflix's hybrid filtering or YouTube's focus on watch time, X's system explicitly avoids most hand-engineered features like post length or media type, relying on the Transformer model to learn relevance directly from user interaction sequences.
- After scoring, a series of heuristics is applied, including an "author diversity" filter to prevent too many consecutive tweets from one account and "feedback-based fatigue," which reduces the score of tweets that have received negative user feedback.
- The open-source code is primarily written in Scala, a language often used for large-scale distributed systems. The main service that orchestrates the timeline is called Home Mixer.
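
To make the ranking and filtering stages more concrete, here is a minimal Python sketch. It is not X's code: the Candidate class, the engagement weights, FATIGUE_FACTOR, and AUTHOR_DECAY are all hypothetical names and values chosen for illustration, and discounting repeated authors is only one assumed way an "author diversity" heuristic could work.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    tweet_id: int
    author: str
    p_like: float    # predicted engagement probabilities (hypothetical model output)
    p_reply: float
    p_repost: float
    negative_feedback: bool = False


# Illustrative constants; none of these values come from X's code.
WEIGHTS = {"like": 1.0, "reply": 2.0, "repost": 1.5}
FATIGUE_FACTOR = 0.5   # multiplier for tweets with negative feedback
AUTHOR_DECAY = 0.5     # multiplier per earlier tweet by the same author


def score(c):
    """Ranking stage: combine predicted engagement probabilities into one score."""
    s = (WEIGHTS["like"] * c.p_like
         + WEIGHTS["reply"] * c.p_reply
         + WEIGHTS["repost"] * c.p_repost)
    return s * FATIGUE_FACTOR if c.negative_feedback else s


def build_timeline(candidates):
    """Filtering stage: greedy ordering that discounts authors already shown."""
    pool = list(candidates)
    shown = {}       # author -> number of tweets already placed
    timeline = []
    while pool:
        best = max(
            pool,
            key=lambda c: score(c) * AUTHOR_DECAY ** shown.get(c.author, 0),
        )
        pool.remove(best)
        timeline.append(best)
        shown[best.author] = shown.get(best.author, 0) + 1
    return timeline
```

With these assumed values, an author's second tweet needs roughly twice the raw score of a competing tweet from a new author to be placed ahead of it, which is what keeps one account from filling consecutive slots.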

Key numbers

  • The recommendation system operates on a three-stage pipeline to handle the scale of 500 million daily tweets: first sourcing ~1,500 candidate tweets, then ranking them with a machine learning model, and finally applying filters for diversity and safety.
  • The "For You" timeline is a mix of about 50% "in-network" tweets from followed accounts and 50% "out-of-network" tweets discovered by the algorithm.
  • The "Phoenix" ranking model replaced an older neural network that had approximately 48 million parameters.
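
As a rough illustration of how the ~1,500-candidate budget and the roughly 50/50 split could fit together, the following Python sketch fills quotas from two ranked lists. The function name blend_candidates and its parameters are hypothetical, and the fallback when one source runs short is an assumption, not something the analysis describes.

```python
def blend_candidates(in_network, out_of_network, total=1500, in_share=0.5):
    """Take about in_share of the budget from followed accounts, the rest from discovery."""
    # Quota for in-network tweets, capped by how many are actually available.
    n_in = min(len(in_network), round(total * in_share))
    # Assumed fallback: out-of-network fills whatever budget is left.
    n_out = min(len(out_of_network), total - n_in)
    return in_network[:n_in] + out_of_network[:n_out]
```

For example, with 1,000 ranked candidates available from each source, this returns 750 in-network and 750 out-of-network tweets.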
