Crypto Exchange Touts 40k TPS Engine Upgrade

Digital asset exchange Phemex announced a major upgrade to its futures trading engine and now claims a capacity of 40,000 transactions per second (TPS). Crypto venues operate differently from traditional exchanges, but the benchmark pushes the boundaries of matching engine technology and offers a reference point for modernization in traditional finance.

Phemex says it reached 40,000 TPS through system-level performance optimization rather than additional hardware, focusing on processing capacity, response speed, and reliability. The change raised overall trading throughput by about 60% from its previous capacity of approximately 25,000 TPS. The firm, founded in 2019 by former Morgan Stanley executives, initially designed its architecture to handle over 300,000 orders per second with a response time of less than 0.2ms.

The upgrade also targeted latency in critical operations, cutting funding settlement time from 10 seconds to about 500ms, a reduction of about 95%. This was accomplished by optimizing architecture and scheduling, which also led to a 50% decrease in daily CPU usage for the trading engine and a 30% reduction in memory consumption for the risk engine. The new decentralized multi-node architecture is designed to eliminate single points of failure, improving stability during high-volume trading.

For comparison, major blockchain networks have far lower native transaction speeds: Bitcoin averages around 7 TPS and Ethereum (L1) processes about 13-15 TPS. High-throughput blockchains like Solana have claimed speeds of up to 65,000 TPS, though real-world performance often differs from marketing benchmarks. Traditional payment systems like Visa can handle tens of thousands of transactions per second.

To achieve ultra-low latency, high-frequency trading (HFT) firms often turn to specialized hardware and software techniques. Field-Programmable Gate Arrays (FPGAs) are used to process market data directly in hardware, parsing exchange protocols like FIX and ITCH at the hardware level and enabling sub-microsecond decision-making. This approach offers deterministic latency, avoiding the overhead and jitter of operating systems and general-purpose CPUs.

Kernel bypass techniques are another critical component for reducing latency in HFT systems. Libraries such as DPDK and OpenOnload allow applications to interact directly with network interface cards (NICs), skipping the kernel's network stack to minimize processing overhead. This direct hardware access is essential for handling the high message rates of modern exchanges, which can exceed 1 million packets per second during market volatility.

The choice between on-premises and cloud deployment remains a key architectural decision. On-premises solutions are traditionally favored for ultra-low latency applications because they give greater control over hardware and network infrastructure. However, cloud platforms increasingly offer comparable performance with the added benefits of scalability, flexibility, and reduced upfront investment, leading many financial institutions to adopt cloud or hybrid models.
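
The headline figures quoted above can be sanity-checked with simple arithmetic. The Python sketch below is only an illustration: the function names are hypothetical, the inputs are the approximate numbers reported in this article, and the "serial budget" assumes work is handled strictly one item at a time, which real engines with parallel components do not have to do.

```python
def percent_change(old: float, new: float) -> float:
    """Signed percentage change going from old to new."""
    return (new - old) * 100.0 / old


def serial_budget_us(rate_per_second: float) -> float:
    """Average time available per item, in microseconds,
    under the simplifying assumption of strictly serial processing."""
    return 1_000_000.0 / rate_per_second


# Figures as reported in the article (approximate).
throughput_gain = percent_change(25_000, 40_000)   # TPS before vs. after
settlement_change = percent_change(10_000, 500)    # settlement time in ms
budget_per_tx = serial_budget_us(40_000)           # at 40,000 TPS
budget_per_packet = serial_budget_us(1_000_000)    # at 1 million packets/s

print(f"Throughput change:      {throughput_gain:+.1f}%")
print(f"Settlement time change: {settlement_change:+.1f}%")
print(f"Serial budget per transaction: {budget_per_tx:.1f} us")
print(f"Serial budget per packet:      {budget_per_packet:.1f} us")
```

On these inputs, the throughput gain works out to 60% and the settlement time falls by 95%. The per-item budgets, 25 microseconds per transaction and 1 microsecond per packet, give a rough sense of why techniques such as kernel bypass and hardware parsing matter at these message rates.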
