On-Prem AI Crushes Cloud on Latency
For real-time AI inference, on-premise hardware is delivering a massive performance edge over cloud APIs. Rack-local chips are enabling single-digit millisecond latencies, making ubiquitous background processing for tasks like anomaly detection feasible without cost or speed penalties. Akamai's CFO echoed this, noting that for some workloads, even minor compute delays are unacceptable.
For high-frequency trading (HFT), the debate between on-premise and cloud infrastructure is heavily skewed by latency requirements. While cloud platforms offer scalability, on-premise solutions provide the raw speed necessary for competitive trading. The physical proximity of on-premise hardware to data sources minimizes network delays, a critical factor when trades are executed in microseconds.
Kernel bypass techniques are a key technology for achieving ultra-low latency in on-premise systems. By letting trading applications communicate directly with network interface cards (NICs), kernel bypass avoids the operating system's network-stack overhead, which can add delays of 20-50 microseconds. This direct hardware access is essential for HFT, where even microsecond delays can result in significant financial losses.
Field-Programmable Gate Arrays (FPGAs) represent another leap in on-premise performance, executing trading algorithms directly in hardware. This approach offers deterministic, nanosecond-level latency, a significant improvement over even the most optimized software-based systems. Unlike CPUs, which execute each task as a stream of software instructions, FPGAs run tasks like market data parsing and order generation as parallel hardware pipelines, eliminating context switching and other software-related delays.
The combination of kernel bypass and FPGAs has become a staple in the infrastructure of elite HFT firms. For example, Jane Street utilizes network cards with onboard FPGAs to filter the majority of market data messages at the hardware level, ensuring that only the most relevant information reaches the CPU. Similarly, Citadel Securities employs DPDK (Data Plane Development Kit) based systems to process over a billion market data messages per second with median latencies under one microsecond.
While on-premise solutions excel in latency, they require significant capital expenditure and operational overhead.
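
A rough latency budget shows why firms accept that cost. The sketch below adds up per-message delays for a kernel-based path and a kernel-bypass path, using the 20-50 microsecond operating system overhead quoted above. The wire and application figures, and the names `Path` and `total_us`, are hypothetical placeholders rather than measurements, and treating the bypass path as having zero stack overhead is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    """One-way latency components for a message path, in microseconds."""
    name: str
    wire_us: float      # hypothetical: propagation between NIC and data source
    os_stack_us: float  # OS network-stack overhead (modelled as 0 when bypassed)
    app_us: float       # hypothetical: application processing time

    def total_us(self) -> float:
        return self.wire_us + self.os_stack_us + self.app_us


# 20 and 50 us are the range quoted in the text; all other values are assumed.
paths = [
    Path("kernel stack (low)", wire_us=5.0, os_stack_us=20.0, app_us=2.0),
    Path("kernel stack (high)", wire_us=5.0, os_stack_us=50.0, app_us=2.0),
    Path("kernel bypass", wire_us=5.0, os_stack_us=0.0, app_us=2.0),
]

for p in paths:
    print(f"{p.name:<20} {p.total_us():6.1f} us")

# Extra delay each kernel-based path pays relative to the bypass path.
baseline = paths[-1].total_us()
penalties = [p.total_us() - baseline for p in paths[:-1]]
print("penalty vs bypass (us):", penalties)
```

Under these assumed figures the operating system overhead dominates the budget, which is the intuition behind bypassing it when every microsecond counts.
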
In contrast, cloud providers offer a pay-as-you-go model that is more flexible for workloads with fluctuating demand. However, for the predictable and constant high-performance needs of algorithmic trading, the long-term total cost of ownership for on-premise infrastructure can be more favorable.
The financial industry is increasingly adopting AI for tasks like predictive analytics and risk management. However, the computational complexity of some AI models can introduce latency, making them unsuitable for the most time-sensitive HFT strategies. This has led to a hybrid approach in some firms, where model training may occur in the cloud, but real-time inference is handled by on-premise hardware to meet strict latency requirements.
Akamai, a major player in content delivery networks, is expanding into distributed AI inference, acknowledging the growing need for low-latency compute closer to the end-user. The company's strategy involves deploying thousands of NVIDIA Blackwell GPUs across its global network to create a platform for AI research, fine-tuning, and inference. This move highlights a broader industry trend of decentralizing AI workloads to mitigate the latency issues inherent in centralized cloud data centers.
The adoption of AI in financial services is widespread, with a recent Finastra report indicating that only 2% of financial institutions have no AI implementation. The primary use cases are currently in risk management and fraud detection, with a growing focus on AI-driven personalization and automation. Despite the high adoption rates, a significant portion of AI projects in finance have reportedly been delayed or have not met ROI expectations, often due to a shortage of talent with expertise in both AI and the regulatory landscape of the financial industry.
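
Returning to the cost comparison between pay-as-you-go cloud and owned hardware, the trade-off for a constant workload can be made concrete with a simple break-even calculation. The sketch below is illustrative only: the function name `breakeven_month` and every figure passed to it are hypothetical, and it ignores factors such as hardware depreciation, financing, and staffing changes.

```python
import math
from typing import Optional


def breakeven_month(capex: float, onprem_monthly: float,
                    cloud_monthly: float) -> Optional[int]:
    """First whole month at which cumulative on-premise cost is no higher
    than cumulative cloud cost, for a constant workload.

    Returns None when on-premise running costs are at least the cloud bill,
    because the up-front spend is then never recovered.
    """
    monthly_saving = cloud_monthly - onprem_monthly
    if monthly_saving <= 0:
        return None
    return math.ceil(capex / monthly_saving)


# Hypothetical steady workload: all three figures are assumed, not sourced.
months = breakeven_month(capex=600_000, onprem_monthly=15_000,
                         cloud_monthly=40_000)
print(months)  # 24
```

With these assumed numbers the owned hardware pays for itself after 24 months. A workload with fluctuating demand would lower the average cloud bill and push the break-even point later, which is why the pay-as-you-go model suits bursty usage better.
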