Prioritize colocation over GPUs

- On May 20, 2026, a trading-industry reply argued GPUs help signal computation, but execution speed still depends on colocation, kernel bypass, NIC tuning and exchange proximity.
- Nasdaq says colocated clients can cut round-trip latency by an average of two to five microseconds, while its co-location environment offers sub-50-microsecond order-to-ack performance. (nasdaqtrader.com)
- Nasdaq’s Proximity on Demand service in NY11 and CME’s ultra-low-latency market-data products show where firms can buy nearer-term execution infrastructure. (nasdaq.com)

A May 20 reply in a trading discussion drew a line between compute speed and execution speed, saying GPUs can help generate signals but do not determine how fast an order reaches an exchange. The post pointed instead to colocation, kernel bypass, FPGA or SmartNIC stacks and physical proximity to trading venues as the main levers for sub-millisecond execution.

That distinction matches how exchanges and low-latency networking vendors describe their own infrastructure. (nasdaqtrader.com) It also shifts the spending question for trading firms from general-purpose compute expansion toward tighter control of the packet path. (nasdaq.com)

### Why doesn’t a faster GPU automatically make an order hit the market faster?

A GPU accelerates computation, not the full path between a trading model and an exchange gateway. In a low-latency stack, the order still has to move through the server, NIC, operating system or bypass layer, switches, cross-connects and the exchange entry point before it can be acknowledged. The social-media post’s point was that faster signal generation does not remove those transport and queuing steps.

CME Group markets “ultra-low latency real-time feeds” and detailed real-time order-book data to professional trading users, underscoring that market access itself is sold as a latency-sensitive product. Nasdaq says its co-location offering lets participants reduce latency and network complexity by placing equipment inside the Nasdaq data center. (nasdaqtrader.com)

### What does colocation buy a trading firm that remote compute cannot?

Nasdaq says colocated cabinets provide “proximity to the speed and liquidity” of its U.S. markets. The exchange says clients using its 40G and 10G Ultra connectivity can reduce round-trip latency by an average of two to five microseconds, and it describes order-to-ack and market-data order-to-tick latency in its facility as sub-50 microseconds.
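
The gap between compute speed and execution speed can be illustrated with a rough latency budget. The Python sketch below is only an illustration: the stage names and every microsecond figure in it are hypothetical placeholders chosen for this example, not measurements from Nasdaq, CME or any vendor. It compares halving the signal-compute stage with shortening the network path to the exchange.

```python
# Illustrative latency budget. Every number here is a hypothetical
# placeholder in microseconds, not a figure from Nasdaq, CME or any vendor.
BASELINE_US = {
    "signal_compute": 20.0,             # model or signal calculation
    "host_network_stack": 10.0,         # NIC plus OS or bypass layer
    "switches_and_cross_connect": 5.0,  # in-building network hops
    "wire_to_exchange": 500.0,          # remote site to exchange gateway
}


def total_latency_us(budget):
    """Sum every stage of the order path, in microseconds."""
    return sum(budget.values())


def with_change(budget, stage, new_value_us):
    """Return a copy of the budget with one existing stage replaced."""
    if stage not in budget:
        raise KeyError(f"unknown stage: {stage}")
    changed = dict(budget)
    changed[stage] = new_value_us
    return changed


# Scenario A: a faster accelerator halves the compute stage only.
faster_compute = with_change(BASELINE_US, "signal_compute", 10.0)

# Scenario B: the server moves next to the exchange (assumed 5 us path).
colocated = with_change(BASELINE_US, "wire_to_exchange", 5.0)

for name, budget in [
    ("baseline", BASELINE_US),
    ("faster compute", faster_compute),
    ("colocated", colocated),
]:
    print(f"{name}: {total_latency_us(budget):.1f} us")
```

Under these assumed numbers, halving compute time trims the total from 535 to 525 microseconds, while the shorter network path brings it down to 40. The actual proportions depend entirely on a firm's own measurements.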

Nasdaq’s newer Proximity on Demand product makes the same case in a more packaged form. The service offers dedicated or virtual compute inside Nasdaq’s NY11 co-location environment, with pre-installed hardware and 10G connectivity, plus packet monitoring for latency tracking and microburst measurement. (cmegroup.com) That design centers the location of the server and the quality of the network path, not GPU density.

### Where does kernel bypass fit into that picture?

AMD’s Onload user guide says the software is designed to achieve very low latency with minimum jitter on systems fitted with Solarflare network adapters by using kernel-bypass network acceleration middleware. (nasdaqtrader.com) In practice, kernel bypass removes parts of the conventional operating-system network stack from the hot path so packets can move between the NIC and the application with less overhead.

The DPDK documentation shows the same pattern on the hardware side. Its Solarflare poll mode driver supports adapters and SmartNICs across 10 to 100 Gbps and includes features such as multiple transmit and receive queues, checksum offload and hardware statistics. (nasdaq.com) Those are packet-path tools, not general AI-compute tools.

### Why do FPGA and SmartNIC stacks keep coming up in this debate?

DPDK’s supported-device list includes SmartNIC products alongside Solarflare and Xilinx-branded adapters, reflecting how network-interface hardware is used as part of the latency budget. In these setups, firms offload selected networking or packet-processing tasks closer to the wire instead of leaving everything to the host CPU. (amd.com)

The result is that infrastructure decisions for execution systems tend to separate into tiers. GPUs may still matter for research, model training or signal generation, but order entry and market-data handling are more directly shaped by colocation, NIC behavior, kernel bypass and exchange connectivity, according to the exchange and vendor material reviewed here.
(doc.dpdk.org) That is an inference from those sources, not a direct quote.

### What should firms watch next if they are deciding where to spend?

Nasdaq’s current co-location pages and Proximity on Demand specifications provide the clearest public reference points for firms comparing direct colocation with managed proximity services. (doc.dpdk.org) CME’s market-data catalog remains the public source for its ultra-low-latency feed offerings. Any near-term shift in exchange connectivity products, SmartNIC support or kernel-bypass tooling is most likely to show up first in those product pages and vendor documentation. (nasdaqtrader.com)