Capital‑markets teams push feature‑extraction latency below 5 ms for model inputs

- NVIDIA said on April 2 its GH200 Grace Hopper system hit single-digit microsecond latency in the STAC-ML Markets benchmark, a finance test for model inference on live market-data streams.
- STAC said its October 2025 audit found the GH200 stack beat a prior FPGA submission by as much as 20% on the smallest model, 8% on medium, and 49% on largest.
- The race is measured in microseconds, not 5 milliseconds: STAC and chip vendors are comparing inference stacks for co-located trading systems against FPGA baselines. (stacresearch.com)

In electronic trading, the clock starts when new market data arrives and stops when a model produces a signal. NVIDIA said on April 2 its GH200 system now does that in single-digit microseconds on a standard finance benchmark. (nvidia.com)

The benchmark is STAC-ML Markets (Inference), run by STAC, an industry group that tests trading technology. It measures the delay between receiving new input and generating model output for long short-term memory, or LSTM, time-series models. (stacresearch.com) (nvidia.com)

NVIDIA said the GH200 Grace Hopper Superchip in a Supermicro ARS-111GL-NHR server set record results in the Tacana suite, which uses a sliding window of updated market data. The company said the same work includes open-source reference code for low-latency inference. (nvidia.com)

STAC published the audited comparison on October 13, 2025. It said the GH200 stack delivered up to 20% lower latency on the smallest model, 8% lower on the medium model, and 49% lower on the largest model than a previous FPGA submission. (stacresearch.com)

That framing is different from the premise that capital-markets teams are merely pushing feature extraction below 5 milliseconds. The public benchmark and vendor material here are about full model inference latency measured in microseconds, a unit 1,000 times smaller than a millisecond. (stacresearch.com) (nvidia.com)

Feature extraction is the prep step that turns raw order-book updates or trade ticks into numbers a model can use. STAC describes the target workload as real-time inference on event-driven financial data, which is the kind of pipeline firms run close to exchanges where tiny delays can change fill quality. (stacresearch.com 1) (stacresearch.com 2)

FPGAs still anchor that market. AMD’s Alveo UL3524 trading card is marketed with less than 3-nanosecond transceiver latency and support for algorithmic trading, pre-trade risk analysis, market-data delivery, and low-latency AI models. (amd.com)

NVIDIA’s argument is that general-purpose accelerators are catching specialized hardware on the jobs quants increasingly care about. Its April post says traders want deeper neural networks, while FPGAs and application-specific integrated circuits require more specialized engineering and investment. (nvidia.com)

The industry’s own benchmark history shows the contest is moving fast. STAC’s working-group page lists new reports in April 2025, October 2025, and beyond as vendors test different model types and hardware stacks for finance workloads. (stacresearch.com)

So the cleanest version of the story is narrower and faster than the headline suggests: the latest documented milestone is not sub-5-millisecond feature extraction, but audited single-digit-microsecond inference for capital-markets models. (nvidia.com) (stacresearch.com)
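
For readers who want to see what the feature-extraction step and a sliding window of market data look like in practice, here is a minimal Python sketch. It is an illustration only, not STAC's benchmark harness or NVIDIA's reference code. The `Tick` record, the three features (volume-weighted average price, last tick return, window volume), and the helper names are all hypothetical choices made for this example. Python is used for readability and would not be expected to meet the microsecond budgets discussed above.

```python
from collections import deque
from dataclasses import dataclass
from typing import List, Tuple
import time


@dataclass(frozen=True)
class Tick:
    """One hypothetical trade tick: price and traded size."""
    price: float
    size: float


class SlidingWindowFeatures:
    """Keeps the most recent `window` ticks and derives simple model inputs."""

    def __init__(self, window: int) -> None:
        if window < 1:
            raise ValueError("window must be at least 1")
        # deque with maxlen drops the oldest tick automatically.
        self.ticks = deque(maxlen=window)

    def update(self, tick: Tick) -> List[float]:
        """Add one tick and return [vwap, last_return, total_size]."""
        self.ticks.append(tick)
        total_size = sum(t.size for t in self.ticks)
        if total_size > 0:
            vwap = sum(t.price * t.size for t in self.ticks) / total_size
        else:
            # Assumption: fall back to the last price if no volume traded.
            vwap = tick.price
        last_return = 0.0
        if len(self.ticks) > 1:
            prev_price = self.ticks[-2].price
            last_return = (tick.price - prev_price) / prev_price
        return [vwap, last_return, total_size]


def timed_update(extractor: SlidingWindowFeatures, tick: Tick) -> Tuple[List[float], float]:
    """Run one update and report its wall-clock cost in microseconds."""
    start_ns = time.perf_counter_ns()
    features = extractor.update(tick)
    elapsed_us = (time.perf_counter_ns() - start_ns) / 1_000
    return features, elapsed_us


def ms_to_us(milliseconds: float) -> float:
    """Convert milliseconds to microseconds (1 ms = 1,000 microseconds)."""
    return milliseconds * 1_000.0


if __name__ == "__main__":
    extractor = SlidingWindowFeatures(window=2)
    for tick in (Tick(100.0, 1.0), Tick(102.0, 3.0), Tick(104.0, 1.0)):
        features, elapsed_us = timed_update(extractor, tick)
        print(features, f"{elapsed_us:.1f} us")
    print("5 ms budget in microseconds:", ms_to_us(5))
```

The timer wraps only the feature step, so it shows where a 5-millisecond feature budget and a single-digit-microsecond inference figure would sit on the same scale. A benchmark such as STAC-ML measures the model's inference path, which this sketch does not include.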
