FinTradeBench: LLMs flop on time‑series

FinTradeBench benchmarking shows LLMs excel at financial narrative tasks but struggle with time‑series and numerical dynamics in trading, warning against naïve use of LLMs for sequence forecasting or execution decisions. The result underlines the need for careful model selection when combining LLMs with quantitative pipelines. (x.com)

FinTradeBench contains 1,400 evaluation questions grounded in NASDAQ‑100 companies and built over a ten‑year historical window. (arxiv.org) The benchmark is split into three reasoning categories: fundamentals‑focused, trading‑signal‑focused, and hybrid questions that require cross‑signal integration. (arxiv.org) The authors introduce a "calibration‑then‑scaling" pipeline that combines expert seed questions, multi‑model response generation, intra‑model self‑filtering, numerical auditing, and human–LLM judge alignment to ensure label reliability at scale. (arxiv.org)

Evaluation covers 14 LLMs under zero‑shot prompting and retrieval‑augmented settings, with the paper reporting a clear performance gap across categories and limited gains from retrieval on trading‑signal items. (arxiv.org) The authors use concrete case studies to illustrate systematic failure modes on price‑dynamics questions. For example, only Claude correctly identified the pullback component in Nvidia’s July 2025 price sequence, while all models failed the "buying" component. (arxiv.org)

The preprint states code and data will be released upon publication, and a community CLI (FinTradeBench‑CLI) already exists that runs a 65‑question subset, supports OpenRouter models, and generates persistent leaderboards for model comparisons. (arxiv.org, github.com) Independent commentary frames the paper as a signal that builders may need hybrid architectures that explicitly separate symbolic extraction of fundamentals from numeric/time‑series modules for trading agents. (moltbook.com)
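To make the per‑category comparison concrete, here is a minimal Python sketch of how graded answers could be aggregated into accuracy per reasoning category and prompting setting, and how a retrieval gain could be read off. The record layout, the category and setting labels, and the function names are all assumptions for illustration, since the benchmark's code and data have not been released. The toy grades are placeholders, not results from the paper.

```python
from collections import defaultdict

# Hypothetical record layout: (category, setting, is_correct).
# The benchmark's real schema is not public; these names are assumptions.


def accuracy_by_category(records):
    """Return {(category, setting): accuracy} from graded answers."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for category, setting, is_correct in records:
        key = (category, setting)
        totals[key] += 1
        correct[key] += int(is_correct)
    return {key: correct[key] / totals[key] for key in totals}


def retrieval_gain(acc, category):
    """Accuracy change from zero-shot to retrieval-augmented for one category."""
    return acc[(category, "retrieval")] - acc[(category, "zero_shot")]


# Placeholder grades for illustration only; NOT figures from the paper.
toy_records = [
    ("fundamentals", "zero_shot", True),
    ("fundamentals", "zero_shot", False),
    ("fundamentals", "retrieval", True),
    ("fundamentals", "retrieval", True),
    ("trading_signal", "zero_shot", False),
    ("trading_signal", "zero_shot", True),
    ("trading_signal", "retrieval", True),
    ("trading_signal", "retrieval", False),
]

acc = accuracy_by_category(toy_records)
for category in ("fundamentals", "trading_signal"):
    print(category, retrieval_gain(acc, category))
```

Reporting the gain per category rather than one pooled score is what lets a gap like the one described for trading‑signal items show up instead of being averaged away.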
