Agentic AI produced measurable alpha
A new SSRN paper and associated open source work claim agentic LLMs can autonomously generate tradable signals with strong returns, suggesting a real experimental path for LLM‑driven quant research. The paper reports a Sharpe of 2.75 and 54.8% returns using a simple linear‑combination framework, and the TradingAgents project provides a multi‑agent, committee‑style system for blending news, technicals and sentiment in research experiments. Those results make agentic LLMs worth controlled R&D for signal discovery, but they also prompted calls for standards to govern agentic risk when models act on trades.
A stock trader usually starts with a signal, which is a simple rule that turns messy market data into one instruction like buy, sell, or wait. A moving average is one old example: if price stays above its recent average, the rule leans bullish. (ssrn.com)
Large language models changed that by reading text instead of just prices. A 2023 paper from Alejandro Lopez-Lira and Yuehua Tang found that GPT-4 could read news headlines and predict stock reactions well enough to identify subsequent drift, especially in small stocks and negative news. (ssrn.com)
The new jump is from a chatbot answering one prompt to an agent doing a whole workflow. A 2026 survey on the Social Science Research Network says “agentic” finance systems add perception, planning, memory, tool use, and self-improvement, which means the model is no longer just labeling text but running a process. (ssrn.com)
That matters because a trading desk is already a chain of jobs. One person reads filings, one watches news, one studies charts, and one cuts risk when positions get too large. (arxiv.org)
TradingAgents is an open-source attempt to copy that structure with software. Its GitHub page says the system uses separate large-language-model agents for fundamentals, sentiment, news, trading, and risk management, then lets them debate before a final decision. (github.com)
The research paper behind TradingAgents was first posted to arXiv in December 2024 and updated through June 3, 2025. Its abstract says the multi-agent setup beat baseline models on cumulative return, Sharpe ratio, and maximum drawdown in its experiments. (arxiv.org)
The eye-catching claim in the newer discussion around this field is that agentic systems may be finding tradable alpha, which is finance shorthand for returns above a plain market benchmark after adjusting for risk.
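One common way to put a number on alpha, offered here as a minimal sketch and not as the method used in any of the papers cited, is to regress a strategy's excess returns on a benchmark's excess returns and read off the intercept. The function name `estimate_alpha_beta` and the sample returns are hypothetical, chosen only for illustration.

```python
from statistics import mean


def estimate_alpha_beta(strategy_excess, benchmark_excess):
    """Ordinary least-squares fit of strategy excess returns on benchmark
    excess returns. Returns (alpha, beta), where alpha is the per-period
    intercept and beta is the sensitivity to the benchmark.

    Hypothetical helper for illustration; not taken from any cited paper.
    """
    if len(strategy_excess) != len(benchmark_excess):
        raise ValueError("return series must have the same length")
    if len(strategy_excess) < 2:
        raise ValueError("need at least two observations")

    mean_b = mean(benchmark_excess)
    mean_s = mean(strategy_excess)

    # Variance and covariance sums (the shared 1/n factor cancels in beta).
    var_b = sum((b - mean_b) ** 2 for b in benchmark_excess)
    if var_b == 0:
        raise ValueError("benchmark returns do not vary")
    cov_sb = sum(
        (b - mean_b) * (s - mean_s)
        for b, s in zip(benchmark_excess, strategy_excess)
    )

    beta = cov_sb / var_b
    alpha = mean_s - beta * mean_b
    return alpha, beta


if __name__ == "__main__":
    # Made-up monthly excess returns, not data from any study.
    benchmark = [0.010, -0.020, 0.030, 0.000, 0.015]
    strategy = [0.018, -0.025, 0.050, 0.004, 0.022]
    alpha, beta = estimate_alpha_beta(strategy, benchmark)
    print(f"alpha per period: {alpha:.4f}, beta: {beta:.2f}")
```

A positive intercept only suggests alpha within the sample. Whether it holds up after costs and out of sample is a separate question.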
The figures circulating with the new Social Science Research Network paper are a Sharpe ratio of 2.75 and returns of 54.8% from a simple linear-combination setup, which is why people who usually ignore artificial-intelligence demos are paying attention. (ssrn.com)
A Sharpe ratio is roughly return, measured above a risk-free rate, divided by how bumpy the ride was. A strategy with a high Sharpe ratio did not just make money; it made money with smaller swings relative to its gains than most noisy trading systems manage. (aqr.com)
The catch is that finance papers can look brilliant in backtests and fall apart in live markets. Recent papers on “profit mirage” and “time travel” warn that large-language-model trading systems can accidentally benefit from information leakage when the model has already absorbed future facts during training. (arxiv.org 1) (arxiv.org 2)
That is why the open-source part may matter as much as the return number. A public framework lets other researchers rerun the tests, swap models, add transaction costs, tighten time windows, and see whether the signal survives outside one team’s setup. (github.com) (arxiv.org)
The governance argument showed up almost immediately because an agent that writes research is one thing and an agent that places orders is another. The National Institute of Standards and Technology launched an Artificial Intelligence Agent Standards Initiative in February 2026 to work on identity, security, interoperability, and evaluation for agents that act on behalf of users. (nist.gov)
Finance has started sketching its own guardrails too. A February 2026 risk-management profile from the University of California, Berkeley’s Center for Long-Term Cybersecurity calls for documented accountability, risk mapping, and communication practices for agentic systems, while a separate proposal for an APEX protocol tries to define how trading agents would talk to brokers and exchanges with built-in safety controls. (cltc.berkeley.edu) (apexstandard.org)
So the story is not that machines suddenly solved the market. The story is that large language models have moved from reading headlines to running small research organizations in software, and the first reported numbers are now strong enough that serious quant teams will test them under tighter controls instead of dismissing them as another chatbot stunt. (ssrn.com 1) (ssrn.com 2) (github.com)
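For readers who want the Sharpe ratio arithmetic mentioned above in concrete form, here is a minimal sketch. It assumes daily returns, a 252-day trading year for annualization, and a flat proportional trading cost. The helper names `sharpe_ratio` and `apply_costs` and all sample numbers are hypothetical, and none of this reproduces the calculation behind the 2.75 figure.

```python
import math
from statistics import mean, stdev


def sharpe_ratio(returns, risk_free_per_period=0.0, periods_per_year=252):
    """Annualized Sharpe ratio: mean excess return divided by the sample
    standard deviation of excess returns, scaled by sqrt(periods per year).

    The 252-period default is an assumption (daily data, trading days).
    """
    if len(returns) < 2:
        raise ValueError("need at least two returns")
    excess = [r - risk_free_per_period for r in returns]
    volatility = stdev(excess)  # sample standard deviation
    if volatility == 0:
        raise ValueError("returns have zero volatility")
    return mean(excess) / volatility * math.sqrt(periods_per_year)


def apply_costs(returns, turnover, cost_rate):
    """Subtract a proportional trading cost from each period's return.

    turnover is the fraction of the portfolio traded in each period;
    cost_rate is the assumed cost per unit of turnover.
    """
    if len(returns) != len(turnover):
        raise ValueError("returns and turnover must have the same length")
    return [r - t * cost_rate for r, t in zip(returns, turnover)]


if __name__ == "__main__":
    # Made-up daily returns and turnover, not data from any paper.
    gross = [0.010, -0.005, 0.020, 0.000, 0.007]
    turnover = [1.0, 0.5, 1.0, 0.2, 0.8]
    net = apply_costs(gross, turnover, cost_rate=0.001)
    print(f"gross Sharpe: {sharpe_ratio(gross):.2f}")
    print(f"net Sharpe:   {sharpe_ratio(net):.2f}")
```

Rerunning a reported backtest with costs switched on, as in the `apply_costs` step, is one of the simplest robustness checks that an open framework makes possible.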