Foresight hits 95% simulation accuracy
- On May 10, 2026, Y Combinator highlighted Foresight, a Summer 2025 startup, which said its consumer-behavior simulations reached 95% accuracy in blind benchmarking.
- The key figure was a Lin’s concordance correlation coefficient of 0.88, which Foresight described as being at the statistical threshold for method replacement.
- Foresight offered a free first month on three-month pilots signed before May 30. Separately, Anthropic hosts a case study on Pendo’s Novus.
Y Combinator used its Launches platform on May 10 to spotlight Foresight, a Summer 2025 startup that says it can simulate how consumers will react to product and marketing decisions before companies run them in the real world. In the post, Foresight said a blind benchmark with a Fortune 500 company covering more than 100 paired estimates showed 95% accuracy against fieldwork and a 0.88 Lin’s concordance correlation coefficient. (ycombinator.com)

Anthropic, in a separate customer case study published on its Claude site, said Pendo used Claude Managed Agents and the Claude Agent SDK to build Novus, a system that detects usability issues in customer applications and suggests fixes. Pendo Chief AI Officer Zain Lakhani said the product is meant to help teams keep shipping quickly while shortening the feedback loop after software reaches users. (claude.com)

### What exactly did Foresight claim it had measured?

Foresight said its benchmark was run with a Fortune 500 company and compared its simulation outputs with traditional fieldwork across more than 100 paired estimates. The company said the result was 95% accuracy against fieldwork. The same post cited a 0.88 Lin’s concordance correlation coefficient, which Foresight described as being at the statistical threshold for method replacement.

Foresight also said clients use the product to test reactions to launches and campaigns in minutes and that customers make decisions with 25 times more consumer evidence on average without increasing costs. (ycombinator.com)

### Who is behind Foresight, and where was the claim published?

Y Combinator’s Launches page identified Foresight as a Summer 2025 company in the B2B category with tags including artificial intelligence, consumer, market research and consumer products. The post was signed by Antoine and Eytan, and linked to the company’s site.
The Launches entry framed the product as “AI-powered simulations of human behavior” for marketers, advertisers and insights teams across Fortune 500, retail, FMCG and technology companies. (ycombinator.com) It also included a commercial offer: a free audit for the first 10 companies that reply and a free first month on any three-month pilot signed before May 30. (ycombinator.com)

### What did Pendo say Claude is doing inside its workflow?

Anthropic’s customer story said Pendo has been building Novus over the past few months as a product that detects and fixes usability issues in customer applications. The case study said the system runs on the Claude Agent SDK and is deployed with Claude Managed Agents.

Zain Lakhani said Novus is designed to examine what is happening inside an application, cross-reference that with the codebase and suggest fixes. In one example, he said the system can identify drop-off in a funnel, inspect the related code and recommend a change such as moving a button that sits below the fold.

### Why is Pendo describing this as a speed problem?

Lakhani said Pendo’s users “used to be product managers looking at dashboards” and are now “product engineers shipping code,” changing what the company needed to build. (claude.com)

He said developers using AI coding tools are shipping faster, often with less user acceptance testing than before, creating a gap between release velocity and product feedback. Pendo said Novus is intended to close that gap after release rather than slow teams down before shipping. (claude.com)

Lakhani said the goal is to see how users respond and optimize “within minutes,” while Anthropic said Pendo reached a 90% success rate on PM-reviewed evaluation sets for Claude-powered agent tasks and built the product in three months. (claude.com)

### What do these two examples show about how teams are using models?

Foresight’s pitch centers on using models before launch to simulate likely customer reactions, while Pendo’s case study describes using models after launch to identify usability problems and generate fixes. That sequence, with pre-launch prediction on one side and post-launch QA and remediation on the other, is an inference from the two companies’ published descriptions. (claude.com)

The next concrete milestones are already listed. Foresight said its promotional offer applies to pilots signed before May 30, and Anthropic continues to host the Pendo Novus case study on its customer stories page alongside other Claude deployments. (ycombinator.com)
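
For readers unfamiliar with the statistic in Foresight’s post: Lin’s concordance correlation coefficient measures how closely paired estimates fall on the line of perfect agreement, so it penalizes systematic bias as well as scatter. The Python sketch below uses the standard published formula (population variances and covariance) purely as an illustration. The function name and the sample numbers are hypothetical, the numbers are not Foresight’s data, and nothing here reflects how Foresight actually ran its benchmark.

```python
from statistics import fmean


def lins_ccc(x, y):
    """Lin's concordance correlation coefficient for two paired series.

    Uses the textbook definition:
        2 * cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y)) ** 2)
    with population (divide-by-n) variance and covariance.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired sequences with at least two values")

    n = len(x)
    mean_x, mean_y = fmean(x), fmean(y)
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n

    denom = var_x + var_y + (mean_x - mean_y) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined when both series are constant and equal")
    return 2 * cov_xy / denom


if __name__ == "__main__":
    # Hypothetical paired estimates, invented for illustration only.
    fieldwork = [10.0, 20.0, 30.0, 40.0]
    simulated = [12.0, 18.0, 33.0, 41.0]
    print(f"CCC = {lins_ccc(fieldwork, simulated):.3f}")
```

Unlike a plain Pearson correlation, this measure drops below 1 when one method runs consistently higher or lower than the other, even if the two series move together perfectly. That is why it is commonly used when asking whether one measurement method agrees with another.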