Research: pricing agents and datasets

A new discussion highlights work from Anaxi Labs and Carnegie Mellon modeling how agents and datasets should be priced as business models shift toward usage and away from ads and subscriptions. (x.com) The coverage frames pricing as a core input to agent economics, one that decides how compute, data and entitlements get allocated. (x.com)

Pricing is moving to the center of artificial intelligence economics, as Carnegie Mellon University and Anaxi Labs publish new research on how agents and datasets should be paid for. (cylab.cmu.edu) The collaboration was announced April 2, 2026, and focuses on two questions: how to price datasets and how generative products should make money as users shift from links to direct answers. Carnegie Mellon said the work is led by Chenyan Xiong of the Language Technologies Institute, with Anaxi Labs as the industry partner. (cylab.cmu.edu)

Their first joint paper, posted to arXiv on March 30, 2026, is called “An Economic Framework for Generative Engines: Advertising or Subscription?” It models a market where systems such as ChatGPT and Google’s AI Overviews answer queries directly, reducing traffic and advertising revenue for third-party sites. (arxiv.org)

The problem underneath the pricing debate is simple: large language models need two costly inputs, computing power and training data. The Carnegie Mellon team’s earlier work treats data like a raw material whose price should reflect how much it improves model performance, not just how cheaply it can be bought. (arxiv.org) That earlier paper, “Fairshare Data Pricing via Data Valuation for Large Language Models,” was first submitted January 31, 2025 and later presented as a poster at the 2025 Conference on Neural Information Processing Systems. It argues that low or exploitative prices drive high-quality data sellers out of the market and reduce model quality over time. (arxiv.org) (neurips.cc)

The new paper extends that logic from datasets to the products built on top of them. Its abstract says generative engines face a monetization choice between inserting ads into synthesized answers or keeping responses ad-free to increase subscription conversions over time. (arxiv.org)

Anaxi Labs framed the shift more broadly: “agents call other agents” and models rely on curated datasets, so each output reflects work from multiple contributors. In that setup, pricing is not just a billing decision; it determines how revenue gets split across model builders, data suppliers and agent operators. (accessnewswire.com)

Carnegie Mellon’s news release makes the same point in plainer terms, saying researchers are studying how the value created by artificial intelligence systems should be “measured and distributed” across the ecosystem. Xiong said the question is where revenue comes from in an “AI-native format” and how that revenue gets distributed. (cylab.cmu.edu)

There is already debate over whether data valuation methods make good prices. A separate April 2025 paper, “Do Data Valuations Make Good Data Prices?,” argues that popular methods such as Leave-One-Out and Data Shapley can produce inefficient market outcomes when payments are supposed to compensate data owners. (arxiv.org)

That leaves the field in an early stage: one camp is building pricing rules from measured contribution to model quality, while critics say contribution scores alone do not create a workable market. The Carnegie Mellon and Anaxi Labs work puts that dispute closer to the center of how agents, datasets and generative products may be sold. (arxiv.org 1) (arxiv.org 2)
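
For readers unfamiliar with the two valuation methods named in that debate, a toy sketch may help. The Python below computes Leave-One-Out and exact Shapley scores for three hypothetical datasets, A, B and C, using made-up quality scores in which A and B are redundant with each other. The utility table, the dataset names and the `split_revenue` helper are illustrative assumptions, not anything taken from the Carnegie Mellon, Anaxi Labs or other cited papers.

```python
from itertools import combinations
from math import factorial

# Hypothetical model-quality scores (points out of 100) for each subset of
# datasets. The numbers are invented; A and B are deliberately redundant.
DATASETS = ["A", "B", "C"]
UTILITY = {
    frozenset(): 0,
    frozenset({"A"}): 50,
    frozenset({"B"}): 50,
    frozenset({"C"}): 20,
    frozenset({"A", "B"}): 50,
    frozenset({"A", "C"}): 70,
    frozenset({"B", "C"}): 70,
    frozenset({"A", "B", "C"}): 70,
}


def leave_one_out(utility, players):
    """Score each dataset by the quality lost when it alone is removed."""
    full = frozenset(players)
    return {p: utility[full] - utility[full - {p}] for p in players}


def shapley(utility, players):
    """Exact Shapley value: weighted average marginal gain over all subsets."""
    n = len(players)
    values = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight for subsets of size k that exclude p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (utility[s | {p}] - utility[s])
        values[p] = total
    return values


def split_revenue(scores, pool):
    """Hypothetical payout rule: divide a revenue pool in proportion to scores."""
    total = sum(scores.values())
    if total == 0:
        return {p: 0.0 for p in scores}
    return {p: pool * v / total for p, v in scores.items()}


loo_scores = leave_one_out(UTILITY, DATASETS)
shapley_scores = shapley(UTILITY, DATASETS)
print("Leave-One-Out payout:", split_revenue(loo_scores, 100.0))
print("Shapley payout:", split_revenue(shapley_scores, 100.0))
```

In this toy setup, Leave-One-Out pays A and B nothing because either one can be dropped without lowering the score, while Shapley credits them about 25 points each and C about 20. The example shows only that the choice of scoring rule changes who gets paid; it does not test any of the papers' claims.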
