Apr 28, 2025

Redistributing the AI Dividend: Modeling Data as Labor in a Transformative Economy

Sneha Maria Rozario, Srishti Dutta

The economic transformations induced by artificial intelligence (AI) raise pressing distributional concerns. This paper examines the allocation of the AI Dividend (the surplus value generated through AI advancements) and proposes mechanisms to ensure equitable redistribution. Anchoring our analysis in the Distribution track, we focus on the Data as Labor (DaL) framework, inspired by Lanier and Weyl, wherein individuals' data contributions are treated as productive labor. We simulate and compare two paradigms: Data as Capital (DaC), in which data is aggregated as corporate capital, and DaL, wherein individuals are compensated for their data. Using comparative economic simulations, we highlight systemic inequalities emerging under DaC and demonstrate the stabilizing potential of DaL structures. We further subdivide the DaL paradigm into two mechanisms: a corporate taxation regime and an individualized data compensation model, proposing a novel formula for micro-level redistribution. The implications of these models for labor markets, inequality, and societal stability are discussed, with a focus on designing incentive-compatible and scalable economic policies. Our findings suggest that recognizing data as labor not only promotes distributive justice but also enhances the long-term sustainability of the AI-driven economy.

Reviewer's Comments

Cecilia Wood

This felt like a useful and ambitious project which builds on existing work in the literature. This kind of work has clear policy impact through evaluating methods of redistribution (which could be implemented by governments). When building on this work, I would like to see better explanation of modelling methods and assumptions and a comparison with existing methods of redistribution (it wasn't clear to me how much this was considered in the DaC scenario).

How do the practical impacts of the Collective Data Dividend Model differ from those of more standard methods of economic redistribution, namely corporate taxation?

The FEV equation/Table 1 was a little hard to read; I found the version in the appendix with actual numbers very useful for interpreting it. It would have been better to have some discussion talking through the formula (e.g. the first half vs. the second half).

I thought the questions you try to answer in Section 4 (plus the additional graphs in the appendix) are absolutely the right questions to be asking. It would be better to have an explanation of how you generated these graphs and what assumptions you needed to use (even if just a sketch). Similarly, the numbers in the table on page 7 were useful, but please give sources. In particular, I would like to see what assumptions you use about the population (e.g. what percentage of the population owns shares in AI companies). I would also compare to a standard method of redistribution like existing (or increased) taxation.

Sections 5 and 6 sketched some really useful ideas. In particular, the last paragraph of Section 6 highlights some important considerations which would be worth investigating further.

Other considerations might include: how do you treat data from people in other countries, and what role does cross-country coordination play (this has implications for wealth transfer and inequality between the US and other parts of the world)? Should the quality of data matter, e.g. forum posts (which are easy to automate, creating an incentive to do so) vs. published novels or papers? You mention privacy: should individuals be able to opt out, and how onerous would this be to implement?

Joel Christoph

The paper tackles the pressing question of how to share the surplus from advanced AI by contrasting two ownership regimes for data: data as capital, where firms capture most value, and data as labor, where individuals receive compensation. The authors adapt the Lanier and Weyl vision into two concrete redistribution mechanisms, a collective data dividend and an individualized Future Economic Value formula, then run synthetic twenty-year simulations in R. The projected Gini trajectories on page 3 and the top ten percent wealth share plot on page 5 visually convey the distributional gap between the two regimes. The work adds to the sprint by moving beyond abstract discussion and offering parameterized formulas that policymakers could in principle legislate.

Yet the contribution remains largely conceptual. All numeric inputs are hypothetical and chosen for illustrative purposes, so the headline result that a California resident might earn 1,538 dollars a year from her data, shown in the worked example on page 7, has no empirical grounding. Key parameters such as the data labor multiplier, the allocation ratio, and the displacement rate are asserted rather than calibrated to existing studies on data markets, big-tech margins, or labor share trends. The Future Economic Value equation on page 2 combines macro aggregates and micro weights in a way that mixes units and double counts growth effects; no sensitivity analysis or code repository is provided to let readers test robustness. The literature review cites only two sources and omits the fast-growing empirical and legal scholarship on data trusts, data unions, and digital public infrastructure, limiting the theoretical foundation.

AI safety links are mentioned but thin. The authors claim that fair compensation stabilizes society and thus lowers systemic risk, yet they do not trace specific pathways from redistribution to alignment incentives, compute governance, or reduction in catastrophic misuse. Including a discussion of how data labor contracts could improve dataset provenance, reduce poisoning incentives, or finance safety research would make the impact clearer.

Technical documentation is partial. Figures are clear but the appendix supplies only a plotting link, not the full simulation script, and several symbols in Table 1 are undefined in the text. Without public code or real data the study is not reproducible and policymakers cannot assess fiscal feasibility.

Future work should calibrate parameters to real balance-sheet and household survey data, run Monte Carlo sensitivity tests, benchmark the formulas against existing wealth and carbon dividend schemes, and integrate safety-specific channels such as red-team bounties funded by the dividend pool. A richer literature review and full code release would substantially raise both credibility and reuse potential.
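The Monte Carlo sensitivity test suggested above can be sketched roughly as follows. This is a minimal illustration, not the authors' simulation (which was run in R and is not reproduced here): the toy economy, the `simulate_gini` and `monte_carlo` helpers, the log-normal baseline wealth, the surplus size, and the swept parameter `share_to_individuals` are all hypothetical assumptions. Only the Gini coefficient formula is standard.

```python
import random


def gini(values):
    """Gini coefficient of non-negative values (0 = perfectly equal)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Standard rank-weighted formula over the sorted values.
    weighted = sum((rank + 1) * x for rank, x in enumerate(xs))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


def simulate_gini(share_to_individuals, n_people=1000, surplus=1_000_000.0, rng=None):
    """Toy economy (hypothetical): split a fixed AI surplus between an equal
    per-person data dividend and the wealthiest 10% acting as capital owners."""
    rng = rng or random.Random(0)
    # Assumed baseline wealth: log-normal draws, NOT calibrated to any data.
    wealth = [rng.lognormvariate(0.0, 1.0) * 1000.0 for _ in range(n_people)]
    n_owners = max(1, n_people // 10)
    owners = sorted(range(n_people), key=lambda idx: wealth[idx])[-n_owners:]
    dividend = share_to_individuals * surplus / n_people
    capital_gain = (1.0 - share_to_individuals) * surplus / n_owners
    wealth = [w + dividend for w in wealth]
    for idx in owners:
        wealth[idx] += capital_gain
    return gini(wealth)


def monte_carlo(n_draws=200, seed=42):
    """Draw the allocation share at random and record the resulting Gini."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_draws):
        share = rng.uniform(0.0, 1.0)
        population_rng = random.Random(rng.randrange(10**9))
        results.append((share, simulate_gini(share, rng=population_rng)))
    return results


if __name__ == "__main__":
    ginis = [g for _, g in monte_carlo()]
    print(f"Gini range across draws: {min(ginis):.3f} to {max(ginis):.3f}")
```

In a calibrated study, the random draws would cover every uncertain parameter (for example the data labor multiplier and displacement rate the reviewer mentions), with ranges taken from empirical sources rather than the uniform placeholder used here.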

Donghyun Suh

The paper pushes forward the discussion on the policy implications of significant worker displacement by AI. The paper highlights the role of data and humans' role in generating (valuable) data, which is a distinct approach relative to alternative methods of guaranteeing income such as UBI. The comparison of the DaC and DaL paradigms is useful. The paper also goes further by discussing concrete policy proposals. I appreciate the authors' efforts on these, and it would be great to see them make further progress along these lines.

It is worth mentioning that the paper assumes human-generated data will remain a valuable input to AI development. However, it is also possible that algorithms become so advanced that they don't require large amounts of data. While the authors acknowledge this possibility, the analysis could be expanded by taking into account how algorithmic improvements allow increasingly data-efficient development of AI.

Also, it may be useful to compare the dividends with UBI. Should everyone receive the same amount of dividends? Or should the distribution depend on individual characteristics such as the quantity or quality of data generated?

Cite this work

@misc{
  title={Redistributing the AI Dividend: Modeling Data as Labor in a Transformative Economy},
  author={Sneha Maria Rozario and Srishti Dutta},
  date={4/28/25},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}

Recent Projects

Jul 28, 2025

Local Learning Coefficients Predict Developmental Milestones During Group Relative Policy Optimization

In this work, we investigate the emergence of capabilities in reinforcement learning (RL) by framing them as developmental phase transitions. We propose that the individual components of the reward function can serve as direct observables for these transitions, avoiding the need for complex, derived metrics. To test this, we trained a language model on an arithmetic task using Group Relative Policy Optimization (GRPO) and analyzed its learning trajectory with the Local Learning Coefficient (LLC) from Singular Learning Theory. Our findings show a strong qualitative correlation between spikes in the LLC, which indicate a phase transition, and significant shifts in the model's behavior, as reflected by changes in specific reward components for correctness and conciseness. This demonstrates a more direct and scalable method for monitoring capability acquisition, offering a valuable proof-of-concept for developmental interpretability and AI safety. To facilitate reproducibility, we make our code available at github.com/ilijalichkovski/apart-physics.

Jul 28, 2025

AI agentic system epidemiology

As AI systems scale into decentralized, multi-agent deployments, emergent vulnerabilities challenge our ability to evaluate and manage systemic risks.

In this work, we adapt classical epidemiological modeling (specifically SEIR compartment models) to model adversarial behavior propagation in AI agents.

By solving systems of ODEs describing the systems with physics-informed neural networks (PINNs), we analyze stable and unstable equilibria, bifurcation points, and the effectiveness of interventions.

We estimate parameters from real-world data (e.g., adversarial success rates, detection latency, patching delays) and simulate attack propagation scenarios across 8 sectors (enterprise, retail, trading, development, customer service, academia, medical, and critical infrastructure AI tools).

Our results demonstrate how agent population dynamics interact with architectural and policy design interventions to stabilize the system.

This framework bridges concepts from dynamical systems and cybersecurity to offer a proactive, quantitative toolbox for AI safety.

We argue that epidemic-style monitoring and tools grounded in interpretable, physics-aligned dynamics can serve as early warning systems for cascading AI agentic failures.
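For readers unfamiliar with SEIR compartment models, the classical dynamics mentioned above can be sketched as follows. This is only an illustration under stated assumptions: it integrates the standard SEIR equations with a plain forward-Euler step rather than the physics-informed neural networks the abstract describes, the mapping of compartments to agent states in the comments is an assumed reading, and all parameter values are hypothetical rather than the estimates reported by the authors.

```python
def simulate_seir(beta, sigma, gamma, s0, e0, i0, r0, days, dt=0.1):
    """Forward-Euler integration of the classical SEIR equations:

        dS/dt = -beta * S * I / N
        dE/dt =  beta * S * I / N - sigma * E
        dI/dt =  sigma * E - gamma * I
        dR/dt =  gamma * I

    Assumed reading for AI agents: S = susceptible agents, E = agents exposed
    to an adversarial input, I = compromised agents that propagate it,
    R = patched or removed agents.
    """
    s, e, i, r = float(s0), float(e0), float(i0), float(r0)
    n = s + e + i + r  # total population stays constant in this model
    history = [(0.0, s, e, i, r)]
    steps = int(round(days / dt))
    for k in range(1, steps + 1):
        new_exposed = beta * s * i / n * dt
        new_infectious = sigma * e * dt
        new_recovered = gamma * i * dt
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_recovered
        r += new_recovered
        history.append((k * dt, s, e, i, r))
    return history


if __name__ == "__main__":
    # Hypothetical rates per day; beta / gamma is the basic reproduction number.
    trajectory = simulate_seir(beta=0.5, sigma=0.2, gamma=0.1,
                               s0=990, e0=0, i0=10, r0=0, days=100)
    peak_time, _, _, peak_infectious, _ = max(trajectory, key=lambda row: row[3])
    print(f"Peak of {peak_infectious:.1f} compromised agents at t = {peak_time:.1f}")
```

An intervention such as faster patching would correspond, in this simplified picture, to raising `gamma`, which lowers the ratio `beta / gamma` and can keep an outbreak from growing.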

Jul 28, 2025

Momentum–Point-Perplexity Mechanics in Large Language Models

This work analyzes the hidden states of twenty different open-source transformer language models, ranging from small to medium size and covering five major architectures. The key discovery is that these models show signs of "energy conservation" during inference—meaning a certain measure combining changes in hidden states and token unpredictability stays almost constant as the model processes text.

The authors developed a new framework inspired by physics to jointly analyze how hidden states and prediction confidence evolve over time. They propose that transformers' behavior can be understood as following certain mechanical principles, much like how physical systems follow rules like conservation of energy.

Their experiments show that this conserved quantity varies very little between tokens, especially in untrained (random-weight) models, where it's extremely stable. In pre-trained models, the average energy drops more due to training, but there are larger relative fluctuations from token to token.

They also introduce a new method, based on this framework, for controlling transformer outputs by "steering" the hidden states. This method achieves good results—producing completions rated as higher in semantic quality, while still maintaining the same kind of energy stability.

Overall, the findings suggest that viewing transformer models through the lens of physical mechanics gives new, principled ways to interpret and control their behavior. It also highlights a key difference: random models behave more like balanced systems, while trained models make quicker, more decisive state changes at the cost of less precise energy conservation.

This work was done during one weekend by research workshop participants and does not represent the work of Apart Research.
