Apr 28, 2025

Redistributing the AI Dividend: Modeling Data as Labor in a Transformative Economy

Sneha Maria Rozario, Srishti Dutta

The economic transformations induced by artificial intelligence (AI) raise pressing distributional concerns. This paper examines the allocation of the AI Dividend—the surplus value generated through AI advancements—and proposes mechanisms to ensure equitable redistribution. Anchoring our analysis in the Distribution track, we focus on the Data as Labor (DaL) framework, inspired by Lanier and Weyl, wherein individuals' data contributions are treated as productive labor. We simulate and compare two paradigms: Data as Capital (DaC), in which data is aggregated as corporate capital, and DaL, wherein individuals are compensated for their data. Using comparative economic simulations, we highlight systemic inequalities emerging under DaC and demonstrate the stabilizing potential of DaL structures. We further subdivide the DaL paradigm into two mechanisms: a corporate taxation regime and an individualized data compensation model, proposing a novel formula for micro-level redistribution. The implications of these models for labor markets, inequality, and societal stability are discussed, with a focus on designing incentive-compatible and scalable economic policies. Our findings suggest that recognizing data as labor not only promotes distributive justice but also enhances the long-term sustainability of the AI-driven economy.

Reviewer's Comments

This felt like a useful and ambitious project which builds on existing work in the literature. This kind of work has clear policy impact through evaluating methods of redistribution (which could be implemented by governments). When building on this work, I would like to see a better explanation of the modelling methods and assumptions and a comparison with existing methods of redistribution (it wasn't clear to me how much this was considered in the DaC scenario).

How does the Collective Data Dividend Model have different impacts in practice compared to more standard methods of economic redistribution, namely corporate taxation?

The FEV equation/Table 1 was a little hard to read; I found the version in the appendix with actual numbers very useful for interpreting it. It would have been better to have some discussion talking through the formula (e.g. the first half vs the second half).

I thought the questions you try to answer in Section 4 (plus the additional graphs in the appendix) are absolutely the right questions to be asking. It would be better to have an explanation of how you generated these graphs and what assumptions you needed to use (even if just a sketch). Similarly, the numbers in the table on page 7 were useful, but please give sources. In particular, I would like to see what assumptions you make about the population (e.g. what percentage of the population owns shares in AI companies). I would also compare to a standard method of redistribution like existing (or increased) taxation.

Sections 5 and 6 sketched some really useful ideas. In particular, the last paragraph of Section 6 highlights some important considerations which would be worth investigating further.

Other considerations might include: how do you treat data from people in other countries, and what cross-country coordination would be needed (this has implications for wealth transfer/inequality between the US and other parts of the world)? Should the quality of data matter, e.g. forum posts (which are easy to automate and generate incentives to do so) vs. published novels or papers? You mention privacy - should individuals be able to opt out, and how onerous would this be to implement?

The paper tackles the pressing question of how to share the surplus from advanced AI by contrasting two ownership regimes for data: data as capital, where firms capture most value, and data as labor, where individuals receive compensation. The authors adapt the Lanier and Weyl vision into two concrete redistribution mechanisms, a collective data dividend and an individualized Future Economic Value formula, then run synthetic twenty-year simulations in R. The projected Gini trajectories on page 3 and the top ten percent wealth share plot on page 5 visually convey the distributional gap between the two regimes. The work adds to the sprint by moving beyond abstract discussion and offering parameterized formulas that policymakers could in principle legislate.
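To make the mechanics described above concrete, here is a minimal sketch (in Python rather than the authors' R, and with every parameter, distribution, and population assumption purely hypothetical) of what a twenty-year DaC-versus-DaL comparison of Gini trajectories could look like:

```python
# Toy 20-year simulation contrasting Data as Capital (DaC) and Data as Labor (DaL).
# This is an illustrative sketch, not the authors' code; all numbers are hypothetical.
import numpy as np


def gini(wealth: np.ndarray) -> float:
    """Gini coefficient of a non-negative wealth vector."""
    w = np.sort(wealth)
    n = w.size
    idx = np.arange(1, n + 1)
    return (2.0 * np.sum(idx * w)) / (n * w.sum()) - (n + 1.0) / n


n_people, years = 10_000, 20
ai_surplus_growth = 0.08      # hypothetical annual growth of the AI surplus
initial_surplus = 1_000.0     # hypothetical surplus per capita in year 0


def simulate(dividend_share: float, seed: int = 0) -> list[float]:
    """dividend_share: fraction of the AI surplus redistributed as a data dividend."""
    rng = np.random.default_rng(seed)
    wealth = rng.lognormal(mean=10.0, sigma=1.0, size=n_people)
    top_decile = wealth >= np.quantile(wealth, 0.9)  # proxy for AI-capital owners
    surplus = initial_surplus * n_people
    ginis = []
    for _ in range(years):
        dividend = dividend_share * surplus / n_people                  # equal per-capita payout
        capital_gain = (1.0 - dividend_share) * surplus / top_decile.sum()
        wealth = wealth + dividend
        wealth[top_decile] += capital_gain                              # retained surplus accrues to owners
        surplus *= 1.0 + ai_surplus_growth
        ginis.append(gini(wealth))
    return ginis


dac_path = simulate(dividend_share=0.0)   # all surplus accrues to capital owners (DaC)
dal_path = simulate(dividend_share=0.5)   # half of surplus returned as a data dividend (DaL)
print(f"Year-20 Gini: DaC={dac_path[-1]:.3f}, DaL={dal_path[-1]:.3f}")
```

Even in this toy setup, the single dividend_share parameter determines whether the AI surplus concentrates in the assumed capital-owning decile or spreads across the population, which is the qualitative gap the paper's Gini and wealth-share figures depict.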

Yet the contribution remains largely conceptual. All numeric inputs are hypothetical and chosen for illustrative purposes, so the headline result that a California resident might earn 1,538 dollars a year from her data, shown in the worked example on page 7, has no empirical grounding. Key parameters such as the data labor multiplier, the allocation ratio, and the displacement rate are asserted rather than calibrated to existing studies on data markets, big-tech margins, or labor share trends. The Future Economic Value equation on page 2 combines macro aggregates and micro weights in a way that mixes units and double counts growth effects; no sensitivity analysis or code repository is provided to let readers test robustness. The literature review cites only two sources and omits the fast-growing empirical and legal scholarship on data trusts, data unions, and digital public infrastructure, limiting the theoretical foundation.

AI safety links are mentioned but thin. The authors claim that fair compensation stabilizes society and thus lowers systemic risk, yet they do not trace specific pathways from redistribution to alignment incentives, compute governance, or reduction in catastrophic misuse. Including a discussion of how data labor contracts could improve dataset provenance, reduce poisoning incentives, or finance safety research would make the impact clearer.

Technical documentation is partial. Figures are clear but the appendix supplies only a plotting link, not the full simulation script, and several symbols in Table 1 are undefined in the text. Without public code or real data the study is not reproducible and policymakers cannot assess fiscal feasibility.

Future work should calibrate parameters to real balance-sheet and household survey data, run Monte Carlo sensitivity tests, benchmark the formulas against existing wealth and carbon dividend schemes, and integrate safety-specific channels such as red-team bounties funded by the dividend pool. A richer literature review and full code release would substantially raise both credibility and reuse potential.
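As an illustration of the suggested Monte Carlo sensitivity testing, the sketch below samples a few uncertain inputs over hypothetical ranges and reports the resulting spread of the per-person dividend; the payout expression is a simplified placeholder, not the paper's Future Economic Value formula, and the ranges and assumed population size are invented for illustration only:

```python
# Hedged Monte Carlo sensitivity sketch for a per-person data dividend.
# The payout expression and all parameter ranges are hypothetical placeholders.
import numpy as np

rng = np.random.default_rng(42)
n_draws = 100_000

# Uncertain inputs, sampled uniformly over assumed plausible ranges.
ai_surplus = rng.uniform(0.5e12, 2.0e12, n_draws)     # total annual AI surplus (USD)
allocation_ratio = rng.uniform(0.05, 0.30, n_draws)   # share of surplus paid to data labor
data_labor_mult = rng.uniform(0.8, 1.5, n_draws)      # relative value of an individual's data
population = 260e6                                    # assumed number of adult data contributors

per_person = ai_surplus * allocation_ratio * data_labor_mult / population

lo, med, hi = np.percentile(per_person, [5, 50, 95])
print(f"Annual dividend per person: 5th={lo:,.0f}, median={med:,.0f}, 95th={hi:,.0f} USD")
```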

The paper pushes forward the discussion on the policy implications of significant worker displacement by AI. The paper highlights the role of data, and of humans in generating (valuable) data, which is a distinct approach relative to alternative methods of guaranteeing income such as UBI. The comparison of the DaC and DaL paradigms is useful. Also, the paper pushes even further by discussing concrete policy proposals. I appreciate the authors' efforts on these points, and it would be great to see them make further progress along these lines.

It is worth mentioning that the paper assumes human-generated data will remain a valuable input to AI development. However, it is also possible that algorithms become so advanced that they don't require large amounts of data. While the authors acknowledge this possibility, the analysis could be expanded by taking into account how algorithmic improvements allow increasingly data-efficient development of AI.

Also, it may be useful to compare the dividends with UBI. Should everyone receive the same amount of dividends? Or should the distribution depend on individual characteristics such as the quantity or quality of data generated?
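One way to make that question concrete is to split a fixed dividend pool either evenly (UBI-style) or in proportion to per-person data quantity and quality scores; the sketch below does exactly that, with entirely invented figures and hypothetical score distributions:

```python
# Flat UBI-style split vs. data-weighted split of a fixed dividend pool.
# All figures and score distributions are invented for illustration.
import numpy as np

rng = np.random.default_rng(7)
pool = 1e9                 # total annual dividend pool (USD), hypothetical
n = 1_000_000              # eligible individuals, hypothetical

quantity = rng.pareto(a=2.0, size=n) + 1.0   # heavy-tailed data volume per person
quality = rng.beta(a=2.0, b=5.0, size=n)     # quality score in (0, 1)
weights = quantity * quality

flat_payment = pool / n
weighted_payment = pool * weights / weights.sum()

print(f"Flat payment per person:       {flat_payment:,.2f} USD")
print(f"Weighted payments, median/max: {np.median(weighted_payment):,.2f} / "
      f"{weighted_payment.max():,.2f} USD")
```

Under a heavy-tailed quantity distribution the weighted scheme pays prolific contributors far more than the flat scheme, which is why the quantity-versus-quality question raised above has real distributional stakes.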

Cite this work

@misc{rozario2025redistributing,
  title={Redistributing the AI Dividend: Modeling Data as Labor in a Transformative Economy},
  author={Rozario, Sneha Maria and Dutta, Srishti},
  date={2025-04-28},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


This work was done during one weekend by research workshop participants and does not represent the work of Apart Research.