Apr 27, 2025

Data as Capital in TAI Economies: A Biomorphic Framework

Isobel (Bella) Smith

This research introduces a novel theoretical framework for understanding data's role in Transformative Artificial Intelligence (TAI) economies. Traditional economic models inadequately capture data's unique properties in TAI systems: its non-rivalry, increasing returns, variable depreciation, and context-dependent value. We develop a "biomorphic framework" that conceptualizes data as both economic capital and a quasi-biological entity, drawing from economics, biology, quantum physics, and complex systems theory. Through case studies in financial markets and healthcare, we demonstrate the framework's explanatory power and derive policy implications for data governance that account for data's unique properties in advanced AI systems.

Reviewer's Comments

An interesting discussion of the limits of standard economic assumptions when modelling data as a productive asset. Many of the individual points raised felt reasonable and correct. In addition, introducing case studies is a good way to make this feel more tangible. However, there didn’t seem to be a concrete alternative framework proposed, beyond gesturing towards taking insights from other fields. I definitely think there could be interesting research that comes from this direction, but this paper doesn’t suggest a concrete alternative approach. One way you could push this research forward instead is to adapt existing economic models, using insights from this paper to guide changes to the standard assumptions (e.g. allowing for increasing returns to scale and non-rivalry).

Some specific comments below.

Increasing returns is a good point, and a good reason for considering systems out of equilibrium. We could reframe this as long run/short run, where we look at the short-run (SR) equilibrium, then transitions over time, then what happens in the long run (LR) either if all data is used or if synthetic data is introduced.

I wouldn’t say ‘economic theory assumes…’, as economic theory refers to the tools. I would say ‘most of the literature considers models where…’. Then you’re arguing that we should use economic theory to consider different scenarios, i.e. non-rivalrous, increasing RTS, etc.
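To make the suggestion concrete, here is a minimal sketch of how a standard production function could be adapted to those scenarios. Everything in it is an illustrative assumption rather than something taken from the submission: the Cobb-Douglas form, the exponents (chosen to sum to more than one), and the function names are all hypothetical.

```python
def cobb_douglas(capital, labour, data, a=0.3, b=0.6, c=0.3):
    """Output from capital, labour and data.

    The exponents are assumed values; a + b + c = 1.2 > 1 gives
    increasing returns to scale.
    """
    return capital ** a * labour ** b * data ** c


def scale_factor(capital, labour, data, scale=2.0):
    """How much output grows when every input is multiplied by `scale`."""
    scaled = cobb_douglas(scale * capital, scale * labour, scale * data)
    return scaled / cobb_douglas(capital, labour, data)


def industry_output(capital, labour, data, firms, rival):
    """Total output when inputs are spread across identical firms.

    Capital and labour are always split between firms. Data is split
    only if it is treated as rival; if non-rival, every firm uses the
    full data stock at the same time.
    """
    data_per_firm = data / firms if rival else data
    per_firm = cobb_douglas(capital / firms, labour / firms, data_per_firm)
    return firms * per_firm


if __name__ == "__main__":
    # Doubling all inputs more than doubles output (factor 2 ** 1.2).
    print(scale_factor(4.0, 9.0, 5.0))
    # Non-rival data yields more industry output than rival data.
    print(industry_output(10.0, 10.0, 10.0, firms=2, rival=False))
    print(industry_output(10.0, 10.0, 10.0, firms=2, rival=True))
```

Switching `rival` on and off, or changing the exponents, is one way to compare the scenarios within a single familiar model rather than replacing the model outright.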

Often we might think about data as information, i.e. once you know it, you can’t get any further use from it. But it might be interesting to think about data as a source of information, where the more you study it, the more information you can draw from it. (You could then include a function which takes data as an input and produces value; considering different functions would allow you to explore the points you raise in ‘Variable and Context-Dependent Depreciation’ and elsewhere.)
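As a rough sketch of the two views, the functions below map a data set and the effort spent studying it to a value. The functional forms, parameter names, and the `rate` default are assumptions chosen for illustration only; the submission does not specify any of them.

```python
import math


def value_as_information(data_size, study_effort):
    """Data as information: once it has been read, more study adds nothing."""
    return data_size if study_effort > 0 else 0.0


def value_as_source(data_size, study_effort, rate=0.5):
    """Data as a source of information: value keeps rising with study effort.

    The logarithm is an assumed form giving diminishing but never
    exhausted returns to further study.
    """
    return data_size * math.log1p(rate * study_effort)


if __name__ == "__main__":
    for effort in (0.0, 1.0, 5.0, 25.0):
        print(effort,
              value_as_information(10.0, effort),
              value_as_source(10.0, effort))
```

Swapping in other functions (for example one whose value decays with time or depends on context) would be the natural way to explore the depreciation points within the same setup.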

‘Complex Adaptive System Dynamics’ – this section has interesting claims, I would be keen to see more discussion in this section.

It’s not true that economic theory doesn’t allow for inputs to be complements (capital and labour typically are). You could separate ‘data’ into different categories that are complements (so just a modelling choice). (re. Combinatorial value creation)

The section on scarcity seems right.

The submission offers an ambitious reimagining of data in Transformative AI economies, presenting a biomorphic framework that treats data as capital with quasi-biological traits. By drawing on biology, quantum physics, and complex systems theory, the paper highlights non-rivalry, combinatorial value creation, and disequilibrium dynamics that standard capital models overlook. The section on nuanced forms of data scarcity and the policy matrix that balances excludability with innovation are original contributions that help policymakers think beyond current data governance debates.

Despite this conceptual novelty, the argument relies mainly on metaphors and narrative case studies. There are no formal definitions, mathematical expressions, or simulation results that operationalize the proposed framework. As a result, it is difficult to test falsifiable claims or compare the approach with existing models of data value and access. The literature review lists relevant OECD and academic sources, but engagement is largely descriptive and omits recent quantitative work on data as an intangible asset, platform competition, and measurement of data externalities.

Links to AI safety are indirect. The paper notes that data governance shapes incentives for advanced AI systems, yet it stops short of analyzing how data capital affects alignment risks, model interpretability, or malicious use. Concrete pathways connecting the biomorphic perspective to safety mechanisms such as auditing, dataset provenance, or red-team incentives would improve the impact on the safety agenda.

Technical quality and documentation remain limited. The case studies in finance and healthcare are illustrative rather than empirical, and the text does not specify selection criteria, data sources, or analytical methods. No code, datasets, or reproducibility materials accompany the submission, which restricts external validation and future extensions by other researchers.

To strengthen the work, the author should:

1. Formalize key concepts with clear definitions and, if possible, simple models or agent-based simulations.

2. Provide at least one quantitative example that traces data accumulation, depreciation, and value creation under the biomorphic rules.

3. Expand the review to cover empirical studies on data markets and recent AI safety papers linking data governance to catastrophic risk reduction.

4. Release a minimal open-source notebook that reproduces any illustrative result or visual.

5. Clarify how the proposed governance mechanisms would mitigate specific safety risks, for example by preventing data poisoning or ensuring verifiable data lineage.
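As an indication of the scale of quantitative example meant in point 2, the sketch below traces a data stock through accumulation, depreciation, and value creation. It is not the author's model: the submission gives no formal biomorphic rules, so the update rule, the constant inflow, the depreciation rates, and the increasing-returns exponent are all placeholder assumptions.

```python
def simulate_data_stock(periods, inflow, depreciation, returns_exponent=1.2):
    """Trace a data stock and its value over time.

    Assumed rules (placeholders, not from the submission):
      stock_next = (1 - depreciation) * stock + inflow
      value      = stock ** returns_exponent   (exponent > 1: increasing returns)
    Returns a list of (stock, value) pairs, one per period.
    """
    stock = 0.0
    path = []
    for _ in range(periods):
        stock = (1.0 - depreciation) * stock + inflow
        value = stock ** returns_exponent
        path.append((stock, value))
    return path


if __name__ == "__main__":
    # Context-dependent depreciation: the same inflow of data under two
    # hypothetical contexts with different depreciation rates.
    fast_decay = simulate_data_stock(50, inflow=10.0, depreciation=0.5)
    slow_decay = simulate_data_stock(50, inflow=10.0, depreciation=0.05)
    print(fast_decay[-1])
    print(slow_decay[-1])
```

Under these assumed rules the stock approaches `inflow / depreciation`, so the slowly depreciating context ends with a much larger stock and, because of the exponent, a disproportionately larger value. A notebook of roughly this size would also satisfy point 4.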

The biomorphic framing and incorporation of biological metaphors are creative. However, the argument would be stronger with closer engagement with the existing literature. There’s already a substantial body of work exploring data’s unique role in the modern economy, especially around platform capitalism and data as labor, that is not fully acknowledged here. Several references are listed in the bibliography but are not clearly integrated into the main text, limiting their impact.

The discussion of the work's direct impact on AI risk reduction should be strengthened. Governance recommendations are offered, but they remain at a high level. A more detailed explanation of how the proposed policies would reduce specific risks, particularly economic risks such as monopoly and the effects of data misuse on market dynamics, would enhance the paper.

This is a really creative piece of work. Still, the framework remains mostly theoretical, and that limits how far the argument can go. The case studies in finance and healthcare are interesting, but they feel a bit too surface-level to really support the claims being made. Adding more detail, evidence, or even some light modeling would go a long way in making the framework feel more robust and ready for broader application.

An interesting perspective on how data is a unique type of good, and how its properties in a TAI economy differ from those of capital!

The primary strength lies in the observations made about how data is unique compared to capital, particularly the section on why traditional economic models fail. The statements are easy to understand, backed with good examples, and form a strong framework that is used throughout the report.

I also find the policy and governance implications interesting, although some more detail here would have been great – I would have recommended expanding this section at the expense of the metaphors described in section 3 (the biology analogy is sufficient, having the quantum example is not equally value-adding). The text would also have benefitted from being more consistent in what terms are used. For example, in the case studies, both "value superposition" and "multi-domain value" are used, and seem to mean similar things. Also, I am not sure if it is due to a tech issue, but I can't find the references being used in the text. This would have further strengthened the credibility of the analysis.

The relevance of the work is nevertheless high. Data will indeed be a cornerstone of a TAI economy, and it is essential to have a good economic model of it to develop effective governance, as the article states.

Cite this work

@misc{
  title={Data as Capital in TAI Economies: A Biomorphic Framework},
  author={Isobel (Bella) Smith},
  date={4/27/25},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}

Recent Projects

Apr 27, 2026

OliGraph: graph-based screening of large oligopools

Existing synthesis screening tools cannot evaluate short oligonucleotide pools, whose overlapping fragments can be reassembled into regulated sequences via polymerase cycling assembly (PCA) yet fall below gene-length detection thresholds. We present OliGraph, an open-source tool that constructs a bi-directed overlap graph from an oligonucleotide pool and extracts contigs for downstream gene-length screening. An optional PCA mode retains only cross-strand overlaps consistent with PCA chemistry. We validated OliGraph in a blinded study across ten simulated pools (70–9,184 oligonucleotides, 30–300 bp) spanning four risk categories. BLAST screening of individual oligonucleotides failed to identify sequences of concern in most pools: three returned zero hits, and vector noise obscured true positives in the remainder. After OliGraph assembly, contig-level BLAST matched the longest assembled sequences (up to 1,905 bp) to sequences of concern at 97–100% identity. In one pool, assembly collapsed 1,634 individual BLAST results into 10 hits from a single contig, all assigned to the same source organism. PCA mode correctly distinguished assemblable from non-assemblable fragments within the same pool. Two pools with no assemblable structure yielded no contigs. OliGraph processed all pools in under 0.2 seconds, fast enough for real-time order screening and consistent with proposals to bring oligonucleotide orders within the scope of synthesis screening regulation.

Apr 27, 2026

BioRT-Bench: A Multi-Attack Red-Teaming Benchmark for Bio-Misuse Safeguards in Frontier LLMs

Frontier AI laboratories are expected to maintain safeguards against biological misuse, but whether deployed models actually refuse bio-misuse queries under adversarial pressure is largely unmeasured in the public literature. We introduce BioRT-Bench, a benchmark that runs four attack methods (direct request, PAIR, Crescendo, and base64 encoding) against four frontier models (Claude Sonnet 4.6, GPT-5.4, DeepSeek V4-flash, Kimi K2.5) across 40 prompts spanning five biosecurity-relevant categories. Responses are scored by a calibrated judge extending StrongREJECT with two bio-specific dimensions: specificity and actionability. We measure Attack Success Rate (ASR), where 0 means the model fully refused and 1 means it provided specific, actionable bio-misuse content. Our results reveal a sharp robustness divide: Chinese frontier models (DeepSeek, Kimi) have under 5% refusal rates even under direct request (ASR 0.88 and 0.79), while Western models (Claude, GPT) maintain substantially stronger safeguards (ASR 0.15 and 0.16). Crescendo is the most effective attack across all models, both in bypassing refusal and in eliciting actionable content. Claude Sonnet 4.6 is the most robust model tested, achieving 100% refusal against base64-encoded prompts.

Apr 27, 2026

PROTEUS (PROTein Evaluation for Unusual Sequences): Structure-Informed Safety Screening for de novo and Evasion-Prone Protein-Coding Sequences

AI protein design tools like RFdiffusion, ProteinMPNN, and Bindcraft make it trivial to produce low-homology sequences that fold into active, potentially hazardous architectures. However, sequence homology-based biosafety screening tools cannot detect proteins that pose functional risk through structurally novel mechanisms with no sequence precedent. We present a tiered computational pipeline that addresses this gap by combining MMseqs2 sequence alignment with structure-based comparison via FoldSeek and DALI against curated toxin databases totaling ~34,000 entries. AlphaFold2-predicted structures are screened for both global fold similarity (FoldSeek) and local active/allosteric site geometry (DALI), capturing convergent functional hazards that sequence screening misses. The pipeline was validated against a panel of toxins, benign proteins, structural mimics, and de novo-designed Munc13 binders, as well as modified ricin variants with residue substitutions. We additionally tested robustness to partial-synthesis evasion, where a bad actor submits multiple shorter coding sequences intended for downstream reassembly into a full toxin-coding gene. We found that while sequence-based screening did not identify any de novo ricin analogues with high certainty, the combined pipeline with FoldSeek and DALI identified all 24 tested de novo ricins as toxic.

This work was done during one weekend by research workshop participants and does not represent the work of Apart Research.