Apr 26, 2026
Automated Causal Graph Extraction and Value-of-Information Prioritization for AI Biorisk Modelling
Douw Marx
Quantifying risk at scale requires defining and prioritizing hundreds of causal paths to harm. I present an automated pipeline that (i) extracts causal chains from a single source document with an LLM, (ii) collapses near-duplicate nodes using embeddings and paired merge-proposer / merge-validator LLMs, and (iii) elicits Beta and PERT priors per node. After Monte Carlo sampling, nodes in the risk model are ranked by betweenness centrality, Birnbaum importance, and Expected Value of Partial Perfect Information (EVPPI). The pipeline is demonstrated on biorisk and serves as a proof-of-concept that LLMs can prioritize risk and indicate where new evaluations would most change downstream decisions.
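As a rough illustration of the ranking step, the sketch below samples node probabilities from Beta priors and averages Birnbaum importance over the samples. It is a minimal sketch under stated assumptions, not the pipeline's implementation: the three-node structure (harm = c AND (a OR b)), the independence of nodes, the prior parameters in `PRIORS`, and all function names are hypothetical.

```python
import random

# Hypothetical Beta priors (alpha, beta) for three illustrative nodes.
PRIORS = {"a": (2, 8), "b": (1, 9), "c": (5, 5)}


def top_event_probability(p):
    """P(harm) for the assumed toy structure: harm = c AND (a OR b), nodes independent."""
    return p["c"] * (1.0 - (1.0 - p["a"]) * (1.0 - p["b"]))


def birnbaum(p, node):
    """Birnbaum importance: P(harm | node certain) minus P(harm | node impossible)."""
    certain = dict(p, **{node: 1.0})
    impossible = dict(p, **{node: 0.0})
    return top_event_probability(certain) - top_event_probability(impossible)


def rank_nodes(n_samples=5000, seed=0):
    """Average Birnbaum importance over Monte Carlo draws from the priors."""
    rng = random.Random(seed)
    totals = {name: 0.0 for name in PRIORS}
    for _ in range(n_samples):
        # One joint draw of node probabilities from the Beta priors.
        p = {name: rng.betavariate(a, b) for name, (a, b) in PRIORS.items()}
        for name in PRIORS:
            totals[name] += birnbaum(p, name)
    means = {name: total / n_samples for name, total in totals.items()}
    # Highest mean importance first.
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


ranking = rank_nodes()
for name, importance in ranking:
    print(f"{name}: {importance:.3f}")
```

In this toy case node `c` sits on every path, yet it ranks below `a` and `b`. Birnbaum importance measures how much the top-event probability moves when a node is switched on or off, and the parents of `c` are unlikely to be active under these priors.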
- A limitation worth discussing is that biorisk modeling often relies heavily on classified information or infohazards, which can reduce the overall applicability.
- Consider extending the pipeline to take in multiple sources. In the real world, multiple sources would inform overall evaluation priorities, and it was not clear to me from the methodology whether that would be possible.
- Explain why evaluation prioritization is a current bottleneck in biorisk reduction.
- A paragraph on how plausible your results are would have been helpful. Do the recommendations broadly align with overall evaluation priorities? You could discuss how human graders would judge the outputs of the automated scoring and their plausibility.
- A control condition that compares this method against directly asking an AI agent to extract the information, and reports the differences in outcome, would have been interesting.
The problem area is neglected: we need more rigorous tooling for intervention prioritization in AIxBio, and this work addresses that gap. However, the intended audience and theory of change are unclear from the paper. There are some mentions of them in the limitations and future work section, but it should be clear from the beginning who this tool is directed at, how it should be used, what problem it solves, and what the downstream implications are.
“Granularity” is undefined (although the author notes this explicitly, which is appreciated). The paper would benefit from an operational definition and a sensitivity analysis. Using LLMs to elicit priors is risky, since LLMs are often overconfident and can produce numbers detached from reality. I did not read the AutoElicit paper, and I can see how this is a necessary choice given the scope of the project. However, Gemini 2.5 Flash Lite is a weak choice here: prior elicitation is genuinely difficult for LLMs, so a frontier model (or several) should be used. I understand the budget constraints, but the paper would benefit from an acknowledgement of this limitation.
Connected to that, I have a problem with modelling P(X | any parent active) as a single Beta when the actual conditional probabilities could vary substantially across the parents. Either model parent-specific conditionals, keep the structure as a forest, or argue why the merged single-parameter approximation is acceptable. As written, the merge step invalidates the elicitation model. This is the most serious methodological issue I have with the paper.
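To make the concern concrete, the sketch below compares a single merged parameter against one possible parent-specific alternative, a noisy-OR model. Noisy-OR is my choice for illustration and is not taken from the paper; the link probabilities, the merged value `Q_MERGED`, and the parent activation probabilities are all hypothetical numbers.

```python
from itertools import product

# Hypothetical per-parent link probabilities P(X | only this parent active).
LINKS = {"A": 0.9, "B": 0.1}
# Hypothetical merged parameter P(X | any parent active).
Q_MERGED = 0.5


def p_x_parent_specific(parent_probs, link_probs):
    """Noisy-OR: each active parent independently triggers X with its own probability."""
    names = list(parent_probs)
    total = 0.0
    for states in product([0, 1], repeat=len(names)):
        weight = 1.0  # probability of this joint parent configuration
        fail = 1.0    # probability that no active parent triggers X
        for name, active in zip(names, states):
            weight *= parent_probs[name] if active else 1.0 - parent_probs[name]
            if active:
                fail *= 1.0 - link_probs[name]
        total += weight * (1.0 - fail)
    return total


def p_x_merged(parent_probs, q):
    """Single parameter: X fires with probability q whenever at least one parent is active."""
    none_active = 1.0
    for p in parent_probs.values():
        none_active *= 1.0 - p
    return q * (1.0 - none_active)


balanced = {"A": 0.5, "B": 0.5}   # both parents equally likely to be active
skewed = {"A": 0.05, "B": 0.9}    # mostly the weak parent is active

for label, parents in [("balanced", balanced), ("skewed", skewed)]:
    specific = p_x_parent_specific(parents, LINKS)
    merged = p_x_merged(parents, Q_MERGED)
    print(f"{label}: parent-specific={specific:.4f}, merged={merged:.4f}")
```

With these numbers the merged parameter understates P(X) in the balanced case (0.3750 vs 0.4775) and overstates it by more than a factor of three in the skewed case (0.4525 vs 0.1310). A single elicited value cannot be right for both parent mixes.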
Standard betweenness sums over all s ≠ t, but in a risk DAG, only source-to-outcome paths are semantically meaningful. A source-to-outcome restricted betweenness (or flow betweenness) would be more appropriate.
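A source-to-outcome restricted betweenness could be sketched as follows. This is a minimal pure-Python version for small unweighted directed graphs, where only shortest paths from designated source nodes to designated outcome nodes are counted. The toy graph and the function names are hypothetical. NetworkX provides `betweenness_centrality_subset` for the same purpose, which would likely be the more practical choice on a real graph.

```python
from collections import deque


def shortest_path_counts(adj, start):
    """BFS from start: distance and number of shortest paths to each reachable node."""
    dist = {start: 0}
    sigma = {start: 1}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma


def restricted_betweenness(adj, sources, targets):
    """Unnormalized betweenness counting only shortest source-to-target paths."""
    nodes = set(adj) | {v for succ in adj.values() for v in succ}
    info = {n: shortest_path_counts(adj, n) for n in nodes}
    score = {n: 0.0 for n in nodes}
    for s in sources:
        dist_s, sigma_s = info[s]
        for t in targets:
            if t == s or t not in dist_s:
                continue  # no path from this source to this outcome
            for v in nodes:
                if v in (s, t) or v not in dist_s:
                    continue
                dist_v, sigma_v = info[v]
                # v lies on a shortest s-t path iff the distances add up.
                if t in dist_v and dist_s[v] + dist_v[t] == dist_s[t]:
                    score[v] += sigma_s[v] * sigma_v[t] / sigma_s[t]
    return score


# Hypothetical toy risk DAG: two sources, two intermediate nodes, one outcome.
toy_graph = {"s1": ["a"], "s2": ["a", "b"], "a": ["h"], "b": ["h"], "h": []}
scores = restricted_betweenness(toy_graph, sources=["s1", "s2"], targets=["h"])
print(scores["a"], scores["b"])  # 1.5 0.5
```

Node `a` carries the only path from `s1` and half of the shortest paths from `s2`, so it scores 1.5, while `b` scores 0.5. Pairs that do not start at a source or end at an outcome contribute nothing.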
Notation is not properly introduced in the paper, which makes it hard to read and sometimes confusing, especially where indices collide.
Figure 2 is unreadable - the caption says "zoom for node labels", but those labels are unreadable at any zoom level. This is the only chance the reader has to see what the graph contains, especially with the pipeline being built in a way that does not allow easy replication.
Cite this work
@misc{
  title={(HckPrj) Automated Causal Graph Extraction and Value-of-Information Prioritization for AI Biorisk Modelling},
  author={Douw Marx},
  date={4/26/26},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


