The paper presents an inventive empirical pipeline that links three very different datasets: four million Claude 3.7 Sonnet prompts mapped to O*NET tasks, a crosswalk from O*NET to Brazil's CBO occupation codes, and monthly employment flows from the CAGED register covering 2021 to mid-2024. Grouping occupations into four exposure buckets and applying seasonal and trend adjustments followed by simple autoregressive tests, the authors find no statistically significant divergence in net job creation between high- and low-exposure occupations. Releasing the code and provisional crosswalk on GitHub is commendable, and the discussion section openly lists the main data and classification shortcomings. The study is a useful proof of concept for real-time labour-market monitoring in developing economies.
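The matching pipeline described above can be sketched roughly as follows. This is a minimal illustration with toy data, not the authors' implementation: the column names, schemas, and quartile-based bucketing are all assumptions.

```python
import pandas as pd

# Hypothetical stand-ins for the three datasets (schemas are illustrative):
# 1) prompts already mapped to O*NET tasks
prompts = pd.DataFrame({"onet_task": ["T1"] * 5 + ["T2"] * 3 + ["T3"] * 8 + ["T4"]})
# 2) O*NET-task -> CBO-occupation crosswalk
crosswalk = pd.DataFrame({"onet_task": ["T1", "T2", "T3", "T4"],
                          "cbo": ["1111", "2222", "3333", "4444"]})
# 3) CAGED-style monthly net job creation per occupation
caged = pd.DataFrame({"cbo": ["1111", "2222", "3333", "4444"] * 2,
                      "month": ["2021-01"] * 4 + ["2021-02"] * 4,
                      "net_jobs": [12, -3, 7, 0, 9, -1, 5, 2]})

# Exposure = share of prompts landing on each occupation's tasks
exposure = (prompts.merge(crosswalk, on="onet_task")
                   .groupby("cbo").size().rename("share"))
exposure = (exposure / exposure.sum()).reset_index()

# Four exposure buckets (quartiles here; the paper's cutoffs may differ)
exposure["bucket"] = pd.qcut(exposure["share"], 4, labels=[1, 2, 3, 4]).astype(int)

# Bucket-level monthly net-job-creation series to compare
panel = caged.merge(exposure[["cbo", "bucket"]], on="cbo")
series = panel.groupby(["bucket", "month"])["net_jobs"].sum()
```

Even this skeleton makes clear where the fragile joins sit: every prompt-to-task and task-to-CBO mapping error propagates directly into the bucket assignment.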
Innovation and literature depth are moderate. Linking observed LLM usage to national employment data is a novel empirical step, but the conceptual framing rests mainly on Acemoglu and Restrepo's task model and a single recent Anthropic paper. The literature review omits earlier occupation-level exposure measures and does not engage with Brazilian labour-market studies, leaving the paper's foundation thin.
The AI safety contribution is indirect. Monitoring displacement can inform distributional policy, yet the paper does not connect its findings to systemic safety issues such as social instability, race dynamics, or governance incentives that affect catastrophic risk. Adding a pathway from timely displacement signals to alignment or compute governance decisions would improve relevance.
Technical execution is mixed. Strengths include careful seasonality removal and candid presentation of ADF statistics. Weaknesses include heavy dependence on one week of prompt data, unverified LLM-generated crosswalks, absence of robustness checks, and small simulation sample size (five runs per scenario). Parameter choices for the AR models and lag selection are not justified, and no confidence bands are shown on the plots on pages 6 and 7. Without formal hypothesis tests comparing the four series, the “no difference” conclusion is tentative.
Suggestions for improvement
1. Expand the Anthropic dataset to multiple models and longer time windows, then rerun the analysis with rolling windows and placebo occupations.
2. Replace the LLM crosswalk with expert-validated mappings and report a sensitivity study to mapping uncertainty.
3. Use difference-in-differences or panel regressions with occupation fixed effects to test for differential shocks rather than relying on visual inspection and ADF tests.
4. Integrate policy scenarios that link early displacement signals to safety-relevant interventions such as workforce transition funds financed by windfall clauses.
5. Broaden the literature review to include empirical UBI pilots, Brazilian automation studies, and recent AI safety economics papers.
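Suggestion 3 can be made concrete with a two-way fixed-effects difference-in-differences specification. The sketch below uses simulated data and hypothetical choices throughout (the treatment dummy, adoption date, and clustering level are assumptions to be replaced with the authors' actual design):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
# Toy panel: 20 occupations x 36 months; "high" flags the top exposure bucket
occs = [f"occ{i}" for i in range(20)]
df = pd.DataFrame([(o, t) for o in occs for t in range(36)],
                  columns=["cbo", "month"])
df["high"] = (df["cbo"].str[3:].astype(int) >= 15).astype(int)
df["post"] = (df["month"] >= 24).astype(int)  # hypothetical adoption date
df["net_jobs"] = rng.normal(size=len(df))     # placeholder outcome

# Occupation and month dummies absorb level differences; the high:post
# coefficient captures any differential post-adoption shock.
model = smf.ols("net_jobs ~ C(cbo) + C(month) + high:post", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["cbo"]})
print(model.params["high:post"], model.pvalues["high:post"])
```

Unlike visual inspection or per-series ADF tests, this yields a single interpretable estimate with clustered standard errors, directly testing the "no divergence" claim.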