Impact of generative AI on tobacco, investment and tourism industry

jalokim

I examine how introducing generative AI (Gen AI) solutions at companies in the tourism, investment and tobacco sectors affects labour productivity, using a difference-in-differences (DiD) design. This falls under the Growth track, which studies how technological innovations drive economic growth. Economic theory (and intuition) suggests that AI-driven automation can boost output per worker. I use publicly available data, such as revenue, employee counts and news about Gen AI adoption, to determine whether companies that adopted AI have increased their labour productivity since 2022, when ChatGPT was launched. Labour productivity is defined as revenue per employee.
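As a rough illustration of the design described above, the sketch below computes revenue per employee for each firm in 2022 and 2024 and then takes the difference in mean changes between adopters and non-adopters. The firm records, field names and function names are hypothetical and chosen only for the example; they are not the data or code used in the submission.

```python
from statistics import mean

# Hypothetical firm records (illustrative numbers, NOT the paper's data).
# Revenue is in millions of USD; employees is headcount.
firms = [
    {"name": "A", "adopted_ai": True,  "rev_2022": 500.0, "emp_2022": 1000, "rev_2024": 660.0, "emp_2024": 1100},
    {"name": "B", "adopted_ai": True,  "rev_2022": 200.0, "emp_2022": 400,  "rev_2024": 270.0, "emp_2024": 450},
    {"name": "C", "adopted_ai": False, "rev_2022": 300.0, "emp_2022": 1000, "rev_2024": 330.0, "emp_2024": 1000},
    {"name": "D", "adopted_ai": False, "rev_2022": 100.0, "emp_2022": 500,  "rev_2024": 115.0, "emp_2024": 500},
]


def productivity(revenue, employees):
    """Labour productivity as revenue per employee (millions USD per worker)."""
    return revenue / employees


def did_estimate(records):
    """Two-period difference in differences on revenue per employee.

    Returns (mean change for adopters) - (mean change for non-adopters).
    """
    changes = {True: [], False: []}
    for f in records:
        pre = productivity(f["rev_2022"], f["emp_2022"])
        post = productivity(f["rev_2024"], f["emp_2024"])
        changes[f["adopted_ai"]].append(post - pre)
    return mean(changes[True]) - mean(changes[False])


estimate = did_estimate(firms)
print(f"DiD estimate: {estimate:.3f} million USD per worker")
```

With only two periods, this difference of mean changes is the same number a regression with a treated-by-post interaction term would return, which is why the estimate depends entirely on the comparison group being a fair counterfactual.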

Reviewer's Comments

A few points of improvement:

- The paper mostly doesn't engage with prior literature, which would help provide a theoretical grounding for the choice of industries to analyse

- The paper uses revenue per employee as a proxy, but this is extremely noisy (for example, firms which are more capital intensive could be more likely to use AI - as one of many forms of capital - without having higher total factor productivity, but would have higher revenue per employee)

- The sample selection was non-random; it would have benefitted from using a more systematic method (such as S&P500 firms within industry, or using more granular industry data to ensure small as well as large firms were captured)

- Mean difference is below reporting accuracy, and could vanish at higher sample sizes (so would be good to collect a larger sample to check this)

- DiD implicitly assumes parallel trends - would have been good to verify this with pre-2020 data
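One simple way to probe the parallel-trends point in the list above is to compare average year-over-year changes in the outcome between adopters and non-adopters before treatment. The sketch below assumes a hypothetical panel with several pre-treatment years; the years, values and function names are illustrative assumptions, not material from the submission.

```python
from statistics import mean


def mean_yearly_change(series, years):
    """Average year-over-year change of one firm's outcome over consecutive years."""
    return mean([series[b] - series[a] for a, b in zip(years, years[1:])])


def pre_trend_gap(panel, pre_years):
    """Difference in average pre-treatment trend: treated minus control.

    A value near zero is consistent with parallel trends; it does not prove them.
    """
    treated = [mean_yearly_change(f["y"], pre_years) for f in panel if f["treated"]]
    control = [mean_yearly_change(f["y"], pre_years) for f in panel if not f["treated"]]
    return mean(treated) - mean(control)


# Hypothetical revenue per employee (millions USD) in pre-treatment years.
panel = [
    {"treated": True,  "y": {2019: 0.40, 2020: 0.42, 2021: 0.44}},
    {"treated": True,  "y": {2019: 0.50, 2020: 0.52, 2021: 0.54}},
    {"treated": False, "y": {2019: 0.30, 2020: 0.32, 2021: 0.34}},
    {"treated": False, "y": {2019: 0.20, 2020: 0.22, 2021: 0.24}},
]

gap = pre_trend_gap(panel, [2019, 2020, 2021])
print(f"Pre-treatment trend gap: {gap:.4f}")
```

A large gap would suggest that adopters were already on a different trajectory, in which case the two-period estimate would mix that pre-existing trend with any effect of adoption.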

* There was little to no rationale provided for selecting the 3 completely unrelated industries; the paper would have been much better if it had focused on a single industry.

* It is fundamentally illogical to extrapolate metrics of the 2 or 3 largest companies across long-tail industries that are highly fragmented (like Tourism).

* The paper makes many claims without any evidence or reasoning (for example, why would generative AI increase the productivity of tobacco production?).

* The discussion section was just two short sentences; it should have been several pages.

* The appendix content was good but not formatted well

* AI was clearly used in the writing of this paper

* Poor grammar and use of prompting

* Not a single citation or reference?

The paper asks whether firms that adopted generative AI after 2022 experienced higher labour productivity, defined as revenue per employee, in the investment, tourism, and tobacco industries. Using revenue and headcount scraped from StockAnalysis and a hand coded flag for AI adoption, the author implements a two-period difference in differences design covering 2022 to 2024. The results table suggests investment firms gained about 0.14 million dollars per worker, tourism firms about 0.09 million, and tobacco firms saw no effect. The inclusion of a GitHub link and an appendix listing the sampled companies is a positive step toward transparency.

Inference is weak because the sample is small and convenience based, AI adoption dates are uncertain, and the two-period panel offers no way to test common trend assumptions. Revenue per employee is a noisy proxy that ignores capital intensity, hours worked, and currency movements. The regression excludes control variables and firm fixed effects, and confidence intervals are bootstrapped on the same limited cross section. Sector choice is not justified by theory, and the literature review overlooks recent work on firm level AI productivity and task exposure indices.
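To make the concern about bootstrapping on a small cross section concrete, the following sketch resamples firms with replacement within each group and reports a percentile interval for the difference in mean productivity changes. The per-firm changes, the number of replications and the function name are hypothetical assumptions for illustration; the submission's own bootstrap procedure is not documented here.

```python
import random
from statistics import mean


def bootstrap_did_interval(treated_changes, control_changes,
                           n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for mean(treated) - mean(control).

    Firms are resampled with replacement within each group, so the interval
    can never be wider than the spread of the few firms that were sampled.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        t = rng.choices(treated_changes, k=len(treated_changes))
        c = rng.choices(control_changes, k=len(control_changes))
        estimates.append(mean(t) - mean(c))
    estimates.sort()
    lo_idx = int((alpha / 2) * n_boot)
    hi_idx = min(n_boot - 1, int((1 - alpha / 2) * n_boot))
    return estimates[lo_idx], estimates[hi_idx]


# Hypothetical 2022-to-2024 changes in revenue per employee (millions USD).
treated_changes = [0.10, 0.14, 0.12]
control_changes = [0.03, 0.01, 0.02]

lower, upper = bootstrap_did_interval(treated_changes, control_changes)
print(f"95% bootstrap interval: [{lower:.3f}, {upper:.3f}]")
```

With three firms per group, every resample is built from the same handful of values, so the interval reflects the sampled firms rather than the industry. That is the limitation the comment above points to.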

AI safety relevance is thin. Productivity effects matter for growth trajectories yet the paper does not connect its findings to distributional stability, governance incentives, or funding for alignment research.

Technical documentation is partial. Input numbers are listed but the scraping scripts, dummy construction, and regression code are missing, and robustness checks are absent.

The study would benefit from a larger balanced panel with multiple pre-treatment years, validated adoption dates, firm fixed effects with appropriate controls, a richer literature discussion, full code release, and an explicit link between sectoral productivity shifts and AI safety policy levers such as redistribution or compute taxation.

Great work on analyzing the real-world impact of AI in the economy! The article is well written and uses very clear language, which I appreciate (particularly today, when many use LLMs that produce text with unnecessarily complex language). The method is well described, and the limitations of the approach are highlighted well.

The article could have been further strengthened with references backing up claims, e.g. "Some claim that AI adoption is correlated with size". I do appreciate however that you included the prompts you used to get the findings – it is good inspiration on a meta level for researchers.

I also think the article would have benefitted from a longer discussion on the findings, to answer questions like why you think Investment has higher productivity boost from AI than other sectors you looked at. But I also understand that the time and format constraints made this difficult – so overall, well scoped research and clearly written article.

Cite this work

@misc{
  title={Impact of generative AI on tobacco, investment and tourism industry},
  author={jalokim},
  date={},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}

This work was done during one weekend by research workshop participants and does not represent the work of Apart Research.