Apr 28, 2025

Economic Impact Analysis: The Impact of AI on the Indian IT Sector

Maimuna Zaheer, Alina Plyassulya

We studied AI’s impact on India’s IT sector. We modelled a 20% labour shock and proposed upskilling and insurance policies to reduce AI-driven job losses.

Reviewer's Comments

The project tackles an important problem: how will low- and middle-income countries be affected by AI? The authors could have been more precise about which risks beyond inequality they are addressing. I would have liked to see the code, not only the prompt that created it (GPT sometimes hallucinates).

1. Innovation & Literature Foundation

1. 2.5

2. Good knowledge of the problem.

3. I'd like to see more attention to literature on the policies you assessed. Your assumptions of their efficacy are crucial — so it's helpful to get them in the right ballpark.

4. Especially upskilling. Economists have studied that one to death. Especially in the US but there's also great work in Ethiopia (Girum Abebe) and South Asia.

2. Practical Impact on AI Risk Reduction

1. 3

2. The policy problem is very clearly identified. Honestly, just showing that 20% of jobs in this sector (a not-unreasonable number) is 675,000 jobs will make policymakers sit up and pay attention.

3. It's useful to talk about the high cost and low return of unemployment insurance. Jobs displaced by AI don't come back. In labor-econ lingo, those displaced need to turn to new tasks, which is what up-skilling is for! So: well-chosen policies.

3. Methodological Rigor & Scientific Quality

1. 3

2. I'm not sure I follow the assumptions, and I would love to see the code!

3. It seems that you assume a ±20% shock to the sector scales both employment and revenue by ±20% (Figs 1 and 2). It's definitely possible for AI to increase or decrease employment by 20%, and the same for revenue, but it's very unlikely that the two move in lockstep! For example, it seems more likely that AI would increase revenues while decreasing headcount!

4. What are the assumptions used for the policy comparison analysis? You mention dummy data (which is very reasonable in a research sprint!). As I mentioned above, it might be helpful to calibrate these assumptions to the estimates from existing upskilling experiments. There's lots of work on the US, but more relevant to your context might be work in Ethiopia and Bangladesh.
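To make the point about decoupled shocks concrete, here is a minimal sketch that lets employment and revenue move independently. It is not the authors' code. The baselines are assumptions back-calculated from the headline figures (675,000 jobs and USD 8.4 billion at a 20% shock and a 25% effective tax rate), the +10% revenue figure is a placeholder, and `Scenario` and `evaluate` are hypothetical names.

```python
from dataclasses import dataclass

# ASSUMPTION: baselines are back-calculated from the submission's headline
# figures, not taken from its data:
#   675,000 jobs / 0.20          -> 3,375,000 baseline jobs
#   USD 8.4bn / (0.20 * 0.25)    -> USD 168bn baseline taxable base
BASELINE_JOBS = 3_375_000
BASELINE_TAX_BASE_USD_BN = 168.0
TAX_RATE = 0.25  # single effective rate reported for the submission


@dataclass(frozen=True)
class Scenario:
    """Hypothetical scenario with separate employment and revenue shocks."""
    name: str
    employment_shock: float  # e.g. -0.20 means 20% fewer jobs
    revenue_shock: float     # e.g. +0.10 means 10% more taxable revenue


def evaluate(scenario, tax_rate=TAX_RATE):
    """Return (change in jobs, change in tax revenue in USD bn)."""
    jobs_change = BASELINE_JOBS * scenario.employment_shock
    tax_change = BASELINE_TAX_BASE_USD_BN * scenario.revenue_shock * tax_rate
    return jobs_change, tax_change


# Illustrative scenarios only; the +10% revenue shock is a placeholder value.
SCENARIOS = [
    Scenario("coupled decline (as in Figs 1 and 2)", -0.20, -0.20),
    Scenario("fewer jobs, higher revenue", -0.20, 0.10),
]

if __name__ == "__main__":
    for s in SCENARIOS:
        jobs, tax = evaluate(s)
        print(f"{s.name}: jobs {jobs:+,.0f}, tax {tax:+.1f} USD bn")
```

Under the second scenario the job losses are unchanged while tax revenue rises, so the fiscal case for each policy could look quite different from the coupled case.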

The paper tackles an important question: how transformative AI might disrupt employment and public finances in India’s IT-BPM sector. It grounds the discussion in publicly available OECD TiVA data. The authors clearly present headline results: a ±20 % labour-input shock translates into roughly ±675 000 jobs and about USD 8.4 billion in tax revenue. Two stylised policy responses (up-skilling vouchers and lay-off insurance) appear affordable relative to the taxes that would otherwise be lost. The write-up is concise, the causal chain is easy to follow, and the inclusion of tentative cost-benefit numbers offers a useful starting point for policy debate.

The analysis, however, remains exploratory. The 20 % shock is imposed exogenously with no justification from adoption curves or task-level automation risk estimates, so results could change materially under different assumptions. Treating employment as a fixed ratio of value added ignores capital deepening and substitution elasticities, while reliance on a single 25 % effective tax rate obscures India’s heterogeneous fiscal structure. The input-output framework is labelled “partial” but not specified, which prevents readers from replicating coefficient adjustments or inspecting sectoral knock-on effects. Literature coverage is thin; only a handful of general reports are cited, omitting recent empirical and theoretical work on AI labour substitution, skill-biased technical change, and AI safety-oriented governance mechanisms. AI safety relevance is indirect: the focus is economic displacement rather than mitigation of catastrophic or misuse risks, and links to alignment or systemic safety concerns are not developed. Finally, the methodology, code, and data cleaning steps are not documented in a repository, which limits transparency.

To strengthen the submission, the authors should (i) justify the shock magnitudes with evidence, (ii) perform sensitivity and scenario analysis, (iii) move toward a dynamic or general-equilibrium model that captures feedback effects, (iv) expand the literature review to situate the work in AI economics and AI safety debates, and (v) publish a reproducible notebook and data appendix. Clarifying how the proposed interventions align with broader AI safety objectives, such as reducing tail-risk incentives or supporting safe-development norms, would also raise the study’s impact.
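Recommendation (ii) could start from something as simple as the sketch below, which sweeps the shock size and the effective tax rate over a small grid. It is an illustration under stated assumptions, not the authors' model: the baselines are back-calculated from the headline figures, the grid values are arbitrary, and `sensitivity_grid` is a hypothetical helper. It keeps the submission's fixed-ratio logic, so it does not address capital deepening or substitution.

```python
from itertools import product

# ASSUMPTION: baselines implied by the headline results
# (675,000 jobs and USD 8.4bn at a 20% shock and a 25% tax rate).
BASELINE_JOBS = 3_375_000
BASELINE_TAX_BASE_USD_BN = 168.0


def sensitivity_grid(shocks, tax_rates):
    """Evaluate every (shock, tax rate) pair with fixed-ratio scaling."""
    rows = []
    for shock, rate in product(shocks, tax_rates):
        rows.append({
            "shock": shock,
            "tax_rate": rate,
            "jobs_change": BASELINE_JOBS * shock,
            "tax_change_usd_bn": BASELINE_TAX_BASE_USD_BN * shock * rate,
        })
    return rows


if __name__ == "__main__":
    # Arbitrary illustrative grid around the 20% shock and 25% rate.
    grid = sensitivity_grid([-0.30, -0.20, -0.10], [0.15, 0.25, 0.35])
    for row in grid:
        print(
            f"shock {row['shock']:+.0%}, tax rate {row['tax_rate']:.0%}: "
            f"jobs {row['jobs_change']:+,.0f}, "
            f"tax {row['tax_change_usd_bn']:+.2f} USD bn"
        )
```

Reporting the full grid rather than a single point would show how much the affordability conclusion depends on the two imposed parameters.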

Interesting study on the impact of a labour-input shock on the Indian IT sector. It would have been interesting to see a broader discussion of spillover effects through the network of the Indian economy. Overall, well executed.

Cite this work

@misc{
  title={Economic Impact Analysis: The Impact of AI on the Indian IT Sector},
  author={Maimuna Zaheer, Alina Plyassulya},
  date={4/28/25},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}

This work was done during one weekend by research workshop participants and does not represent the work of Apart Research.