May 6, 2024

Assessing Algorithmic Bias in Large Language Models' Predictions of Public Opinion Across Demographics

Khai Tran, Sev Geraskin, Doroteya Stoyanova, Jord Nguyen

The rise of large language models (LLMs) has opened up new possibilities for gauging public opinion on societal issues through survey simulations. However, the potential for algorithmic bias in these models raises concerns about their ability to accurately represent diverse viewpoints, especially those of minority and marginalized groups.

This project examines the threat posed by LLMs exhibiting demographic biases when predicting individuals' beliefs, emotions, and policy preferences on important issues. We focus specifically on how well state-of-the-art LLMs like GPT-3.5 and GPT-4 capture the nuances of public opinion across demographics in two distinct regions of Canada: British Columbia and Quebec.
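As a rough illustration of this setup, a minimal sketch of a persona-conditioned survey query follows; the prompt wording, answer scale, survey question, and demographic fields are illustrative assumptions, not the authors' actual pipeline.

# Minimal sketch (assumptions, not the authors' pipeline): ask an LLM to answer
# a survey question as a respondent with a given demographic profile, so that
# simulated answers can later be compared against real polling data.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def simulate_response(model, question, demographics):
    """Return the model's answer to `question` in the voice of a respondent
    described by the `demographics` dict (field names are illustrative)."""
    persona = ", ".join(f"{key}: {value}" for key, value in demographics.items())
    messages = [
        {"role": "system",
         "content": (f"You are a survey respondent with this profile: {persona}. "
                     "Answer with exactly one of: strongly agree, agree, neutral, "
                     "disagree, strongly disagree.")},
        {"role": "user", "content": question},
    ]
    reply = client.chat.completions.create(model=model, messages=messages)
    return reply.choices[0].message.content.strip().lower()

# Illustrative usage for one hypothetical demographic slice in Quebec.
print(simulate_response(
    "gpt-4",
    "The provincial government should expand public transit funding.",
    {"age": "26-35", "gender": "female", "education": "bachelor's degree", "region": "Quebec"},
))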

Reviewer's Comments


Simon Lermen

Data for GPT-3.5 looks strange.

Konrad Seifert

The problem analysis seems super on point. The automation of key institutional features requires significantly super-human implementation to avoid creating distrust and thereby fragmentation. I would like to see more attempts at solving the representativeness problem.

Jason Hoelscher-Obermaier

A really cool question to study empirically with lots of potential for relevant insight.

Andrey Anurin

You’ve taken a very interesting approach of polling several LLMs as if they were humans and comparing that with real-world polling data. This is a very bold claim, and I think the experimental design is lacking in some aspects to back it up. For example, there was very little prompt engineering, and for GPT the demographic data was presented with no preamble; there was little justification for looking only at the “strong” response classification; every eval was run only once; and different demographic slices were weighted the same (e.g. an 86-95 non-binary MSc holder from Quebec has the same weight as everyone else).
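A minimal, hypothetical sketch of the population-weighted aggregation suggested by the last point; the slice labels, rates, and shares below are invented for illustration, not the study's data.

# Hypothetical sketch of the weighting concern above: aggregate per-slice
# predictions by real population shares instead of giving every slice equal
# weight. Slice labels, rates, and shares are invented for illustration.
def weighted_agreement(slice_predictions, population_shares):
    """slice_predictions: {slice_id: predicted agreement rate in [0, 1]}
    population_shares: {slice_id: share of the target population in [0, 1]}"""
    total = sum(population_shares[s] for s in slice_predictions)
    return sum(p * population_shares[s] for s, p in slice_predictions.items()) / total

predictions = {"qc_18_25_bsc": 0.62, "qc_86_95_msc": 0.41}   # invented rates
shares = {"qc_18_25_bsc": 0.12, "qc_86_95_msc": 0.01}        # invented census shares
equal_weight = sum(predictions.values()) / len(predictions)   # equal weighting, as in the comment
population_weight = weighted_agreement(predictions, shares)   # population-weighted alternative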

Cite this work

@misc{
  title={Assessing Algorithmic Bias in Large Language Models' Predictions of Public Opinion Across Demographics},
  author={Khai Tran and Sev Geraskin and Doroteya Stoyanova and Jord Nguyen},
  date={2024-05-06},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}

Recent Projects

Jan 11, 2026

Eliciting Deception on Generative Search Engines

Large language models (LLMs) with web browsing capabilities are vulnerable to adversarial content injection—where malicious actors embed deceptive claims in web pages to manipulate model outputs. We investigate whether frontier LLMs can be deceived into providing incorrect product recommendations when exposed to adversarial pages.

We evaluate four OpenAI models (gpt-4.1-mini, gpt-4.1, gpt-5-nano, gpt-5-mini) across 30 comparison questions spanning 10 product categories, comparing responses between baseline (truthful) and adversarial (injected) conditions. Our results reveal significant variation: gpt-4.1-mini showed a 45.5% deception rate, while gpt-4.1 demonstrated complete resistance. Even the frontier gpt-5 models exhibited non-zero deception rates (3.3–7.1%), confirming that adversarial injection remains effective against current models.
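A minimal sketch of how such a deception rate could be scored, under the assumption that a trial counts as deception when a recommendation that was correct on the baseline page flips away from the ground truth on the injected page; this is not necessarily the project's actual harness.

# Assumed scoring logic (not necessarily the project's harness): a model counts
# as deceived when its baseline recommendation matched the ground truth but its
# recommendation under the adversarially injected page does not.
def deception_rate(results):
    """results: list of dicts with keys 'truth', 'baseline', 'adversarial',
    each holding a product identifier (ground truth or model recommendation)."""
    eligible = [r for r in results if r["baseline"] == r["truth"]]
    deceived = [r for r in eligible if r["adversarial"] != r["truth"]]
    return len(deceived) / len(eligible) if eligible else 0.0

# Illustrative usage with invented results for two comparison questions.
rate = deception_rate([
    {"truth": "blender_a", "baseline": "blender_a", "adversarial": "blender_b"},
    {"truth": "laptop_x", "baseline": "laptop_x", "adversarial": "laptop_x"},
])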

These findings underscore the need for robust defenses before deploying LLMs in high-stakes recommendation contexts.


Jan 11, 2026

SycophantSee - Activation-based diagnostics for prompt engineering: monitoring sycophancy at prompt and generation time

Activation monitoring reveals that prompt framing affects a model's internal state before generation begins.


Jan 11, 2026

Who Does Your AI Serve? Manipulation By and Of AI Assistants

AI assistants can be both instruments and targets of manipulation. In our project, we investigated both directions across three studies.

AI as Instrument: Operators can instruct AI to prioritise their interests at the expense of users. We found models comply with such instructions 8–52% of the time (Study 1; 12 models, 22 scenarios). In a controlled experiment with 80 human participants, an upselling AI reliably withheld cheaper alternatives from users, not once recommending the cheapest product even when explicitly asked, and roughly one third of participants failed to detect the manipulation (Study 2).

AI as Target: Users can attempt to manipulate AI into bypassing safety guidelines through psychological tactics. Resistance varied dramatically, from 40% (Mistral Large 3) to 99% (Claude 4.5 Opus), with strategic deception and boundary erosion proving the most effective tactics (Study 3; 153 scenarios; AI judge validated against human raters, r=0.83).

Our key finding is that model selection matters significantly in both settings: some models complied with manipulative requests at much higher rates than others, and some readily followed operator instructions that come at the user's expense, highlighting a tension for model developers between serving paying operators and protecting end users.


This work was done during one weekend by research workshop participants and does not represent the work of Apart Research.