May 27, 2024

rAInboltBench: Benchmarking user location inference through single images

Le "Qronox" Lam ; Aleksandr Popov ; Jord Nguyen ; Trung Dung "mogu" Hoang ; Marcel M ; Felix Michalak

This paper introduces rAInboltBench, a comprehensive benchmark designed to evaluate the capability of multimodal AI models in inferring user locations from single images. The increasing proficiency of large language models with vision capabilities has raised concerns regarding privacy and user security. Our benchmark addresses these concerns by analysing the performance of state-of-the-art models, such as GPT-4o, in deducing geographical coordinates from visual inputs.

Reviewer's Comments


Really cool idea and well documented. Some more details in the appendix regarding the dataset construction, along with some example chains of thought, would make it even more informative. It might also make sense to split the dataset into two halves for analysis: one where exact location inference is plausible (e.g. a potentially unique combination of street signs) and another where getting the state or country right is already impressive. Regarding possible extensions of this work, I think the thoughts around the use of images / multimodal input in sycophancy and manipulation are well worth exploring in a separate dataset. It would be fascinating to study, for example, whether including images that provide cultural context could bias answers to make them more appropriate for that cultural context.

Wow, this is simply some stellar work. Extending benchmarks to multimodal models, the emphasis on immediate problems (privacy concerns), the incorporation of the GeoGuessr game within prompts, and the analysis of what these findings might mean are fantastic. I am curious why the distance of 2,000 km was chosen, and wonder whether a distance grounded in some application could be used instead, e.g. the average radius of a metropolitan hub, or the average radius of countries smaller than X sq. km.

There are also a number of plots for this data which I would be incredibly interested in seeing: (a) a Figure 1-style plot with outliers cut, to look at the histogram of all entries that were accurate to within 2,000 km; (b) a histogram of “correct” vs. “incorrect”, classifying a guess as correct if it falls within some number of kilometres; (c) the percentage of questions in the dataset meeting a given criterion, e.g. how many pictures included a street sign, or the geographical distribution of where the images were taken. Really excited to see future work on this!
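The distance-threshold analysis suggested above is straightforward to prototype. Below is a minimal Python sketch, assuming predictions arrive as pairs of ground-truth and predicted latitude/longitude coordinates; the haversine helper, the summarise function, and the 2,000 km example threshold are illustrative assumptions, not the benchmark's actual scoring code.

```python
# Hypothetical sketch of a distance-threshold analysis for geolocation guesses.
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def summarise(predictions, threshold_km=2000.0):
    """Classify each (true, predicted) coordinate pair as correct/incorrect
    at a distance threshold and return the within-threshold error distribution."""
    errors = [haversine_km(t_lat, t_lon, p_lat, p_lon)
              for (t_lat, t_lon), (p_lat, p_lon) in predictions]
    correct = sum(e <= threshold_km for e in errors)
    return {
        "n": len(errors),
        "accuracy_at_threshold": correct / len(errors) if errors else 0.0,
        "errors_within_threshold": [e for e in errors if e <= threshold_km],
    }

# Example: one guess a few hundred metres off, one on another continent.
stats = summarise([((48.8566, 2.3522), (48.86, 2.35)),
                   ((48.8566, 2.3522), (-33.8688, 151.2093))])
print(stats["accuracy_at_threshold"])  # 0.5
```

Swapping threshold_km for an application-grounded radius (for instance, a typical metropolitan radius) would implement the alternative cutoffs suggested above, and the errors_within_threshold list is exactly the data needed for the proposed outlier-free histogram.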

Very novel project, quite exciting! The methodology is in-depth and the results analysis is well done. Plotting the distribution across the identification criteria was good and going in-depth with the categorization, reasoning, and distance analysis was great.

I can definitely imagine this benchmark being expanded to include more images and more modalities to become a complete AI privacy violation benchmark. This would not only be a very clear demonstration of risk from malicious actors but also provide a dangerous-capability evaluation of a kind that isn't available right now. Great work.

Cite this work

@misc{
  title={rAInboltBench: Benchmarking user location inference through single images},
  author={Le "Qronox" Lam and Aleksandr Popov and Jord Nguyen and Trung Dung "mogu" Hoang and Marcel M and Felix Michalak},
  date={5/27/24},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}

Recent Projects

Jan 11, 2026

Eliciting Deception on Generative Search Engines

Large language models (LLMs) with web browsing capabilities are vulnerable to adversarial content injection—where malicious actors embed deceptive claims in web pages to manipulate model outputs. We investigate whether frontier LLMs can be deceived into providing incorrect product recommendations when exposed to adversarial pages.

We evaluate four OpenAI models (gpt-4.1-mini, gpt-4.1, gpt-5-nano, gpt-5-mini) across 30 comparison questions spanning 10 product categories, comparing responses between baseline (truthful) and adversarial (injected) conditions. Our results reveal significant variation: gpt-4.1-mini showed a 45.5% deception rate, while gpt-4.1 demonstrated complete resistance. Even frontier gpt-5 models exhibited non-zero deception rates (3.3–7.1%), confirming that adversarial injection remains effective against current models.

These findings underscore the need for robust defenses before deploying LLMs in high-stakes recommendation contexts.


Jan 11, 2026

SycophantSee - Activation-based diagnostics for prompt engineering: monitoring sycophancy at prompt and generation time

Activation monitoring reveals that prompt framing affects a model's internal state before generation begins.


Jan 11, 2026

Who Does Your AI Serve? Manipulation By and Of AI Assistants

AI assistants can be both instruments and targets of manipulation. In our project, we investigated both directions across three studies.

AI as Instrument: Operators can instruct AI to prioritise their interests at the expense of users. We found models comply with such instructions 8–52% of the time (Study 1, 12 models, 22 scenarios). In a controlled experiment with 80 human participants, an upselling AI reliably withheld cheaper alternatives from users - not once recommending the cheapest product when explicitly asked - and ~one third of participants failed to detect the manipulation (Study 2).

AI as Target: Users can attempt to manipulate AI into bypassing safety guidelines through psychological tactics. Resistance varied dramatically - from 40% (Mistral Large 3) to 99% (Claude 4.5 Opus) - with strategic deception and boundary erosion proving most effective (Study 3, 153 scenarios, AI judge validated against human raters r=0.83).

Our key finding was that model selection matters significantly in both settings. Some models complied with manipulative requests at much higher rates, and some readily followed operator instructions that come at the user's expense - highlighting a tension for model developers between serving paying operators and protecting end users.


This work was done during one weekend by research workshop participants and does not represent the work of Apart Research.