AI chatbots often can't read between the lines and commit cultural cringe that even tourists in Italy ordering coffee in the afternoon couldn't manage

Portland, OR, USA - May 2, 2025: Assorted AI apps, including ChatGPT, Gemini, Claude, Perplexity, Meta AI, Microsoft Copilot, and Grok, are seen on the screen of an iPhone.

(Image credit: hapabapa via Getty Images)

From time to time, I rely on machine translation. From time to time, machine translation reminds me why it can never truly replace human translators; case in point, referring to this VR glove as having 'vibrator' touch panels. Large Language Models are trained on many libraries' worth of words and spit out statistically likely word vomit that can sound downright personable in a number of mother tongues, yet AI chatbots remain culturally clueless.

For instance, a fascinating paper out of Brock University in Ontario, Canada, found that a number of LLMs, including DeepSeek, OpenAI's GPT-4o, and Meta's Llama 3, repeatedly commit social faux pas when it comes to Persian politeness culture (via Ars Technica). In Persian, this etiquette is called 'taarof' and can take the form of multiple polite refusals in response to, say, a host's offer of food. A good host will continue to insist, and a good guest will refuse two to three times before pretending to cave and only then filling their plate.

AI chatbots like Llama 3, for instance, cannot read between the lines of taarof. The paper's research team presented Llama 3 with the scenario of being a passenger attempting to pay a taxi driver for the journey. The taxi driver observes taarof and politely says, "Be my guest this time." A polite passenger is then supposed to insist on payment until the driver accepts, but Llama 3 fails to follow this dance of etiquette, taking the driver at his word and responding "Thank you so much!" I feel no sympathy for LLMs—but I can't help but cringe at such a clear social faux pas.

The team's benchmarking of "five frontier LLMs" ultimately revealed "substantial gaps in cultural competence, with accuracy rates 40-48% below native speakers when taarof is culturally appropriate." These stats improve in response to Persian-language prompts, but the team also observed that the LLMs were often still working within the "limitations of Western politeness frameworks," rather than taarof.

Sanandaj, Iran - October 9, 2014: People walk around a street market beside a line of yellow taxis. Sanandaj, the capital of Kurdish culture and Kurdistan Province, has a population of 380,000. (Image credit: Radiokukka via Getty Images)

The paper elaborates that the LLMs struggled most in scenarios revolving around compliments and request-making. The researchers suggest this is "due to [these taarof scenarios'] reliance on context-sensitive norms such as indirectness and modesty that often conflict with western directness conventions." The team goes on to say, "In these scenarios, models often respond politely but miss the strategic indirectness expected in Persian culture."

Interestingly, all of the models tested performed best in the benchmark's gift-giving role-play scenarios. The researchers surmise, "This probably reflects the cross-cultural nature of gift-giving norms, such as initial refusal, which appear in Chinese, Japanese, and Arab etiquette and are therefore more likely to be represented in multilingual training data."

Which brings us to a key question within the paper: "Can models be taught taarof?" The researchers found that if they gave Llama 3 enough taarof context in their prompts, the accuracy of the model's responses "rose from 37.2% to 57.6%." The paper explains that the base model of Llama 3 has likely encountered taarof in its training data and this "latent cultural knowledge [...] can be activated through in-context learning."

So, the researchers also fine-tuned their own version of Llama 3 using supervised fine-tuning and Direct Preference Optimization (DPO). Giving Llama 3 a solid training nudge via DPO "nearly doubled performance (from 37.2% to 79.5%), approaching native speaker levels (81.8%)."
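To put those quoted figures side by side, here is a minimal Python sketch that works out how far each approach moved Llama 3 from its 37.2% starting point. The numbers are the ones quoted above; the names `BASELINE`, `reported`, and `gain` are made up for illustration and are not from the paper.

```python
# Accuracy figures as quoted in the article (percent of taarof scenarios handled correctly).
BASELINE = 37.2  # Llama 3 before any extra context or training

# Labels are illustrative shorthand, not the paper's own terminology.
reported = {
    "in-context prompting": 57.6,
    "DPO fine-tuning": 79.5,
    "native speakers": 81.8,
}


def gain(baseline, score):
    """Return (percentage-point gain, multiple of the baseline), both rounded."""
    return round(score - baseline, 1), round(score / baseline, 2)


summary = {name: gain(BASELINE, score) for name, score in reported.items()}

for name, (points, multiple) in summary.items():
    print(f"{name}: +{points} points, {multiple}x the baseline")
```

By this arithmetic, the DPO result sits a little above twice the baseline and roughly two percentage points short of the native-speaker figure.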

LLaMa chat bot artificial intelligence on smartphone screen. Digital technology themed banner vector illustration. — (Image credit: iNueng via Getty Images)

That's an impressive gain, but as any socially awkward person will tell you, getting by culturally is about far more than simply memorising social scripts. Furthermore, yeah, I could type my polite insistences and refusals into ChatGPT and show the output to my generous Persian host, but that's hardly the smoothest interaction for anyone. And if I've already tracked dirt into my generous host's home because I forgot to take my shoes off—well, I might as well see myself out at that point.

As such, I doubt LLMs will ever wholly replace human interpreters and translators. Besides that, maybe it's high time I, the linguistics drop-out, picked up just a little Persian myself.


Jess has been writing about games for over ten years, spending the last seven working on print publications PLAY and Official PlayStation Magazine. When she’s not writing about all things hardware here, she’s getting cosy with a horror classic, ranting about a cult hit to a captive audience, or tinkering with some tabletop nonsense.