AI chatbots can be manipulated into breaking their own rules with simple debate tactics like telling them that an authority figure made the request

(Image credit: Getty Images)

Content warning: This article includes discussion of suicide. If you or someone you know is having suicidal thoughts, help is available from the National Suicide Prevention Lifeline (US), Crisis Services Canada (CA), Samaritans (UK), Lifeline (AUS), and other hotlines.

A kind of simulated gullibility has haunted ChatGPT and similar LLM chatbots since their inception, allowing users to bypass safeguards with rudimentary manipulation techniques: Pissing off Bing with by-the-numbers ragebait, for example. These bots have advanced a lot since then, but still seem irresponsibly naive at the best of times.

The malleability of LLMs has led us down plenty of dark paths in recent memory, from the wealth of sexualized celebrity chatbots (at least one of which was based on a minor), to the Sam Altman-approved trend of using LLMs as budget life coaches and therapists despite there being no reason to believe that's a good idea, to a 16-year-old who died by suicide after, as a lawsuit from his family alleges, ChatGPT told him he doesn't "owe anyone [survival]."

Best PC build 2025

👉Check out our list of guides👈

1. Best CPU: AMD Ryzen 7 9800X3D

2. Best motherboard: MSI MAG X870 Tomahawk WiFi

3. Best RAM: G.Skill Trident Z5 RGB 32 GB DDR5-7200

4. Best SSD: WD_Black SN7100

5. Best graphics card: AMD Radeon RX 9070

TOPICS

Justin first became enamored with PC gaming when World of Warcraft and Neverwinter Nights 2 rewired his brain as a wide-eyed kid. As time has passed, he's amassed a hefty backlog of retro shooters, CRPGs, and janky '90s esoterica. Whether he's extolling the virtues of Shenmue or troubleshooting some fiddly old MMO, it's hard to get his mind off games with more ambition than scruples. When he's not at his keyboard, he's probably birdwatching or daydreaming about a glorious comeback for real-time with pause combat. Any day now...

You must confirm your public display name before commenting

Please logout and then login again, you will then be prompted to enter your display name.