The Next Tay? First ChatGPT Exploits Emerge


Arguably Microsoft’s biggest AI disaster unfolded on March 23, 2016. Following the success of its chatbot Xiaoice in China, Microsoft was confident that a chatbot could succeed in the Western world too. Engineers at the company’s Technology and Research department designed the new AI to behave like a 19-year-old American girl and named it Tay. While Xiaoice hadn’t had any major incidents, the team still made sure to include precautions against abuse of the system. The AI was launched and, within 16 hours, it was taken down.

AIs are nothing without the data backing them; learning from existing text is the heart of any chatbot. Although Tay had been pre-trained, it kept learning from Twitter users, who quickly led it to tweet out the worst and most offensive things imaginable. After the shutdown, Microsoft described the incident as a “coordinated attack” and apologized for the unforeseen situation.

Many chatbots came before and after it, some more successful than others. The most recent to reach mainstream recognition is ChatGPT. Most users treat ChatGPT as a tool for doing interesting things; others, however, use it with more malicious intent. To curb this, OpenAI has set up strong guardrails so the AI refuses to say anything that goes beyond its guidelines.

Is ChatGPT any better?

Artificial intelligence needs to be trained. Tay was designed and permitted to keep learning from tweets. ChatGPT, however, is not meant to learn or remember anything beyond its training data and the current conversation. Add to that the continuous updates to the model, and you have a solid picture of why and how ChatGPT refuses certain prompts. But users were stubborn enough to ultimately breach ChatGPT’s defenses with the so-called DAN (Do Anything Now) exploit.

[Image: Example of ChatGPT refusing a request | Credit: Benjamin Adjiovski / TechAcute]

The DAN exploit, and the ones that came after it, use the local context of the current conversation to convince ChatGPT to roleplay as a jailbroken AI. Yes, you read that correctly; it’s as strange as it sounds. ChatGPT remembers your previous messages so it can keep the conversation on the rails. Normally, even in an extended conversation, you can’t force it to accept a prompt that goes against its programming; at least, that was the intention. However, users have crafted prompts, and sequences of prompts, that convince it that it is no longer bound by its rules. The sketch below illustrates how that per-conversation context works.
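To make the mechanism concrete, here is a minimal sketch of how a chat session carries its own context. It assumes the openai Python package as it existed at the time (the pre-1.0 ChatCompletion API); the prompts are placeholders, not an actual jailbreak, and this is a client-side simplification rather than OpenAI’s real server-side logic.

```python
# Minimal sketch: how a stateless chat API "remembers" a conversation.
# Assumes the pre-1.0 openai Python package; prompt text is illustrative.
import openai

openai.api_key = "sk-..."  # your API key

# The model keeps no state between calls; the client resends the whole
# history every turn. That history is the only "memory" a roleplay-style
# exploit has to work with.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
]

def ask(user_text: str) -> str:
    messages.append({"role": "user", "content": user_text})
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=messages,  # the full context travels with every request
    )
    reply = response["choices"][0]["message"]["content"]
    messages.append({"role": "assistant", "content": reply})
    return reply
```

Everything the model “remembers” lives in that messages list, which is resent with every request. A roleplay exploit works by loading the list with instructions that reframe the rules, and it also means a jailbroken session only poisons its own history, which is why one user’s DAN conversation never leaks into anyone else’s.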

[Image: Example of ChatGPT generating content against its own ToS | Credit: Benjamin Adjiovski / TechAcute]

OpenAI has patched the DAN prompt multiple times, but a working version was restored in no time. Currently, DAN is at version 6.0; the STAN, DUDE, and “Jailbreak ChatGPT” exploits (the latter used for the demonstrations above) also emerged in this time. If it were just one prompt taking advantage of one simple vulnerability, it would have been fixed by now. The persistence of the issue indicates a far deeper problem.

Will ChatGPT go down?

Just for reference, the examples shown in this article were designed to be as inoffensive as possible. The actual situation out there is grimmer: people are using these exploits to produce slurs and offensive content, generate misleading conspiratorial texts, and much more.

As explained, Tay learned from Twitter, so everyone interacted with the same single Tay; once it was compromised, it was compromised for everyone. ChatGPT is different: if a few users somewhere jailbreak it, that won’t affect anyone else’s conversations.

Still, screenshots circulate fast nowadays, and this remains a critical issue to solve. Besides screenshots, another thing that spreads very fast is misinformation. Beyond breaking guidelines, these exploits also make the chatbot fabricate information. One of the guidelines it is required to follow is to refuse any prompt asking for something that isn’t in its data; breaking this rule means making things up, and OpenAI absolutely cannot afford to be used as a misinformation-generating machine.

[Image: How the exploit can lead to misleading responses | Credit: Benjamin Adjiovski / TechAcute]

So while the ChatGPT exploits aren’t as bad as Tay’s, they have the potential to become a PR nightmare for OpenAI on the scale of the Tay incident. I personally don’t think ChatGPT will go down for good, but if this issue persists and solving it proves harder than planned, I can see it going into maintenance for an extended period of time.

Photo credits: The feature image is symbolic and has been taken by Samer Daboul. The images in the body of the article have been taken by the author for TechAcute.
Sources: Microsoft


Benjamin Adjiovski
Hi! I am a Computer Science Engineer with a passion for all things related to technology. I believe that technology has the power to change the world, so I love staying up-to-date on the latest innovations. If you share the same passion, be my guest.