From DDoS attacks to plain old spam, bots are a disruptive presence. A key part of keeping the internet usable is making sure that actions are performed by humans. But as technology evolves, tests like the Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA) keep getting more difficult. It may be that soon the problem will not be making tests hard enough for bots, but keeping them solvable for humans. A new project named I!&3_OCR explores how we can have text that only humans can read. No fancy tests.
Rethinking the tests
A Turing test determines whether a robot’s responses are distinguishable from a human’s. To say a robot passed the test is to say it was indistinguishable. CAPTCHA works like a reverse Turing test, where humans need to prove they aren’t robots. It started in the 2000s with text-based tests, then expanded into more variations, including book scans and audio.
However, AI tools such as OCR (Optical Character Recognition) make these tests less effective. In response to a 2013 algorithm that solved them with 95% accuracy, Luis von Ahn, part of the original team that created CAPTCHA, commented that his team was working on picture-based tests and guaranteed they would not be breakable. Years later, not only are picture-based tests breakable, but it seems only a matter of time until modern tests, like rotating 3D objects, are broken as well.
I!&3_OCR
The weirdly named project starts with a simple question: Can you have text that is readable for humans but problematic for OCR? I!&3_OCR explores the differences between human and robot text decoding and how they can be used to our advantage. Three distinct font strategies were employed: resolution, disruption, and disorientation.
I!&3_OCR’s first font occupies the low-resolution space of a 1:1 rectangle, where the OCR lacks contextual clues. The second disrupts each letter by removing parts of it in places where, following Gestalt principles, a human would still naturally perceive the letter as a whole. Finally, the third font disorients the OCR using the same principle as the original CAPTCHA.
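To make the three ideas a little more concrete, here is a minimal Python sketch that applies toy versions of them to a tiny bitmap letter. It is not the project's method: the glyph, the function names, and the parameters are all made up for illustration, and a plain 90-degree rotation stands in for the kind of distortion used in early CAPTCHAs.

```python
# Toy illustration of the three strategies on a 7x5 bitmap "A".
# Everything here (glyph, names, parameters) is hypothetical and is
# not taken from the I!&3_OCR project.

GLYPH_A = [
    "..#..",
    ".#.#.",
    "#...#",
    "#...#",
    "#####",
    "#...#",
    "#...#",
]


def to_grid(rows):
    """Convert text rows into a grid of booleans (True = ink)."""
    return [[ch == "#" for ch in row] for row in rows]


def render(grid):
    """Convert a boolean grid back into printable text."""
    return "\n".join("".join("#" if c else "." for c in row) for row in grid)


def reduce_resolution(grid, factor=2):
    """Resolution: merge each factor x factor block into a single pixel."""
    height, width = len(grid), len(grid[0])
    out = []
    for y in range(0, height, factor):
        row = []
        for x in range(0, width, factor):
            block = [
                grid[yy][xx]
                for yy in range(y, min(y + factor, height))
                for xx in range(x, min(x + factor, width))
            ]
            row.append(any(block))  # pixel is inked if any source pixel was
        out.append(row)
    return out


def disrupt(grid, every=3):
    """Disruption: blank every `every`-th row, leaving gaps in the letter."""
    return [
        [False] * len(row) if i % every == every - 1 else list(row)
        for i, row in enumerate(grid)
    ]


def disorient(grid):
    """Disorientation (stand-in): rotate the glyph 90 degrees clockwise."""
    return [list(col) for col in zip(*grid[::-1])]


if __name__ == "__main__":
    grid = to_grid(GLYPH_A)
    for name, fn in [("resolution", reduce_resolution),
                     ("disruption", disrupt),
                     ("disorientation", disorient)]:
        print(name)
        print(render(fn(grid)))
        print()
```

Running the script prints the letter after each transformation, which gives a rough feel for how much of the shape a human can still recognize once pixels are merged, removed, or rotated.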
All three approaches succeeded in making text hard for OCR to read. They could be used in other situations, but their use as CAPTCHA is still limited. Word-based CAPTCHA has mostly been replaced because of ethical concerns: individuals who were visually impaired or had hearing difficulties could not access sites protected by text or audio tests. Ultimately, no solution is perfect. Still, this project shows we can find ways to build such tests without making them hard beyond our own limits, and hopefully, I!&3_OCR is just a first step.
Photo Credit: The feature image is symbolic and has been taken by Phairin Thee.
Sources: Robert Hof (Forbes) / David Ramel (The Journal) / Creative Applications
