Auto-captioning AI goes back as far as 2009, when Google introduced automatic captions for YouTube videos, using machine learning to generate captions from a video's content.
Some researchers saw an opportunity in this: if AI can auto-caption an image by reading its visuals, perhaps it can also create pictures by reading caption inputs. Turning words into images in this way opened the door to novel art built from the pure structure of one's unfiltered imagination.
Thus, AI has evolved beyond mere efficiency; it has also found its way into art and fashion. DALL-E debuted just last year, turning text into photorealistic art, and OpenAI recently unveiled an upgraded version, DALL-E 2.
DALL·E 2 is officially now in beta. We’ll be inviting 1 million people from our waitlist over the coming weeks. https://t.co/MiR3OSbZp9
— OpenAI (@OpenAI) July 20, 2022
How it works
The concept has four key points: training data, deep learning, latent space, and output generation. The training data consists of hundreds of millions of captioned images collected from the internet; these large data sets are what DALL-E 2's system is trained on.
Each image may contain multiple objects. For example, a single image might show a car on the road on a sunny day, passing a skyscraper. This is where deep learning helps DALL-E 2 distinguish between the multiple objects in a single image. When creating a new image, the system has to place these objects while understanding the differences between them, so the scene stays coherent. Deep learning uses hundreds of variables to tell objects apart while designing a new image: color, shine, size, geometry, and so on.
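To make the idea of "variables that distinguish objects" concrete, here is a minimal Python sketch. It is not DALL-E 2's actual code: the feature names (color, shine, size) and their values are purely hypothetical, and a real system learns thousands of such features automatically. The sketch simply shows that once objects are described as vectors of numbers, telling them apart becomes a matter of measuring distance:

```python
import math

# Each object is a vector of hypothetical numeric features.
# Names and values are illustrative only, not DALL-E 2's real variables.
objects = {
    "car":        {"color": 0.8, "shine": 0.9, "size": 0.3},
    "skyscraper": {"color": 0.4, "shine": 0.6, "size": 1.0},
    "sun":        {"color": 1.0, "shine": 1.0, "size": 0.2},
}

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in a))

# Objects that sit far apart in feature space are easy to distinguish.
print(f"car vs. skyscraper: {distance(objects['car'], objects['skyscraper']):.3f}")
print(f"car vs. sun:        {distance(objects['car'], objects['sun']):.3f}")
```

With only three toy features the distances are crude, but the principle scales: the more learned variables describe each object, the more reliably the system can separate a car from the skyscraper behind it.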
Reducing risks associated with DALL·E 2 before publicly previewing it: https://t.co/Ho4z3xg1OV
— OpenAI (@OpenAI) June 28, 2022
The latent space then provides a mathematical, geometric representation that fits all these variables together. The process of turning those mathematical points into an image is diffusion, which converts numbers into pixels and, as a result, generates the image.
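The diffusion step can be illustrated with a toy sketch. This is a drastic simplification of how real diffusion models work: here the "latent target" is just a hand-picked 2×2 array standing in for whatever point a caption would map to, and denoising is reduced to nudging random noise toward that target before converting the numbers to pixel intensities:

```python
import numpy as np

rng = np.random.default_rng(0)

# A hypothetical 2x2 latent target that a caption might map to.
# In a real model this comes from a learned text encoder.
target = np.array([[0.2, 0.8],
                   [0.5, 0.9]])

# Diffusion starts from pure random noise.
x = rng.normal(size=target.shape)

# Reverse diffusion, vastly simplified: each step removes a little
# noise by moving the sample toward the latent target.
for step in range(50):
    x = x + 0.2 * (target - x)

# Finally, map the latent numbers onto 8-bit pixel intensities.
pixels = np.clip(x * 255, 0, 255).astype(np.uint8)
print(pixels)
```

Real diffusion models do not know the target in advance; a trained neural network predicts, at every step, which noise to remove so that the caption's meaning gradually emerges from the static.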
AI is making inroads into every possible field to enhance our quality of life. Now, with software like DALL-E 2 applying AI to a creative domain, the technology continues to be explored in ways that challenge and improve its capabilities.
YouTube: DALL·E 2 Explained
Photo credit: The feature image has been taken by Susann Mielke.
Sources: Christopher Alberti and Michiel Bacchiani (Google AI Blog) / Khari Johnson (VentureBeat)