Kevin Kelly
Picture Lee Unkrich, one of Pixar’s most distinguished animators, as a seventh grader. He’s staring at an image of a train locomotive on the screen of his school’s first computer. Wow, he thinks. Some of the magic wears off, however, when Lee learns that the image had not appeared simply because someone asked for “a picture of a train.” Instead, it had to be painstakingly coded and rendered by hard-working humans.
Now picture Lee 43 years later, stumbling onto DALL-E, an artificial intelligence that generates original works of art based on human-supplied prompts that can literally be as simple as “a picture of a train.” As he types in words to create image after image, the wow is back. Only this time, it doesn’t go away. “It feels like a miracle,” he says. “When the results appeared, my breath was taken away and tears welled in my eyes. It’s that magical.”
Our machines have crossed a threshold. All our lives, we have been reassured that computers were incapable of being truly creative. Yet, suddenly, millions of people are now using a new breed of AIs to generate stunning, never-before-seen pictures. Most of these users are not, like Lee Unkrich, professional artists, and that’s the point: They do not have to be. Not everyone can write, direct, and edit an Oscar winner like Toy Story 3 or Coco, but everyone can launch an AI image generator and type in an idea. What appears on the screen is astounding in its realism and depth of detail. Thus the universal response: Wow. On four services alone—Midjourney, Stable Diffusion, Artbreeder, and DALL-E—humans working with AIs now cocreate more than 20 million images every day. With a paintbrush in hand, artificial intelligence has become an engine of wow.
Because these surprise-generating AIs have learned their art from billions of pictures made by humans, their output hovers around what we expect pictures to look like. But because they are alien AIs, fundamentally mysterious even to their creators, they restructure the new pictures in a way no human is likely to think of, filling in details most of us wouldn’t have the artistry to imagine, let alone the skills to execute. They can also be instructed to generate more variations of something we like, in whatever style we want, in seconds. This, ultimately, is their most powerful advantage: They can make new things that are relatable and comprehensible but, at the same time, completely unexpected.
So unexpected are these new AI-generated images, in fact, that—in the silent awe immediately following the wow—another thought occurs to just about everyone who has encountered them: Human-made art must now be over. Who can compete with the speed, cheapness, scale, and, yes, wild creativity of these machines? Is art yet another human pursuit we must yield to robots? And the next obvious question: If computers can be creative, what else can they do that we were told they could not?
I have spent the past six months using AIs to create thousands of striking images, often losing a night’s sleep in the unending quest to find just one more beauty hidden in the code. And after interviewing the creators, power users, and other early adopters of these generators, I can make a very clear prediction: Generative AI will alter how we design just about everything. Oh, and not a single human artist will lose their job because of this new technology.
It is no exaggeration to call images generated with the help of AI cocreations. The sobering secret of this new power is that the best applications of it are the result not of typing in a single prompt but of very long conversations between humans and machines. Progress for each image comes from many, many iterations, back-and-forths, detours, and hours, sometimes days, of teamwork—all on the back of years of advancements in machine learning.
AI image generators were born from the marriage of two separate technologies. One was a historical line of deep learning neural nets that could generate coherent, realistic images, and the other was a natural language model that could serve as an interface to the image engine. The two were combined into a language-driven image generator. Researchers scraped the internet for all images that had adjacent text, such as captions, and used billions of these examples to connect visual forms to words, and words to forms. With this new combination, human users could enter a string of words (the prompt) that described the image they sought, and the system would generate an image based on those words.