How DALL-E was able to spark a creative revolution

Disclaimer: All images in this story were generated using artificial intelligence.

Every few years a technology comes along that neatly divides the world into before and after. I remember the first time I saw a YouTube video embedded on a web page; the first time I synced Evernote files between devices; the first time I scanned tweets from people nearby to see what they were saying about a concert I was attending.

I remember the first time I identified a song with Shazam, hailed an Uber, and live-streamed myself with Meerkat. What makes these moments stand out, I think, is the sense that an unpredictable array of new possibilities had been unlocked. What would the web be if you could easily add video clips to it? What would change once any file in the cloud was accessible from your phone? What would you do once you could broadcast yourself to the world?

It’s been a few years since I encountered the kind of nascent technology that made me call my friends and say, you have to see this. But this week I found one to add to the list. It’s an image generation tool called DALL-E, and while I have very little idea how it will end up being used, it’s one of the most exciting new products I’ve seen since I started writing this newsletter.

Technically, the technology in question is DALL-E 2. It was made by OpenAI, a seven-year-old San Francisco company whose mission is to create a safe and useful artificial general intelligence. OpenAI is already well known in its field for creating GPT-3, a powerful tool for generating sophisticated text passages from simple prompts, and Copilot, a tool that helps automate the writing of code for software engineers.

DALL-E (a portmanteau of the surrealist Salvador Dalí and Pixar’s WALL-E) takes text prompts and generates images from them. In January 2021, the company introduced the first version of the tool, which was limited to squares of 256 by 256 pixels.

But the second version, which entered a private research beta in April, feels like a radical leap forward. The images are now 1,024 by 1,024 pixels and can incorporate new techniques such as “inpainting”: replacing one or more elements of an image with another. (Imagine taking a picture of an orange in a bowl and replacing it with an apple.) DALL-E is also better at understanding the relationships between objects, which lets it depict increasingly fantastical scenes: a koala dunking a basketball, an astronaut riding a horse.

For weeks, threads of DALL-E-generated images have taken over my Twitter timeline. And after I pondered aloud what I could do with the technology (namely, waste countless hours on it), a very nice person at OpenAI took pity on me and invited me into the private research beta. The number of people with access is now in the low thousands, a spokeswoman told me today; the company hopes to add 1,000 people per week.


When you create an account, OpenAI asks you to agree to DALL-E’s content policy, which is designed to head off the most obvious potential abuses of the platform. No hate, harassment, violence, sex, or nudity is allowed, and the company also asks you not to create images related to politics or politicians. (It seems worth noting here that one of OpenAI’s co-founders is Elon Musk, who famously agitates on Twitter for much less restrictive content policies. He left the board in 2018.)

DALL-E also forestalls a lot of potentially troubling imagery by adding keywords (“shooting,” for example) to a block list. You also may not use it to create images intended to deceive; deepfakes are not allowed. And while there’s no prohibition against generating images based on public figures, you can’t upload photos of people without their permission, and the technology seems to blur most faces slightly to signal that the images have been manipulated.

Once you’ve agreed, you’re presented with DALL-E’s wonderfully simple interface: a text box that invites you to create anything you can imagine, so long as the content policy allows it. Imagine using the Google search bar as if it were Photoshop: that’s DALL-E. The tool borrows another idea from the search engine, a “surprise me” button that prefills the text box with a suggested query based on past successes. I’ve often used this to get ideas for artistic styles I might never otherwise have considered: a “macro 35mm photo,” for example, or pixel art.

For each of my first queries, DALL-E took about 15 seconds to generate 10 images. (Earlier this week, the number of images was reduced to six so that more people could have access.) Almost every time, the results made me swear and laugh out loud.

For example, here is a result of “a shiba inu dog dressed as a firefighter.”

And here’s one of “a bulldog dressed like a wizard, digital art.”

I love these fake AI dogs so much. I want to adopt them and then write children’s books about them. If the metaverse ever exists, I want them to join me there.

You know who else could come? “Frog with hat on, digital art.”

Why is he literally perfect?

On our Sidechannel Discord server, I started taking requests. Someone asked to see “the metaverse by night, digital art.” What came back, I thought, was suitably grand and abstract:

I won’t try to explain here how DALL-E creates these images, partly because I’m still trying to understand it myself. (One of the core technologies involved, “diffusion,” is helpfully explained in a blog post Google AI published last year.) But it has repeatedly struck me how creative this image generation technology can seem.

Take, for example, two results shared in my Discord by another reader with DALL-E access. First, look at this series of results for “a bear economist in front of a crashing stock chart, digital art.”

And second, “a bull economist in front of a rising stock chart, synthwave, digital art.”

It is striking how much emotion DALL-E captures here: the fear and annoyance of the bear, the aggression of the bull. It seems wrong to describe any of this as “creative” (what we’re looking at is nothing more than probabilistic guesswork), and yet the results have the same effect on me as looking at something genuinely creative.

Another fascinating aspect of DALL-E is the way it tries to solve the same problem in different ways. For example, when I asked it to show me “a delicious cinnamon roll with googly eyes,” it had to figure out how to portray the eyes.

Sometimes DALL-E added a pair of plastic-looking eyes to a roll, as I would have done. Other times it created eyes from the negative space in the glaze. And in one case it made the eyes out of miniature cinnamon rolls.

That was one of the times I cursed out loud and started laughing.

DALL-E is the most advanced image generation tool I’ve seen yet, but it’s certainly not the only one. I have also experimented lightly with a similar tool called Midjourney, which is also in beta; Google has announced another, called Imagen, but has yet to let outsiders try it. A third tool, DALL-E Mini, has generated a series of viral images in recent days; it has no relation to OpenAI or DALL-E, however, and I imagine the developer will be hit with a cease-and-desist letter soon.

OpenAI told me it hasn’t made any decisions yet about whether or how DALL-E might become more widely available. The purpose of the current research beta is to learn how people use the technology, and to adjust both the tool and its content policies as needed.

And yet the number of use cases that artists have already discovered for DALL-E is striking. One artist is using DALL-E to create augmented reality filters for social apps. A chef in Miami is using it to get new ideas for plating his dishes. Ben Thompson wrote a prescient piece on how DALL-E could be used to create extremely cheap environments and objects in the metaverse.

It’s normal and appropriate to worry about what this kind of automation could do to professional illustrators. Many jobs may well be lost. And yet I can’t help but think that tools like DALL-E could become useful parts of their workflows. What if, for example, they asked DALL-E to sketch out a few concepts before getting started? The tool lets you create variations of any image; I used it to suggest alternative logos for Platformer:

I’ll stick with the logo I have. But if I were an illustrator I might appreciate the alternative suggestions, if only for inspiration.

It’s also worth considering the creative potential of these tools for people who would never think to hire an illustrator, or couldn’t afford one. As a child I wrote my own comic books, but my illustration skills never got very far. What if I could have instructed DALL-E to draw all my superheroes for me instead?

On the one hand, this doesn’t seem like the kind of tool most people would use every day. And yet I imagine that in the coming months and years we will find increasingly creative uses for this kind of technology: in e-commerce, in social apps, at home and at work. For artists, it looks like one of the most powerful culture-remixing tools we’ve ever seen, assuming the copyright issues get resolved. (It’s not entirely clear whether using AI to generate images of protected works counts as fair use, I’m told. If you’d like to see DALL-E’s take on “Batman eating a sandwich,” DM me.)

I suspect we will also see malicious uses of this tool. While I trust OpenAI to enforce strong policies against the misuse of DALL-E, there will surely be similar tools that take more of a do-it-yourself approach to content moderation. People are already creating malicious, often pornographic deepfakes to harass their exes using the crude tools available today, and the technology will only get better.

It’s often the case that when a new technology emerges, we focus on its happier and more whimsical uses while ignoring how it could be misused down the road. As excited as I’ve been to use DALL-E, I’m also quite concerned about what similar tools could do in the hands of less scrupulous companies.

It is also worth thinking about what even the positive applications of this technology could do at scale. If most of the images we encounter online are generated by AI, what does that do to our sense of reality? How will we know whether anything we see is real?

For now, DALL-E feels like a breakthrough in the history of consumer technology. The question is whether, in a few years, we will see it as the start of a creative revolution or as something more worrying. The future is already here, and it’s adding 1,000 users a week. Now is the time to discuss its implications, before the rest of the world gets its hands on it.

