Introduction

Imagine being able to conjure up images directly within ChatGPT, with no more switching between apps or wrestling with separate image generators. Well, stop imagining! OpenAI is making it a reality by building new image generation capabilities, powered by the GPT-4o model, directly into ChatGPT. This feature, aptly named “Images in ChatGPT,” is rolling out now and promises a significant leap in image quality and accuracy.

Think of it like this: you're a chef, and ChatGPT is your sous chef. Before, you had to describe the dish and then go to a separate station to plate it (the image). Now, your sous chef can handle the plating too, and they've gotten really good at it.

This upgrade is available across all ChatGPT tiers: Plus, Pro, Team, and even Free. So, whether you're a casual user or a power user, you can start creating images right away.

What's New and Improved?

So, what makes this new image generation tool so special? According to Gabriel Goh, research lead at OpenAI, this model represents a "step change above previous models." To create this feature, the team leveraged GPT-4o's "omnimodal" foundation, meaning the model can handle various data types like text, images, audio, and video.

Here are some key improvements you can expect:

Enhanced Binding

One of the most significant improvements is in binding. Binding refers to how well the AI maintains the correct relationships between attributes and objects in an image. Previous image generators often struggled with this, mixing up colors, shapes, and other attributes when asked to render multiple items.

For example, if you asked for a blue star and a red triangle, older models might give you a red star and no triangle. This new image generation tool can accurately bind attributes for 15 to 20 objects without confusion. That's a huge leap in accuracy and reliability!

Put another way: imagine asking a friend to draw a picture. Before, they might get the details mixed up. Now, they're like a super-attentive artist who keeps track of every detail you asked for.
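To make the idea of binding a bit more concrete, here is a minimal Python sketch of how you might score it yourself. Everything in it is an assumption for illustration: the binding_accuracy function and the (color, shape) pairs are hypothetical, and this is not how OpenAI evaluates its models. It simply compares the pairs you asked for with the pairs that actually showed up in the image.

```python
from typing import List, Tuple

# An (attribute, object) pair, e.g. ("blue", "star"). Hypothetical representation.
Pair = Tuple[str, str]


def binding_accuracy(requested: List[Pair], rendered: List[Pair]) -> float:
    """Return the fraction of requested pairs that appear in the rendered image."""
    if not requested:
        return 1.0
    remaining = list(rendered)  # copy so each rendered item is matched at most once
    hits = 0
    for pair in requested:
        if pair in remaining:
            remaining.remove(pair)
            hits += 1
    return hits / len(requested)


# The example from the text: a blue star and a red triangle.
requested = [("blue", "star"), ("red", "triangle")]
older_model = [("red", "star")]  # wrong color, and no triangle at all
newer_model = [("blue", "star"), ("red", "triangle")]

print(binding_accuracy(requested, older_model))  # 0.0
print(binding_accuracy(requested, newer_model))  # 1.0
```

In practice, the list of rendered pairs would have to come from a person or a separate vision model inspecting the image; that step is left out here.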

Superior Text Rendering

Another major improvement is in text rendering. If you've ever tried to generate images with text using AI, you know how frustrating it can be to deal with typos and garbled characters. This new system makes it much easier to generate coherent text without errors.

Goh explained that getting text rendering right was a significant challenge. Even small typos or errors can make an entire image unusable. The team spent months iterating and refining the system to reach a point where the text quality is consistently usable.

While it's not perfect (especially with very small text), the improvement is noticeable. This means you can now create images with clear, readable text for posters, logos, and other applications.

Autoregressive Approach

So, what's the secret sauce behind these improvements? The system uses an autoregressive approach to generate images. This means it creates images sequentially, from left to right and top to bottom, similar to how text is written. This is different from the diffusion technique used by most image generators (like DALL-E), which work on the entire image at once, gradually refining it from noise.

Goh speculates that this technical difference could be what gives Images in ChatGPT its better text rendering and binding capabilities. It's like the difference between painting a picture stroke by stroke versus spraying it all at once. The former may allow for more precision and control.
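If you'd like to see what "left to right and top to bottom" means in code, here is a toy Python sketch. It is only an illustration under loud assumptions: OpenAI has not published its implementation here, toy_next_patch is a made-up stand-in for the model, and a real system would predict image tokens with a neural network rather than simple arithmetic. The point is just the order of generation: each cell is produced after, and depends on, the cells drawn before it.

```python
from typing import List, Tuple

Grid = List[List[int]]


def toy_next_patch(grid: Grid, row: int, col: int) -> int:
    """Hypothetical stand-in for a model: derive a value from already-drawn neighbors."""
    left = grid[row][col - 1] if col > 0 else 0
    above = grid[row - 1][col] if row > 0 else 0
    return (left + above + 1) % 4


def generate_raster_order(height: int, width: int) -> Tuple[Grid, List[Tuple[int, int]]]:
    """Fill a grid one cell at a time, left to right and top to bottom."""
    grid = [[-1] * width for _ in range(height)]  # -1 marks "not generated yet"
    order = []
    for row in range(height):
        for col in range(width):
            # Each new cell only sees cells that were generated earlier.
            grid[row][col] = toy_next_patch(grid, row, col)
            order.append((row, col))
    return grid, order


grid, order = generate_raster_order(2, 3)
print(order)  # [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
for line in grid:
    print(line)
# [1, 2, 3]
# [2, 1, 1]
```

A diffusion-style generator would instead start with a full grid of noise and update every cell together over several passes, which is the "spraying it all at once" half of the analogy.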

Real-World Applications

The OpenAI team demonstrated several examples showcasing the system's capabilities, including:

Scientific diagrams: Imagine creating a diagram of Newton's prism experiment with correctly labeled components, all generated within ChatGPT.
Multi-panel comics: Create comics with consistent characters and text bubbles, without having to worry about the AI getting the details wrong.
Informational posters: Generate posters with accurate text and visuals for presentations, marketing materials, or educational purposes.
Transparent background images: Easily create images with transparent backgrounds for stickers, logos, and other design projects.
Restaurant menus: Quickly generate visually appealing menus with accurate descriptions and pricing.

Jackie Shannon, ChatGPT multimodal product lead, highlighted the system's ability to leverage world knowledge. "If I go to draw an image, I do so with the limitation of my own skill... but also with all of the knowledge of the world that I’ve built up," she explained. "The model brings world knowledge to the equation, so when you ask for an image of Newton’s prism experiment, you don’t have to explain what that is to get an image back."

A Tradeoff: Latency

There is one tradeoff to consider: the new system takes longer to generate images than before. However, OpenAI suggests that the improved quality and capabilities are worth the wait.

"While we certainly have room to improve on latency…the quality of these images, the capability, the world knowledge, really makes up for the additional seconds that they’ll spend waiting," Shannon said.

Safeguards and Ownership

Of course, as with any powerful technology, there are concerns about potential misuse. The OpenAI team emphasized that the system includes safeguards against it, such as:

Preventing watermark removal
Blocking the generation of sexual deepfakes
Refusing CSAM generation requests

While the new system doesn't include visual watermarks, all generated images will include standard C2PA metadata to mark them as having been created by OpenAI. The company will also have internal tooling to look up images.

"Ultimately, no system is perfect for this type of thing, but we’re continuously improving our safeguards and we think of this as a starting point," Shannon added. "One thing that’s true about all of the images generated from ChatGPT is that the user owns them and are free to use them within the bounds of our usage policies as they would like."

What About DALL-E?

For those who are fans of DALL-E, don't worry! Christianson said that users will "still have access via a custom GPT."

GPT-4o Powers New Image Generation in ChatGPT: Say Goodbye to Garbled Text!