OpenAI Introduces Images in ChatGPT Powered by GPT-4o

Readable AI: OpenAI’s Image Generator Overcomes Text Hurdles

OpenAI has introduced “Images in ChatGPT,” a feature that makes image generation a core part of the ChatGPT platform. Powered by the new GPT-4o model, it lets users generate images directly within their conversations, a major advance in AI-generated content.

“Images in ChatGPT” is available to all ChatGPT users across tiers, including Plus, Pro, Team, and the free version. OpenAI spokesperson Taya Christianson explained that free-tier users are subject to the same limit that applied to DALL-E 3, three images per day, although this limit may change depending on demand. The company has confirmed that DALL-E fans will continue to have access through a dedicated GPT.

OpenAI’s research lead Gabriel Goh called GPT-4o an “omnimodal” foundation because it can process multiple data types such as text, images, audio, and video. The model’s improved “binding,” its ability to keep attributes such as color and shape attached to the correct objects, addresses a frequent failure in AI image generation. GPT-4o can maintain these relationships for 15 to 20 objects without blending colors and shapes, whereas previous models frequently failed at this task.

The most significant improvement is the model’s ability to render text. Text in AI-generated images has traditionally appeared scrambled or meaningless. Goh described the development as an iterative process that took many months. The team acknowledges that small text is still not rendered perfectly, but says it has reached a level of consistency at which text in images is usable.

Unlike typical image generators, the system uses an autoregressive model. It produces an image sequentially, moving from left to right and top to bottom, much as a language model produces text. This approach appears to improve both text rendering and binding.
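
The left-to-right, top-to-bottom ordering can be illustrated with a toy sketch in Python. This is not OpenAI's implementation, and the article gives no details of the real model. The function names `toy_next_token` and `generate_raster`, the grid size, and the small integer "vocabulary" are all assumptions made for illustration. A real system would predict each image token with a trained neural network rather than the random rule used here.

```python
import random


def toy_next_token(context, vocab_size, rng):
    """Hypothetical stand-in for a learned model.

    Chooses the next token given every token generated so far.
    Here the rule is random, with a bias toward repeating the
    previous token, purely so the output depends on the context.
    """
    if context and rng.random() < 0.7:
        return context[-1]
    return rng.randrange(vocab_size)


def generate_raster(height, width, vocab_size=4, seed=0):
    """Fill a height x width grid one token at a time in raster order."""
    rng = random.Random(seed)
    tokens = []  # flat sequence, exactly like a text token stream
    for _row in range(height):        # top to bottom
        for _col in range(width):     # left to right within a row
            tokens.append(toy_next_token(tokens, vocab_size, rng))
    # Reshape the flat sequence back into rows for display.
    return [tokens[r * width:(r + 1) * width] for r in range(height)]


grid = generate_raster(3, 4)
for row in grid:
    print(" ".join(str(t) for t in row))
```

The point of the sketch is only the generation order: each position is chosen after, and conditioned on, every position before it in the sequence, which is the same loop structure used for generating text.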

The OpenAI briefing demonstrated the system’s range with examples including precise scientific diagrams of Newton’s prism experiment, multi-panel comics with consistent characters and dialogue, and informational posters with accurate text. Practical applications were also shown, such as creating images with transparent backgrounds for stickers and logos, and designing restaurant menus.

ChatGPT’s multimodal product lead, Jackie Shannon, highlighted the system’s ability to draw on world knowledge. She compared it to a person drawing: the result is limited by that person’s drawing skill but informed by everything they know about the world. This use of world knowledge is what lets the model generate an image of Newton’s prism experiment without further explanation from the user.

OpenAI acknowledges that image generation now takes longer than it did before, but believes the higher quality and added capabilities make the wait worthwhile. Shannon said that latency can still be improved, but maintained that the improved image quality and additional features more than compensate for the waiting time.

Responding to concerns about potential misuse, OpenAI emphasized its commitment to strong safeguards. The system is designed to block requests for child sexual abuse material (CSAM), prevent the creation of sexual deepfakes, and refuse watermark removal. Generated images will carry standard C2PA metadata identifying them as produced by OpenAI, although they will have no visible watermark. The company also continues to operate internal tools for verifying images.

Shannon explained that while no system is flawless, OpenAI is committed to strengthening its safeguards and views the current measures as a baseline. Users who produce images through ChatGPT retain ownership rights and can use these images in accordance with OpenAI’s usage policies.

With “Images in ChatGPT,” OpenAI expands the capabilities of its flagship product and makes creative visual expression part of the conversational platform.
