NVIDIA’s AI Creates Realistic Photos Based Only on Text Descriptions

Two example text descriptions used to create images with an AI model

NVIDIA’s GauGAN2 AI can now turn simple typed statements into photorealistic images. The deep learning model can conjure distinct scenes from just three or four words.

GauGAN is the NVIDIA AI program that turned simple doodles into realistic masterpieces in 2019, technology that eventually made its way into the NVIDIA Canvas app earlier this year. Now NVIDIA has pushed the AI even further: a short text description is all it takes to create an image.

NVIDIA says the deep learning model behind GauGAN2 allows anyone to make beautiful scenes, and now it’s easier than ever. Users can simply type in a phrase like “sunset on the beach” and the AI will create the scene in real time as each word is added. Add an adjective like “sunset on a rocky beach,” or swap “sunset” for “afternoon” or “rainy day,” and the model will adjust the image, powered by what are called Generative Adversarial Networks (GANs).
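The iterative flow described above, where each added word changes the output, can be sketched in miniature. The functions below are toy stand-ins, not GauGAN2’s actual (unpublished) API: a real text encoder and GAN generator are learned neural networks, while here a hash and a seeded random array merely illustrate that a new prompt yields a new latent and hence a new image.

```python
import hashlib
import numpy as np

def embed_text(prompt: str) -> np.ndarray:
    """Toy stand-in for a learned text encoder: deterministically
    maps a prompt to a latent vector. GauGAN2's real encoder is a
    trained network, not a hash."""
    digest = hashlib.sha256(prompt.encode("utf-8")).digest()
    rng = np.random.default_rng(int.from_bytes(digest[:8], "big"))
    return rng.standard_normal(128)

def generate_image(latent: np.ndarray, size: int = 64) -> np.ndarray:
    """Toy stand-in for the GAN generator: projects a latent vector
    into an HxWx3 array. A real generator is a deep network trained
    adversarially against a discriminator."""
    seed = abs(int(latent.sum() * 1e6)) % (2**32)
    rng = np.random.default_rng(seed)
    return rng.random((size, size, 3))

# Iterative refinement: each added word produces a different latent,
# so the "image" updates as the phrase grows.
prompts = ["sunset", "sunset on a beach", "sunset on a rocky beach"]
images = [generate_image(embed_text(p)) for p in prompts]
```

Because both steps are deterministic in this sketch, retyping the same phrase reproduces the same output, while any change to the wording produces a different one, mirroring the behavior the article describes.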

“With the push of a button, users can create a segmentation map, a high-level outline that shows the location of objects in the scene,” says NVIDIA. “From there, they can switch to drawing, modifying the scene with rough sketches using labels like sky, tree, rock, and river, letting the smart paintbrush turn those doodles into stunning images.”
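A segmentation map of the kind NVIDIA describes is, at its core, a grid of class labels, one per pixel, that tells the generator what belongs where. The sketch below builds such a map with illustrative labels (the label set and the one-hot input format are assumptions; GauGAN2’s actual classes and internals are not specified in the source).

```python
import numpy as np

# Illustrative class labels; GauGAN2's real label set is not public here.
LABELS = {"sky": 0, "tree": 1, "rock": 2, "river": 3}

# A 64x64 segmentation map: every pixel holds the label of the object
# the generator should paint at that location.
seg_map = np.zeros((64, 64), dtype=np.int32)   # everything starts as sky
seg_map[40:, :] = LABELS["rock"]               # rough ground across the bottom
seg_map[50:, 20:44] = LABELS["river"]          # a river cutting through the rocks
seg_map[25:40, 5:15] = LABELS["tree"]          # a tree sketched on the left

# One-hot encoding is a common input form for segmentation-conditioned
# generators: shape becomes (H, W, num_classes).
one_hot = np.eye(len(LABELS))[seg_map]
```

Editing the scene, e.g. making the tree taller, is then just reassigning labels in a region of `seg_map` and regenerating.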

AI-generated image created with the phrase “A calm lake surrounded by tall trees on a foggy day”.

NVIDIA says the demo is one of the first to combine multiple modalities within a single GAN. GauGAN2 combines segmentation mapping, inpainting, and text-to-image generation in a single model, which NVIDIA says makes it a powerful tool for creating realistic art with a mix of words and drawings. The goal is to make it faster and easier to turn an artist’s vision into a high-quality AI-generated image. NVIDIA says that, compared to other recent models built specifically for text-to-image or segmentation-map-to-image applications, GauGAN2 produces a greater variety of higher-quality images.

“Instead of having to draw out every element of an imagined scene, users can enter a short phrase to quickly generate the key features and theme of an image, such as a snow-capped mountain range,” says NVIDIA. “This starting point can then be customized with sketches to make a specific mountain taller, or add a couple of trees in the foreground, or clouds in the sky.”

AI-generated image created with the phrase “tropical island on white sandy beach from above”.

While photorealistic creation is perhaps the most impressive use, GauGAN2 is not limited to realistic scenes. Artists can also use the demo to depict otherworldly fantasy landscapes. NVIDIA demonstrates a scene recreating something similar to Star Wars’ fictional planet Tatooine: the model first generates the desert scene, and a second sun is added afterward.

AI-generated image created with the phrase “Endless towering mountains on a sunny day”.

“It’s an iterative process, where each word the user types into the text box adds more to the AI-generated image.”

The text-to-image feature can be tested on the NVIDIA AI Demos page, where anyone can create custom scenes with text prompts and refine them further with quick sketches for more precise results.
