NVIDIA’s new AI eats words, spits out photos and feels borderline magical

NVIDIA has an art creation system that uses artificial intelligence to turn words into visually stunning works of art. This is not the first time such a concept has been proposed or even built, but it is the first time we have seen a system of this sort operate with such speed and accuracy.

Peek at OpenAI to see a project called DALL·E, an image generation system based on GPT-3 (described in a paper hosted on Cornell University's arXiv). You can start making wild interpretations of patterns with the Deep Dream Generator, or read up on the technology behind the NVIDIA Research project we're looking at today in our article on Generative Adversarial Networks, Getting to know the GAN!

NVIDIA's GauGAN2 project builds on what the company's researchers created with NVIDIA Canvas, an application – still experimental at the moment – that runs on the first GauGAN model. With artificial intelligence at your disposal, anyone can create relatively realistic-looking artwork with no more input than finger painting requires.

With GauGAN2, NVIDIA researchers have expanded what is possible with simple inputs and artificial intelligence interpretation of those inputs. The model was trained on nearly 10 million high-quality landscape images, and it relies on that bank of visual knowledge to determine what your words might mean in a work of art.

The single GAN framework behind GauGAN2 combines several input modalities – NVIDIA lists text, semantic segmentation, sketch, and style. Below you will see a demonstration of the new text input element in an interface that is largely an extension of NVIDIA Canvas.
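To make the idea of one generator conditioned on several modalities concrete, here is a minimal, purely illustrative sketch of how text, segmentation, sketch, and style inputs could be stacked into a single conditioning tensor for a generator. All names, shapes, and the `build_conditioning` helper are assumptions for illustration – this is not NVIDIA's actual GauGAN2 code or API.

```python
import numpy as np

# Hypothetical sketch: a GauGAN2-style model conditions a single GAN
# generator on several modalities at once. Shapes are illustrative only.

H, W = 64, 64  # working resolution of the conditioning maps


def build_conditioning(text_embedding, seg_map, sketch, style_vector, n_classes):
    """Stack per-pixel conditioning channels for a single generator.

    text_embedding: (D_t,) vector from a text encoder, broadcast per pixel
    seg_map:        (H, W) integer class labels, one-hot encoded into channels
    sketch:         (H, W) edge/line-drawing map with values in [0, 1]
    style_vector:   (D_s,) global style code, broadcast per pixel
    """
    seg_onehot = np.eye(n_classes)[seg_map]                # (H, W, n_classes)
    text_plane = np.broadcast_to(text_embedding, (H, W, text_embedding.size))
    style_plane = np.broadcast_to(style_vector, (H, W, style_vector.size))
    sketch_plane = sketch[..., None]                       # (H, W, 1)
    # The generator would consume this stacked tensor as its conditioning input.
    return np.concatenate(
        [seg_onehot, sketch_plane, text_plane, style_plane], axis=-1
    )


cond = build_conditioning(
    text_embedding=np.random.rand(8),
    seg_map=np.random.randint(0, 4, size=(H, W)),
    sketch=np.random.rand(H, W),
    style_vector=np.random.rand(4),
    n_classes=4,
)
print(cond.shape)  # (64, 64, 17): 4 seg classes + 1 sketch + 8 text + 4 style
```

The point of the sketch is the design choice it illustrates: rather than training one network per input type, everything is flattened into a shared per-pixel conditioning signal so a single GAN can mix and match whichever modalities the user provides.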

The demonstration itself is far less important than what it represents. Smartphones can already magically erase objects from photos. If you use a system like Google Photos, AI is already part of your life, and it gets smarter with every photo you take with your phone.

Here, with NVIDIA's demo, we see that a machine doesn't just know how to identify elements in images – it knows how to create images based on the knowledge drawn from the images it was fed. NVIDIA's model effectively shows that graphics processing power and the right code can generate shockingly convincing representations of what we humans interpret as reality.
