"Draw me a mountain": an AI turns your requests into realistic photos

"Draw me a mountain": an AI turns your requests into realistic photos

Nvidia has updated its GauGAN tool, which can now generate realistic photos from typed text, all based on machine learning and artificial intelligence.

You no longer even need to scribble anything to generate very realistic landscapes with artificial intelligence: a few words are now enough to produce natural views such as a shoreline, a mountain, or a whole range of mountains and valleys. This feat is made possible by Nvidia's latest advances in AI, with its GauGAN tool.

GauGAN? The name is obviously a nod to the post-impressionist painter Paul Gauguin. But above all it recalls how the tool works, since GAN stands for generative adversarial network. It is an unsupervised learning method devised by computer scientist Ian Goodfellow.

The idea is to set two neural networks against each other in order to achieve a certain result. The first, the generator, produces the visuals, while the second, called the "discriminator", is responsible for evaluating them. The discriminator has been trained with deep learning, a technique that involves feeding the AI with prior data. It therefore "knows" what the visuals should look like.
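To make the two opposing roles concrete, here is a minimal sketch in Python. It is a toy under stated assumptions, not Nvidia's model: the "images" are single numbers, the generator is a simple shift of random noise, the discriminator is a fixed logistic function, and every name and value (`generator`, `discriminator`, `theta`, `w`, `b`) is made up for illustration. There is no training loop; the sketch only shows that the two loss scores pull in opposite directions.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def generator(z, theta):
    """Toy generator: shifts random noise z by an offset theta (hypothetical)."""
    return z + theta


def discriminator(x, w, b):
    """Toy discriminator: estimated probability that sample x is real."""
    return sigmoid(w * x + b)


def discriminator_loss(real, fake, w, b):
    """Low when real samples score near 1 and generated samples near 0."""
    eps = 1e-12  # avoids log(0)
    loss_real = -sum(math.log(discriminator(x, w, b) + eps) for x in real) / len(real)
    loss_fake = -sum(math.log(1.0 - discriminator(x, w, b) + eps) for x in fake) / len(fake)
    return loss_real + loss_fake


def generator_loss(fake, w, b):
    """Low when the discriminator is fooled into scoring fakes near 1."""
    eps = 1e-12
    return -sum(math.log(discriminator(x, w, b) + eps) for x in fake) / len(fake)


rng = random.Random(0)
real = [rng.gauss(4.0, 0.5) for _ in range(200)]   # stand-in for real photos
noise = [rng.gauss(0.0, 0.5) for _ in range(200)]  # random input to the generator

w, b = 1.0, -2.0  # assumed discriminator: decision boundary at x = 2

bad_fakes = [generator(z, 0.0) for z in noise]   # output far from the real data
good_fakes = [generator(z, 4.0) for z in noise]  # output resembling the real data

print("generator loss, unconvincing fakes:", generator_loss(bad_fakes, w, b))
print("generator loss, convincing fakes:  ", generator_loss(good_fakes, w, b))
print("discriminator loss, unconvincing fakes:", discriminator_loss(real, bad_fakes, w, b))
print("discriminator loss, convincing fakes:  ", discriminator_loss(real, good_fakes, w, b))
```

When the fakes resemble the real data, the generator's loss goes down and the discriminator's goes up. In an actual GAN, both networks are updated in turn so that each keeps pushing the other to improve.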

Type some text, get an image

Nvidia built on this approach in order to add text support. The American company explains this in a news post published on November 22 about GauGAN 2. With this tool, which Internet users can test on a dedicated site, it is possible to generate a landscape by describing it in words and, if necessary, to complete it with scribbles.

"With the versatility of text prompts and sketches, GauGAN2 allows users to create and customize scenes faster and with finer control," Nvidia says, noting that its demo is "one of first to combine multiple modalities — text, semantic segmentation, sketch, and style — into a single GAN framework. »

The demonstration video is obviously very spectacular: as you type and arrange the words, the photorealistic visual changes to reflect the request. In practice, however, the tool does not work in real time when tested: once your sentence has been entered (in English, though the site also seems to understand French), you have to click a button to see the result.

"The GauGAN 2 AI model was trained on 10 million high-quality landscape images using the Nvidia Selene supercomputer, an Nvidia DGX SuperPOD system that is among the top 10 most powerful supercomputers in the world," Nvidia points out. The site specifies that the neural network has also learned the connection between words and the images they correspond to, such as "winter", "foggy" or "rainbow".

Outside of landscapes, GauGAN 2 appears lost and its interpretation of written text becomes random, though this can sometimes yield fantastical or dreamlike visuals. We wanted it to draw a sheep, but the network doesn't seem to know what one is. However, it would be enough to train it by showing one of its two networks millions of pictures of sheep.

Source: Nvidia

Nvidia's work in the field of artificial intelligence has already led it to create more than pretty landscapes. The company has demonstrated faces that are particularly realistic, but which do not exist. Nvidia even virtually cloned its CEO at a conference in August 2021, mobilizing significant technical resources.

This work, spectacular as it is, opens up prospects that are both exciting and worrying. The solutions outlined by Nvidia with GauGAN could have obvious applications in video games, film, animation or TV series, alongside the work of designers. But we can also imagine harmful uses, whether for misinformation or deepfakes.
