There was a paper recently where a research team trained a machine learning algorithm (a GAN they called AttnGAN) to generate pictures based on written descriptions. It’s like Visual Chatbot in reverse.
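The basic shape of the idea is easy to sketch: the generator gets a caption embedding concatenated with random noise, and maps that combined code to pixels. Here's a toy illustration in NumPy with made-up weights standing in for a trained network (the vocabulary, dimensions, and function names are all hypothetical, not AttnGAN's actual architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy vocabulary and sizes, just for illustration.
VOCAB = {"this": 0, "bird": 1, "is": 2, "red": 3, "with": 4, "a": 5, "short": 6, "beak": 7}
EMBED_DIM, NOISE_DIM, IMG_SIDE = 8, 4, 16

# Random weights standing in for a trained text encoder and generator.
word_vectors = rng.normal(size=(len(VOCAB), EMBED_DIM))
gen_weights = rng.normal(size=(EMBED_DIM + NOISE_DIM, IMG_SIDE * IMG_SIDE))

def embed_caption(caption):
    """Average the word vectors of known words (a very crude text encoder)."""
    ids = [VOCAB[w] for w in caption.lower().split() if w in VOCAB]
    return word_vectors[ids].mean(axis=0)

def generate(caption):
    """Concatenate the caption embedding with noise, then map to pixels."""
    z = rng.normal(size=NOISE_DIM)
    code = np.concatenate([embed_caption(caption), z])
    img = np.tanh(code @ gen_weights)  # squash pixel values into [-1, 1]
    return img.reshape(IMG_SIDE, IMG_SIDE)

img = generate("this bird is red with a short beak")
print(img.shape)
```

The noise vector is what lets the same caption produce many different pictures; the caption embedding is the only thing steering the output toward "bird" at all, which is part of why a model trained on everything from sheep to shopping centers has so much steering to do.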

When it was trained just to generate pictures of birds, it did pretty well, actually. (Although the description didn’t specify a beak, so it just… left it out.) But when they trained the same algorithm on a huge and highly varied dataset, it had a lot more trouble generating a picture to match a caption. Below, I give the same caption to a version of their algorithm that has been trained to generate everything from sheep to shopping centers.

Cris Valenzuela wrapped their trained model in an entertaining demo that attempts to generate a picture for any caption. This bird is less, um, recognizable.

When the GAN has to draw *anything* I ask for, there’s just too much to keep track of – the problem’s too broad, and the algorithm spreads itself too thin. It doesn’t just have trouble with birds.

A GAN that’s been trained just on celebrity faces will tend to produce photorealistic portraits. This one, however, does a horrifying job with humans: it can never quite seem to get the number of orifices correct.
