The recent reveal of OpenAI’s Sora, a model that generates videos from text, made headlines around the world. And understandably so, because it’s truly something amazing.
But I was not too surprised by the announcement. I wrote about the emergence of text-to-video generative AI on my blog 16 months ago! See here: AI Video Generation (Text-To-Video Translation). So, I knew it was just a matter of time before one of the big players released something of this calibre.
What did surprise me, however, was something that seemingly went under the radar just two weeks ago: an announcement from Google’s DeepMind research team of an AI model that generates video games from single example images. The original academic paper, entitled “Genie: Generative Interactive Environments”, was published on 23 February 2024.
With Genie, Google is coining a new term: “generative interactive environments (Genie), whereby interactive, playable environments can be generated from a single image prompt”.
What does this mean? Simple: you provide Genie with an example image (hand-drawn, if you want) and you can then play a 2D platformer game set inside the environment that you created.
Here are some examples. The first image is a human-drawn sketch; the second is a short video showing somebody playing a video game inside the world depicted in that sketch:
Here’s another one that starts off with a hand-drawn picture:
Real-world images (photos) work as well! Once again, the second image is a short snippet of somebody actually moving a character with a controller inside a generated video game.
See Google’s announcement for more great examples.
The title of my post states “Text-to-Video Game Translation”. If the only input permitted is a single image, how does “text-to-video game” fit here? The idea is that text-to-image models/generators like DALL-E or Stable Diffusion could be used to convert your initial text prompt into an image, and then that image could be fed into Genie.
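Genie itself has no public API, but to make that pipeline concrete, here’s a minimal sketch of the text → image → playable-environment idea. The text-to-image step uses the real Hugging Face diffusers library; the Genie step is purely hypothetical, with names I’ve made up for illustration:

```python
# Sketch only: text -> image with Stable Diffusion (a real library),
# then image -> playable environment via a made-up Genie interface.
import torch
from diffusers import StableDiffusionPipeline

# Step 1: turn the text prompt into a starting image.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
start_frame = pipe(
    "a 2D platformer level with grassy hills and floating platforms"
).images[0]  # a PIL image

# Step 2: hand the image to Genie. There is no public Genie API,
# so `GenieEnvironment` below is a hypothetical placeholder:
# env = GenieEnvironment(start_frame)
# frame = env.step(action=3)  # one of the 8 possible actions
```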
Very cool.
Video Game Quality
Now, the quality of the generated video games isn’t perfect. It certainly leaves a lot to be desired. Also, you can only play the games at 1 frame per second (FPS). Games typically run at 30-60 FPS, so seeing the screen change only once per second is no fun. However, the game is being generated on the fly, as you play it: if you press one of 8 possible buttons on a gamepad, the next frame will be a freshly generated response to your chosen action.
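To make that “generated on the fly” idea concrete, here’s what the interaction loop boils down to. This is a toy sketch that actually runs: `next_frame` is a stand-in stub for the model itself, and the 8-action space matches the gamepad description above.

```python
import numpy as np

NUM_ACTIONS = 8  # the 8 possible "buttons" mentioned above


def next_frame(frame: np.ndarray, action: int) -> np.ndarray:
    """Stand-in for Genie's dynamics model: given the current frame and
    the player's chosen action, return a freshly generated next frame.
    Random noise here, just so the sketch runs."""
    assert 0 <= action < NUM_ACTIONS
    return np.random.rand(*frame.shape)


frame = np.random.rand(64, 64, 3)  # the single image prompt
for _ in range(10):  # in practice: roughly one iteration per second
    action = np.random.randint(NUM_ACTIONS)  # the player's input
    frame = next_frame(frame, action)  # generated on the fly
```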
Still, it’s not super exciting. But just like my first post on text-to-video generative AI, which introduced the whole idea of AI-generated videos, this post is about what is currently being worked on. So, there might be more exciting stuff coming just around the corner – in 16 months, perhaps? For example this: “We focus on videos of 2D platformer games and robotics but our method is general and should work for any type of domain, and is scalable to ever larger Internet datasets.” (quoted from here)
There’s more coming. You heard it here first!
Other Works
For full disclosure, I should mention that this isn’t the first time people have dabbled in AI video game generation. Nvidia, for example, released GameGAN in 2020, which could produce clones of games like Pac-Man.
The difference with Google’s model is that it was trained entirely in an unsupervised manner from unlabelled internet videos. Genie learned, just from watching videos, which elements on the screen were being controlled by a player, what the corresponding controls were, and which elements were simply part of the scrolling background. Nvidia, on the other hand, trained on video input paired with recordings of the actions taken. Creating a labelled dataset of actions paired with video results is a laborious process. Like I said, Google did their training raw: on 30,000 hours of nothing but internet videos of hundreds of 2D platformer games.
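How can a model learn controls from raw video with no action labels at all? The rough intuition (heavily simplified from the paper, with all names and shapes invented by me for illustration) is: an encoder looks at two consecutive frames and must compress “what changed” into one of a handful of discrete codes, while a decoder must predict the second frame from the first frame plus that code. If those codes end up being useful for prediction, they behave like the actions a player took. A toy PyTorch sketch of that objective:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_LATENT_ACTIONS = 8  # small discrete "action vocabulary"
FRAME_DIM = 64 * 64     # flattened toy frames


class LatentActionModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: two consecutive frames -> scores over 8 latent actions.
        self.encoder = nn.Linear(2 * FRAME_DIM, NUM_LATENT_ACTIONS)
        # Decoder: previous frame + chosen action -> predicted next frame.
        self.decoder = nn.Linear(FRAME_DIM + NUM_LATENT_ACTIONS, FRAME_DIM)

    def forward(self, frame_t, frame_t1):
        logits = self.encoder(torch.cat([frame_t, frame_t1], dim=-1))
        # Soft action weights for simplicity; the real model
        # vector-quantises to a hard discrete code (VQ-VAE style).
        action = torch.softmax(logits, dim=-1)
        return self.decoder(torch.cat([frame_t, action], dim=-1))


# Training signal: reconstruct the next frame. No labels anywhere:
# the "actions" emerge because they are the only extra information
# the decoder gets about what changed between the two frames.
model = LatentActionModel()
f_t, f_t1 = torch.rand(4, FRAME_DIM), torch.rand(4, FRAME_DIM)
loss = F.mse_loss(model(f_t, f_t1), f_t1)
loss.backward()
```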
(Note: If this post is found on a site other than zbigatron.com, a bot has stolen it – it’s been happening a lot lately)