How did we score last time around? Our four hot trends to watch in 2024 included what we called customized chatbots: interactive helper apps powered by multimodal large language models (check: we didn’t know it yet, but we were talking about what everyone now calls agents, the hottest thing in AI right now). They also included generative video (check: few technologies have advanced as rapidly in the past 12 months, with OpenAI and Google DeepMind releasing their flagship video generation models, Sora and Veo, within a week of each other this December). And they included more general-purpose robots capable of performing a wider range of tasks (check: the benefits of large language models continue to trickle down to other parts of the technology industry, and robotics is at the top of that list).
We also said that AI-generated election disinformation would be everywhere. Fortunately, we were wrong here. There was a lot to wring our hands over this year, but there were very few political deepfakes.
So what will happen in 2025? We’re going to ignore the obvious here: there is no doubt that agents and smaller, more efficient language models will continue to shape the industry. Instead, here are five alternative picks from our AI team.
1. Generative virtual playgrounds
If 2023 was the year of generative images and 2024 was the year of generative video, what comes next? If you guessed generative virtual worlds (a.k.a. video games), high fives all around.
We got a small glimpse of this technology back in February, when Google DeepMind unveiled a generative model called Genie that can take a static image and turn it into a side-scrolling 2D platform game that players can interact with. In December, the company announced Genie 2, a model that can spin a starter image into an entire virtual world.
Other companies are building similar technology. In October, AI startups Decart and Etched revealed an unofficial Minecraft hack in which every frame of the game is generated on the fly as you play. And World Labs, a startup co-founded by Fei-Fei Li, creator of ImageNet, the vast dataset of photos that started the deep learning boom, is building what it calls large world models (LWMs).
One obvious application is video games. These early experiments have a playful feel, and generative 3D simulations could be used to explore new game design concepts, turning a sketch into a playable environment on the fly. This could lead to entirely new types of games.
But these simulations could also be used to train robots. World Labs wants to develop so-called spatial intelligence, the ability of machines to interpret and interact with the everyday world. But robotics researchers lack sufficient data on real-world scenarios to train such technology. Spinning up countless virtual worlds, dropping virtual robots into them, and letting them learn through trial and error could help make up for that shortfall.