Well, it’s World Cup season now, isn’t it? Australia got eliminated this week so I’m feeling a bit depressed at the moment. However, seeing teams like Germany also not make it past the group stage makes me feel a little better (sorry German people reading this post :P).
But since the World Cup is on, it is only fitting that I write about something from the field of Computer Vision that is related to football. So, in this post I’m going to present to you quite an amazing paper I stumbled upon entitled “Soccer on Your Tabletop” (Rematas et al., CVPR 2018, pp. 4738-4747)
(Recall that CVPR is a world-class academic conference on computer vision. Anything published there is always worth reading.)
The goal of the paper is to present an algorithm that reconstructs a 3D representation of a football game from a single 2D video – just like you would find on YouTube. The 3D video could then be projected onto a tabletop (like a hologram) and viewed by everyone in the room from multiple angles. An interesting concept!
Usually something like this is obtained by having multiple cameras set up that can then work together to provide 3D information of the football pitch. But the idea here is to get all information from a single 2D video.
Here’s a clip of the entire project that has been released by the authors. Watch it!
Ah, you just have to love computer vision!
Let’s take a look at the (slightly simplified here) steps involved in the 2D -> 3D reconstruction process:
- Input frame: obtained from any 2D video of a football game (captured from stationary cameras).
- Camera calibration: this is performed using the football pitch line markings as guidance. The line markings provide excellent reference points to obtain the 2D plane of the football pitch from which players’ measurements can be deduced.
- Player detection, pose estimation, and tracking: this is done using already existing techniques. Specifically this paper is referenced from CVPR 2015 for detecting bounding boxes around players (top left image below), this paper from CVPR 2016 for estimation poses (top right image below), and a simple player tracking algorithm where you compare bounding boxes from adjacent frames and match them according to closest 2D Euclidean distance (bottom left image).
- Player segmentation: the idea here is to highlight the entire contour of the player after performing the above steps (see bottom right image above). This is performed by taking each pixel and analysing its neighbouring pixels for similarities in colour and edge information until each player is extracted. (Several more steps are performed to fine-tune this process but I’ll skip over these).
- Player depth estimation and mesh generation. This is the tricky part. What the authors did is quite intuitive. To constrain the solution space to just football related poses, body shapes, and clothing, the authors created a training dataset from FIFA video games. Lol! What they found was that it was possible to intercept calls between the game engine and the GPU while playing the video game and then to extract depth maps from these intercepted calls. In doing so, they were able to train a deep neural network to extract depth maps from 2D videos. This trained network was then used on 2D YouTube videos. Absolutely brilliant!
- Scene reconstruction. Once player depth estimation and mesh information (which is 3D information) is obtained, the scene can then be reconstructed. What the authors ended up doing is to use Microsoft HoloLens (a mixed reality lens that enables you to see and interact with holograms in real life). So the football pitch on the tabletop you see in the image below isn’t real! Can you imagine watching a match like this around a table with your mates!? There is a catch with the project, however. It’s not good enough yet to reconstruct the ball, which means that at the moment all you can view in 3D are players running around chasing an invisible object 🙂 But that’s work in progress and the job and essence of research.
Amazing, if you ask me! I can’t wait to see what the future holds for computer vision.
And believe it or not, code for this project is available online for you to play around with as much as you like. So, enjoy!
To be informed when new content like this is posted, subscribe to the mailing list: