I was rewatching “Bourne Identity” the other day. Love that flick! Heck, the scene at the end is one of my favourites. Jason Bourne grabs a dead guy, jumps off the top floor landing, and while falling shoots a guy square in the middle of the forehead. He then breaks his fall on the dead body he took down with him. That has to be one of the best scenes of all time in the action genre.
But there’s one scene in the film that always causes me to throw up a little in my mouth. It’s the old “Just enhance it!” scene (minute 31 of the movie), something we see so often in films: people scan security footage and zoom in on a face; when the image becomes blurry, they ask for the blur to go away. The IT guy waves his wand and presto! We see a full-resolution image on the screen. No one stands a chance against magic like that.
But why is enhancing images as shown in movies so ridiculous? Because you are asking the computer to create new information for the extra pixels that you are generating. Let’s say you zoom in on a 4×4 region of pixels and want to perform facial recognition on it. You then ask for this region to be enhanced. This means you are asking for more resolution, say 640×480. How on earth is the computer supposed to infer what the additional 307,184 pixels (640 × 480 = 307,200, minus the 16 you started with) are to contain?
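To make the point concrete, here is a minimal sketch in plain Python. It assumes the simplest possible "enhance", nearest-neighbour upscaling, and the names `upscale_nearest` and `patch` are made up for illustration. Other interpolation schemes blend the values differently, but they still have only the same 16 inputs to work from.

```python
def upscale_nearest(src, new_w, new_h):
    """Upscale a 2D grid of pixel values by copying the nearest source pixel."""
    src_h = len(src)
    src_w = len(src[0])
    return [
        [src[y * src_h // new_h][x * src_w // new_w] for x in range(new_w)]
        for y in range(new_h)
    ]


# A hypothetical 4x4 patch in which every pixel has a different value.
patch = [[row * 4 + col for col in range(4)] for row in range(4)]

big = upscale_nearest(patch, 640, 480)

total_pixels = 640 * 480                  # pixels in the "enhanced" image
invented = total_pixels - 4 * 4           # pixels the computer had to make up
distinct = {value for row in big for value in row}

print(total_pixels, invented, len(distinct))
```

The output image has 307,200 pixels, yet it still contains only the 16 distinct values it started with: bigger, but no more informative.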
The other side to the story
However! Something happened at work that made me realise that the common “Enhance” scenario may not be as far-fetched as one would initially think. A client came to us a few weeks ago requesting that we perform some detailed video analytics on their security footage. They had terabytes of the stuff – but, as is so often the case, the sample video provided to us wasn’t of the best quality. So we wrote back to the client explaining the dilemma and asked them to send us better-quality footage. We haven’t heard back from them yet, but you know what? It’s quite possible that they will provide us with what we need!
You see, they compressed the video footage so that it could be sent over the Internet quickly. And here is where the weak link surfaces: the transfer of data. If they could have sent the full uncompressed video easily, they would have.
Quality vs transmission constraints
So, back to Hollywood. Let’s say your security footage is recording at some mega resolution. NASA has released images from its Hubble Space Telescope at resolutions of up to 18,000 x 18,000. That’s astronomical! (apologies for the pun). At that resolution, each image is a whopping 400MB (rounded up) in size. This, however, means that you can keep zooming in on their images until the cows come home. Try it out. It’s amazing.
But let’s assume the CIA, those bad guys chasing Bourne, have similar means at their disposal (I mean, who knows what those people are capable of, right!?). Now, let’s say their cameras have a frame rate of 30 frames/sec, which is relatively poor for the CIA. That means that for each second of video you need 12GB of storage space (30 frames × 400MB). A full day of recording (86,400 seconds) would require roughly 1 petabyte of space. And that’s just footage from one camera!
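The back-of-the-envelope numbers above can be checked in a few lines. This is only a sketch: it assumes decimal units (1GB = 1,000MB), takes the 400MB-per-frame figure from the text at face value, and the variable names are mine.

```python
MB = 10**6
GB = 10**9
PB = 10**15

frame_bytes = 400 * MB        # one Hubble-sized frame, as quoted above
fps = 30                      # frames per second
seconds_per_day = 24 * 60 * 60

bytes_per_second = frame_bytes * fps
bytes_per_day = bytes_per_second * seconds_per_day

print(f"per second: {bytes_per_second / GB:.0f} GB")
print(f"per day:    {bytes_per_day / PB:.2f} PB")
```

The daily figure comes out at just over 1 petabyte per camera, so the rounded number in the text holds up.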
It’s possible to store video footage of that size – Google’s cloud storage capacities are through the roof. But the bottleneck is transferring such data. Imagine half a building trying to trawl through security footage in its original form from the other side of the globe.
The possible scenario
See where I’m going with this? Here is a possible scenario: initially, security footage is sent across the network in compressed form. People scan this footage and then, when they see something interesting, they zoom in and request the higher-resolution version of the zoomed-in region. The IT guy presses a few keys, waits 3 seconds, and the image on the screen is refreshed with NASA-quality resolution.
Boom!
Of course, additional infrastructure would be necessary to deal with various video resolutions, but that is no biggie. In fact, we see this idea being utilised in a product all of us use on a daily basis: Google Maps. Initially, low-resolution images are transferred to your device to save on bandwidth. Each time you zoom in, the image is blurry at first and you need to wait for more pixels to be downloaded.
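The scenario above can be sketched as a tiny two-tier footage store. This is a toy illustration, not how any real system (Google Maps included) is implemented: the class `FootageServer` and its methods are hypothetical, frames are plain lists of numbers, and "compression" is stood in for by simply keeping every n-th pixel.

```python
class FootageServer:
    """Keeps the full-resolution frame and hands out cheap previews first."""

    def __init__(self, full_frame, factor):
        self.full = full_frame    # 2D list of pixel values
        self.factor = factor      # preview keeps every factor-th pixel

    def preview(self):
        """Low-resolution version that is cheap to send over the network."""
        f = self.factor
        return [row[::f] for row in self.full[::f]]

    def enhance(self, x, y, w, h):
        """Return the full-resolution pixels behind a region of the preview.

        The region (x, y, w, h) is given in preview coordinates.
        """
        f = self.factor
        return [row[x * f:(x + w) * f] for row in self.full[y * f:(y + h) * f]]


# A made-up 64x64 frame in which each pixel value encodes its own position.
full = [[y * 64 + x for x in range(64)] for y in range(64)]
server = FootageServer(full, factor=8)

small = server.preview()              # 8x8 preview: 64 values instead of 4,096
detail = server.enhance(2, 3, 1, 1)   # "enhance" one preview pixel

print(len(small), len(small[0]), len(detail), len(detail[0]))
```

The key difference from the movie version is that `enhance` invents nothing. It only fetches pixels that were recorded in the first place and held back to save bandwidth.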
So, is that what’s been happening all these years in our films? No way. Hollywood isn’t that smart. The CIA might be, though. (If not, and they’re reading this: Yes, I will consider being hired by you – get your people to contact my people).
Summary
The old “enhance image” scene from movies may be annoying as hell. But it may not be as far-fetched as it initially seems. Compressed forms of videos could be sent initially to save on bandwidth. Then, when more resolution is needed, a request can be sent for better quality.
Thanks for that inspirational article!
In case the CIA is watching us, I was just thinking about how to help them optimise image storage for particular needs. What do you think: if you only want to do face recognition, could you store only the part of the image that contains faces in high resolution, leaving everything else in low quality? In that case you could greatly reduce storage space. Say a face covers only 5% of the frame (and does not appear in every second of footage), which would reduce it by up to 50 times.
This and a couple more techniques, and you can catch this badass Bourne (just one tiny assumption – you know where he should be 🙂 ).
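As a rough sanity check of that estimate, here is a toy model of the saving. Everything in it is an assumption for illustration: the function `storage_fraction`, the share of footage that contains a face (`face_time`), and the relative cost of storing the low-quality remainder (`lowres_cost`).

```python
def storage_fraction(face_area, face_time, lowres_cost):
    """Fraction of the original storage still needed.

    face_area:   share of the frame covered by faces (0..1)
    face_time:   share of the footage in which a face is present (0..1)
    lowres_cost: cost of a low-quality pixel relative to a full-quality one (0..1)
    """
    hi_res_share = face_area * face_time
    return hi_res_share + (1.0 - hi_res_share) * lowres_cost


# Faces always in shot, and the low-quality background costs nothing.
always_visible = 1.0 / storage_fraction(0.05, 1.0, 0.0)

# Faces in only 40% of the footage, background still free.
sometimes_visible = 1.0 / storage_fraction(0.05, 0.4, 0.0)

# Same, but the low-quality background costs 1% of the original.
with_background = 1.0 / storage_fraction(0.05, 0.4, 0.01)

print(f"{always_visible:.1f}x  {sometimes_visible:.1f}x  {with_background:.1f}x")
```

In this model, the 5% figure on its own gives a 20-fold saving. Getting to 50 times needs faces to be present in no more than about 40% of the footage and the low-quality background to cost next to nothing.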
Good question!
Yes, that would definitely reduce the space required for each frame, but at the cost of additional computation, because for each frame you would need to run a face detection algorithm. These aren’t computationally expensive, but they will still require additional infrastructure (clusters of machines – and lots of them if you want to process a lot of CCTV footage in real time).
In the end, then, it’ll be a trade-off: saving storage at the expense of computational power, or vice versa.
But then there’s the question of whether you will want to get high resolution images of other objects like licence plates, contents of bags, what people are holding, etc.? It suddenly becomes computationally infeasible to detect all these things in real-time. Also, you just don’t know what you will need to look for in the future, either, so is it a good idea to discard data to save memory like this?
If memory storage isn’t a problem for you (it’s so cheap nowadays!), saving space might not be a good idea. Storage is definitely not a problem for the CIA 🙂
Zig