(Image: the rear of the Nokia 9 PureView, showing its five cameras)

Smartphone Camera Technology from Google and Nokia

A few days ago Nokia unveiled its new smartphone: the Nokia 9 PureView. It looks kind of weird (or maybe funky?) with its 5 cameras at its rear (see image above). But what’s interesting is how Nokia uses these 5 cameras to give you better quality photos with a technique called High Dynamic Range (HDR) imaging.

HDR has been around in smartphones for a while, though. In fact, Google has had this imaging technique available in some of its phones since at least 2014. And in my opinion it does a much better job than Nokia. 

In this post I would like to discuss what HDR is and then present what Nokia and Google are doing with it to provide some truly amazing results. I will break the post up into the following sections:

  • High Dynamic Range Imaging (what it is)
  • The Nokia 9 PureView
  • Google’s HDR+ (some amazing results here)

High Dynamic Range Imaging

I’m sure you’ve attempted to take photos of scenes with a high luminosity range, such as dimly lit interiors or shots where the backdrop is brightly radiant. Frequently such photos come out overexposed, underexposed and/or blurred. The foreground, for example, might be completely in shadow, or details will be blurred out because it’s hard to keep the camera still when the shutter speed has been slowed down to let in extra light.

HDR attempts to alleviate these high range scenario problems by capturing additional shots of the same scene (at different exposure levels, for instance) and then taking what’s best out of each photo and merging this into one picture.
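To make the merging step concrete, here is a minimal sketch using OpenCV’s exposure-fusion routines. The file names are placeholders, and this is a generic multi-exposure merge rather than the pipeline any particular phone uses:

```python
import cv2
import numpy as np

# Three hand-held shots of the same scene at different exposures
# (placeholder file names, purely for illustration).
exposures = [cv2.imread(path) for path in ["under.jpg", "normal.jpg", "over.jpg"]]

# Align the frames first -- hand-held shots rarely line up perfectly.
cv2.createAlignMTB().process(exposures, exposures)

# Mertens exposure fusion keeps the best-exposed, most colourful details
# from each frame and blends them into a single picture.
fused = cv2.createMergeMertens().process(exposures)

# The result is float32 in [0, 1]; scale back to 8-bit before saving.
cv2.imwrite("fused.jpg", np.clip(fused * 255, 0, 255).astype(np.uint8))
```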

Photo by Gustave Le Gray (image taken from Wikipedia)

Interestingly, the idea of taking multiple shots of a scene to produce a better single photo goes back to the 1850s. Gustave Le Gray, a highly noted French photographer, rendered seascapes showing both the sky and the sea by using one negative for the sky and another, with a longer exposure, for the sea. He then combined the two into one picture in the positive. Quite innovative for the period. The picture above was created by him using this technique.

 

The Nokia 9 PureView

As you’ve probably already guessed, Nokia uses the five cameras on the Nokia 9 PureView to take photos of the same scene. However, the cameras are not identical. Two are standard RGB sensors that capture colour; the remaining three are monochrome sensors that capture nearly three times as much light as the RGB ones. Each of the five cameras has a resolution of 12 megapixels. There is also an infrared sensor for depth readings.

Depending on the scene and lighting conditions each camera can be triggered up to four times in quick succession (commonly referred to as burst photography).

One colour photo is then selected to act as the primary shot and the other photos are used to improve it with details.

The final photo is built from up to 240 megapixels of captured data (five cameras, each firing up to four 12-megapixel shots). Interestingly, you also have control over how much merging takes place and where it occurs. For example, you can choose to add additional detail to the foreground and ignore the background. The depth map from the depth sensor undoubtedly assists in this. And yes, you have access to all the RAW files taken by the cameras.

Not bad, but in my opinion Google does a much better job… and with only one camera. Read on!

Google’s HDR+

Google’s HDR technology is dubbed HDR+. It has been around for a while, first appearing on the Nexus 5 and 6 phones, and it is now standard across the Pixel range. It can be standard because HDR+ works with the regular, single rear camera on Google’s phones.

It gets away with using just one camera by taking up to 10 photos in quick succession – more frames per camera than Nokia fires. Although the megapixel count of the resulting photos may not match Nokia’s, the results are nonetheless impressive. Just take a look at this:

(image taken from here)

That is a dimly lit indoor scene. The final result is truly astonishing, isn’t it?
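Google’s published pipeline aligns and merges the burst in the raw domain with tile-based motion estimation and robust weighting – far more involved than anything I could reproduce here. But the basic intuition, that aligning a burst of noisy shots and averaging them beats down the noise, can be sketched in a few lines (the global-shift alignment and file names below are my own simplifying assumptions):

```python
import cv2
import numpy as np

def merge_burst(frames):
    """Align a burst of frames to the first one and average them.

    A crude stand-in for burst merging: averaging N aligned noisy shots
    reduces noise by roughly a factor of sqrt(N).
    """
    ref = cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY).astype(np.float32)
    acc = frames[0].astype(np.float32)
    for frame in frames[1:]:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)
        # Estimate the global (x, y) shift between this frame and the reference.
        (dx, dy), _ = cv2.phaseCorrelate(ref, gray)
        # Shift the frame back onto the reference and accumulate it.
        shift = np.float32([[1, 0, -dx], [0, 1, -dy]])
        aligned = cv2.warpAffine(frame, shift, (frame.shape[1], frame.shape[0]))
        acc += aligned.astype(np.float32)
    return (acc / len(frames)).astype(np.uint8)

# Hypothetical usage: ten shots captured in quick succession.
burst = [cv2.imread(f"shot_{i}.jpg") for i in range(10)]
cv2.imwrite("merged.jpg", merge_burst(burst))
```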

Here’s another example:

Both pictures were taken with the same camera. The picture on the left was captured with HDR+ turned off while the picture on the right had it turned on. (image taken from here)

What makes HDR+ stand out from the crowd is its academic background. This isn’t some black-box technology that we know nothing about – it’s a technology that has been peer-reviewed by world-class academics and published at a world-class conference (SIGGRAPH Asia 2016).

Moreover, only last month, Google publicly released a dataset of image bursts to help improve this technology.

When Google does something, it (usually) does it with a bang. You have to love this. This is HDR imaging done right.

 


Heart Rate Estimation Using Computer Vision

This post is another that has been inspired by a forum question: “What are some lesser known use cases for computer vision?” I jumped at the opportunity to answer this question because if there’s one thing I’m proud of with respect to this blog, it is the weird and wacky use cases that I have documented here.

Some things I’ve talked about include:

In this post I would like to add to the list above and discuss another lesser known use case for computer vision: heart rate estimation from colour cameras.


Vital Signal Estimation

Heart rate estimation belongs to a field called “Vital Signal Estimation” (VSE). In computer vision, VSE has been around for a while. One of the more famous attempts at it comes from a 2012 MIT paper entitled “Eulerian Video Magnification for Revealing Subtle Changes in the World”, which was published at SIGGRAPH.

(Note: as I’ve mentioned in the past, SIGGRAPH, which stands for “Special Interest Group on Computer GRAPHics and Interactive Techniques”, is a world-renowned annual conference held for computer graphics researchers. But you do sometimes get papers from the world of computer vision being published there as is the case with this one.)

Basically, the way MIT implemented VSE was to analyse images captured from a camera for the small illumination changes on a person’s face produced by varying amounts of blood flowing to it. These changes were then magnified to make them easier to scrutinise. See, for example, this image from their paper:

(image source)

Amazing that these illumination changes can be extracted like this, isn’t it?

This video describes the Eulerian Video Magnification technique developed by these researchers (colour amplification begins at 1:25):

Interestingly, most research in VSE has focused around this idea of magnifying minute changes to estimate heart rates.
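The published method works on a spatial pyramid and is much more careful about noise, but the heart of the Eulerian idea – band-pass filtering each pixel’s intensity over time at plausible heart-rate frequencies and amplifying the result – fits in a few lines. Everything below (the stand-in data, filter order and gain) is my own illustrative assumption, not the authors’ code:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def magnify_colour(frames, fps, low=0.8, high=3.0, gain=50.0):
    """Crude Eulerian-style colour magnification.

    frames: array of shape (T, H, W, C), float32 in [0, 1], e.g. a heavily
    blurred/downsampled video of a person's face. Frequencies between `low`
    and `high` Hz (roughly 50-180 beats per minute) are band-pass filtered
    along the time axis, amplified, and added back to the original frames.
    """
    b, a = butter(2, [low, high], btype="bandpass", fs=fps)
    pulse = filtfilt(b, a, frames, axis=0)   # per-pixel temporal filtering
    return np.clip(frames + gain * pulse, 0.0, 1.0)

# Illustrative usage: 10 seconds of 30 fps video (random stand-in data).
video = np.random.rand(300, 64, 64, 3).astype(np.float32)
magnified = magnify_colour(video, fps=30)
```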

Uses of Heart Rate Estimation

What could heart rate estimation by computer vision be used for? Well, medical scenarios automatically come to mind because of the non-invasive (i.e. non-contact) nature of this technique. The video above (from 3:30) suggests using this technology for SIDS detection. Stress detection is another use case, and what follows from that is lie detection. I’ve already written about lie detection using thermal imaging – here is one more way for us to be monitored unknowingly.

On the topic of being monitored, this paper suggests (using a slightly different technique of magnifying minute changes in blood flow to the face) detecting emotional reactions to TV programs and advertisements.

Ugh! It’s just what we need, isn’t it? More ways of being watched.

Making it Proprietary

From what I can see, VSE in computer vision is still mostly in the research domain. However, this research from Utah State University has recently been patented and a company called Photorithm Inc. formed around it. The company produces baby monitoring systems that detect abnormal breathing in sleeping infants. In fact, Forbes wrote an interesting article about the company this year. Among other things, the article discusses the authors’ motivation for pursuing this research and how the technology behind the application works. It’s a good read.

Here’s a video demonstrating how Photorithm’s product works:

Summary

This post talked about another lesser known use case of computer vision: heart rate estimation. MIT’s famous research from 2012 was briefly presented. This research used a technique of magnifying small changes in an input video for subsequent analysis. Magnifying small changes like this is how most VSE technologies in computer vision work today.

After this, a discussion of what heart rate estimation by computer vision could be used for followed. Finally, it was mentioned that VSE is still predominantly something in the research domain, although one company has recently appeared on the scene that sells baby monitoring systems to detect abnormal breathing in sleeping infants. The product being sold by this company uses computer vision techniques presented in this post.

 


Image Colourisation – Converting B&W Photos to Colour

I have another great academic publication to present to you – and this one also comes with an online interactive website for you to use to your heart’s content. The paper is from the field of image colourisation.

Image colourisation (or ‘colorization’ for our US readers :P) is the act of taking a black and white photo and converting it to colour. Traditionally, this is a tedious, manual process usually performed in Photoshop, one that can take up to a month for a single black and white photo. But the results can be astounding. Just take a look at the following video illustrating the process to give you an idea of how laborious but amazing image colourisation can be:

Up to a month to do that for each image!? That’s a long time, right?




But then some researchers from the University of California, Berkeley came along and decided to throw some deep learning and computer vision at the task. Their work, published at the European Conference on Computer Vision in 2016, produced a fully automatic image colourisation algorithm that creates vibrant and realistic colourisations in seconds.

Their results truly are astounding. Here are some examples:

(example automatic colourisation results)

Not bad, hey? Remember, this is a fully automatic solution that is only given a black and white photo as input.

How about really old monochrome photographs? Here is one from 1936:

(automatically colourised photograph from 1936)

And here’s an old one of Marilyn Monroe:

(automatically colourised photograph of Marilyn Monroe)

 

Quite remarkable. For more example images, see the official project page (where you can also download the code).

How did the authors manage to get such good results? It’s obvious that deep learning (DL) was used as part of the solution. Why is it obvious? Because DL is ubiquitous nowadays – and considering the difficulty of the task, no other solution is going to come near. Indeed, the authors report that their results are significantly better than previous solutions.

What is interesting is how they implemented their solution. One might choose to go down the standard route of designing a neural network that maps a black and white image directly to a colour image (see my previous post for an example of this). But that idea will not work here. The reason is that similar objects can have very different colours.

Let’s take apples as an example to explain this. Consider an image dataset that has four pictures of an apple: two showing a yellow apple and two showing a red one. A standard neural network that simply maps black and white apples to colour apples will, in effect, learn the average colour of the apples in the dataset and colour the black and white photo accordingly. So, two yellow plus two red apples gives an average colour of orange, and hence all apples will be coloured orange, because that is how the dataset is being interpreted. The authors report that going down this path produces very desaturated (bland) results.
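A two-line numpy check makes the averaging problem obvious (the RGB values for ‘red’ and ‘yellow’ are just my stand-ins):

```python
import numpy as np

# Two red and two yellow apples from our imaginary training set (RGB).
apples = np.array([[255, 0, 0], [255, 0, 0], [255, 255, 0], [255, 255, 0]], dtype=float)

# A network that regresses colour directly ends up predicting the mean...
print(apples.mean(axis=0))  # [255. 127.5 0.] -- an orange that no apple in the set actually has
```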

So, their idea was instead to calculate the probability of each pixel being a particular colour. In other words, for every pixel in the black and white image, the network predicts a list of percentages representing the probability of that pixel being each candidate colour. That’s a long list of colour percentages for every pixel! The final colour of the pixel is then chosen from the top candidates on this list.

Going back to our apples example, the neural network would tell us that pixels belonging to the apple in the image would have a 50% probability of being yellow and 50% probability of being red (because our dataset consists of only red and yellow apples). We would then choose either of these two colours – orange would never make an appearance.
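The paper itself quantises the colour space into a few hundred bins and picks the final colour with an ‘annealed mean’ over the predicted distribution. The toy sketch below, which reuses the made-up apple palette, only illustrates the key point: choosing from a per-pixel distribution, rather than averaging over it, keeps orange out of the picture.

```python
import numpy as np

# Toy palette of candidate colours (the real model predicts over ~300 colour
# bins; three named colours keep the example readable).
PALETTE = {"red": (255, 0, 0), "yellow": (255, 255, 0), "orange": (255, 165, 0)}

def pick_colour(probabilities, rng):
    """Choose a colour for one pixel from its predicted distribution.

    Instead of averaging the candidates (which would invent orange from
    red and yellow), draw from the distribution, so only colours the
    network actually believes in can appear.
    """
    names = list(probabilities)
    p = np.array([probabilities[n] for n in names], dtype=float)
    p /= p.sum()
    return PALETTE[rng.choice(names, p=p)]

rng = np.random.default_rng(0)
# The apple example: the network is split 50/50 between red and yellow.
print(pick_colour({"red": 0.5, "yellow": 0.5, "orange": 0.0}, rng))  # red or yellow, never orange
```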

As is usually the case, ImageNet with its 1.3 million images (cf. this previous blog post that describes ImageNet) is used to train the neural network. Because of the large array of objects in ImageNet, the neural network can hence learn to colour many, many scenes in the amazing way that it does.

What is quite neat is that the authors have also set up a website where you can upload your own black and white photos to be converted by their algorithm into colour. Try it out yourself – especially if you have old photos that you have always wanted to colourise.

Ah, computer vision wins again. What a great area in which to be working and researching.

 


Extracting Images from Reflections in the Eye

Ever thought about whether you could zoom in on someone’s eye in a photo and analyse the reflection on it? Read on to find out what research has done in this respect!

A previous post of mine discussed the idea of enhancing regions in an image for better clarity much like we often see in Hollywood films. While researching for that post I stumbled upon an absolutely amazing academic publication from 2013.

The publication in question is entitled “Identifiable Images of Bystanders Extracted from Corneal Reflections” (R. Jenkins & C. Kerr, PLoS ONE 8, no. 12, 2013). In the experiments that Jenkins & Kerr performed, passport-style photographs were taken of volunteers while a group of bystanders stood behind the camera watching. The volunteers’ eyes were then zoomed in on and the faces of the onlookers reflected in the eyes were extracted, as shown in the figure below:

(Image adapted from the original publication)

Freaky stuff, right!? Despite the fact that these reflections comprised only 0.5% of the initial image size, you can quite clearly make out what is reflected in the eye. The experiments also showed that the bystanders were not only visible but identifiable. Unfortunately, the small number of participants limits the statistical weight of the results (the impact factor of the journal in 2016 was 2.8, which speaks for itself) – but who cares?! The coolness factor of what they did is through the roof! Just take a look at a row of faces that they managed to extract from a few reflections captured by the camera. Remember, these are reflections located on eyeballs:

(Image taken from the original publication)

With respect to interesting uses for this research the authors state the following:

our findings suggest a novel application of high-resolution photography: for crimes in which victims are photographed, corneal image analysis could be useful for identifying perpetrators.

Imagine a hostage-taker photographing their victim and then being identified from the reflection in the victim’s eye!



But it gets better. When discussing future work, they mention that a 3D reconstruction of the reflected scene could be possible if stereo images obtained from the reflections in both eyes are combined. This is technically possible (we’re venturing into work I did for my PhD), but you would need much higher resolution plus detailed data on the outer shape of the person’s eyes because, believe it or not, we each have differently shaped eyeballs.

Is there a catch? Yes, unfortunately so. I’ve purposely left this part to the very end because most people don’t read this far down a page and I didn’t want to spoil the fun for anyone 🙂 But the catch is this: the Hasselblad H2D camera used in this research produces images at super-high resolution: 5,412 x 7,216 pixels. That’s a whopping 39 megapixels! In comparison, the iPhone X camera takes pictures at 12 megapixels. And the Hasselblad camera is ridiculously expensive at US$25,000 for a single unit. However, as the authors state, the “pixel count per dollar for digital cameras has been doubling approximately every twelve months”, which means that sooner or later, if this trend continues, we will be sporting such 39 megapixel cameras on our standard phones. Nice!

Summary

Jenkins and Kerr showed in 2013 that extracting reflections on eyeballs from photographs is not only possible, but that faces in these reflections can be identifiable. This could prove useful in the future for police trying to capture kidnappers or child sex abusers, who frequently take photos of their victims. The only caveat is that for this to work, images need to be of super-high resolution. But considering how our phone cameras are improving at a regular rate, we may not be too far away from the ubiquitousness of such technology. To conclude, Jenkins and Kerr get the Nobel Peace Prize for Awesomeness from me for 2013 – hands down winners.

 


Is image enhancing possible? Yes, in a way…

I was rewatching “The Bourne Identity” the other day. Love that flick! Heck, the scene at the end is one of my favourites. Jason Bourne grabs a dead guy, jumps off the top floor landing, and while falling shoots a guy in the middle of the forehead. He then breaks his fall on the dead body he took down with him. That has to be one of the best scenes of all time in the action genre.

But there’s one scene in the film that always causes me to throw up a little in my mouth. It’s the old “Just enhance it!” scene (minute 31 of the movie) and something we see so often in cinema: people scanning security footage and zooming in on a face. When the image becomes blurry they ask for the blur to be removed. The IT guy waves his wand and presto!, we see a full-resolution image on the screen. No one stands a chance against magic like that.

But why is enhancing images as shown in movies so ridiculous? Because you are asking the computer to create new information for the extra pixels you are generating. Let’s say you zoom in on a 4×4 region of pixels and want to perform facial recognition on it. You then ask for this region to be enhanced, meaning you are requesting more resolution, say 640×480. How on earth is the computer supposed to infer what the additional 307,184 pixels should contain?
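If you’re curious, here is what actually happens when you ask a computer to “enhance” such a patch: standard interpolation just smears the 16 values you already have across the new pixels. (The patch below is random data, purely for illustration.)

```python
import cv2
import numpy as np

# A made-up 4x4 grey patch standing in for the zoomed-in "face".
patch = np.random.randint(0, 256, (4, 4), dtype=np.uint8)

# "Enhance!" -- upscale it to 640x480. Interpolation can only spread the
# 16 known values around; it cannot conjure up the missing detail.
enhanced = cv2.resize(patch, (640, 480), interpolation=cv2.INTER_CUBIC)

print(patch.size, "known pixels ->", enhanced.size, "displayed pixels")  # 16 -> 307200
```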


The other side to the story

However!!! Something happened at work that made me realise that the common “Enhance!” scenario may not be as far-fetched as one would initially think. A client came to us a few weeks ago requesting that we perform some detailed video analytics of their security footage. They had terabytes of the stuff – but, as is so often the case, the sample video provided to us wasn’t of the best quality. So, we wrote back to the client stating the dilemma and requested that they send us better quality footage. We haven’t heard back from them yet, but you know what? It’s quite possible that they will provide us with what we need!

You see, they compressed the video footage so that it could be sent over the Internet quickly. And here is where the weak link surfaces: the transfer of data. If they could have sent the full uncompressed video easily, they would have.



Quality vs transmission constraints

So, back to Hollywood. Let’s say your security footage is recording at some mega resolution. NASA has released images from its Hubble Space Telescope at resolutions of up to 18,000 x 18,000. That’s astronomical!! (apologies for the pun). At that resolution, each image is a whopping 400MB (rounded up) in size. This, however, means that you can keep zooming in on their images until the cows come home. Try it out! It’s amazing.

But let’s assume the CIA, those bad guys chasing Bourne, have similar means at their disposal (I mean, who knows what those people are capable of, right!?). Now, let’s say their cameras have a frame rate of 30 frames/sec, which is relatively poor for the CIA. That means that for each second of video you need 12GB of storage space. A full day of recording would require you to have 1 petabyte of space. And that’s just footage from one camera!
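For the sceptical, the numbers above do check out. A quick back-of-envelope calculation (using the rounded 400 MB per frame figure):

```python
frame_mb = 400                                    # one 18,000 x 18,000 image, roughly 400 MB
fps = 30                                          # frames per second
per_second_gb = frame_mb * fps / 1000             # storage needed per second of video
per_day_pb = per_second_gb * 60 * 60 * 24 / 1e6   # seconds in a day, converted to petabytes

print(per_second_gb, "GB per second")             # 12.0 GB per second
print(round(per_day_pb, 2), "PB per day")         # ~1.04 PB per day, per camera
```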

It’s possible to store video footage of that size – Google cloud storage capacities are through the roof. But, the bottleneck is the transferring of such data. Imagine if half a building was trying to trawl through security footage in its original form from across the other side of the globe.

The possible scenario

See where I’m going with this? Here is a possible scenario: initially, security footage is sent across the network in compressed form. People scan this footage and, when they see something interesting, they zoom in and request the higher-resolution form of the zoomed-in region. The IT guy presses a few keys, waits a few seconds, and the image on the screen is refreshed at NASA-quality resolution.

Boom! 
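Nothing in the film suggests this is how it actually works, but the request itself would be trivial to build. A hypothetical sketch (the server endpoint and parameters below are entirely made up):

```python
import requests

def fetch_full_res_region(camera_id, frame_index, x, y, w, h):
    """Ask the (imaginary) archive server for the full-resolution crop of a
    region the analyst has zoomed in on in the compressed preview."""
    params = {"camera": camera_id, "frame": frame_index,
              "x": x, "y": y, "w": w, "h": h}
    response = requests.get("https://archive.example.com/fullres", params=params)
    response.raise_for_status()
    return response.content  # raw bytes of the high-resolution crop

# The "enhance" moment: request the original pixels behind a 200x200 preview region
# at minute 31 of footage recorded at 30 fps.
crop = fetch_full_res_region(camera_id=7, frame_index=31 * 60 * 30, x=1024, y=768, w=200, h=200)
```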

Of course, additional infrastructure would be necessary to deal with the various video resolutions, but that is no biggie. In fact, we see this idea being utilised in a product all of us use on a daily basis: Google Maps. Initially, low-resolution imagery is sent to your device to save on bandwidth; each time you zoom in, the image is blurry for a moment while more pixels are downloaded.

So, is that what’s been happening all these years? No way 🙂 Hollywood isn’t that smart. The CIA might be, though. (If not, and they’re reading this: Yes, I will consider being hired by you – get your people to contact my people).

Summary

The old “enhance image” scene from movies may be annoying as hell, but it may not be as far-fetched as it initially seems. Compressed forms of videos could be sent initially to save on bandwidth. Then, when more resolution is needed, a request can be sent for better quality.

 
