Zbigatron | Zbigatron

Gait Recognition is Another Form of Biometric Identification

Posted on December 5, 2017January 16, 2025 by Zbigatron

I was watching The Punisher on Netflix last week and there was a scene (no spoilers, promise) in which someone was recognised from CCTV footage by the way they were walking. “Surely, that’s another example of Hollywood BS“, I thought to myself – “there’s no way that’s even remotely possible”. So, I spent the last week researching into this – and to my surprise it turns out that this is not a load of garbage after all! Gait Recognition is another legitimate form of biometric identification/verification.

In this post I’m going to present to you my past week’s research into gait recognition: what it is, what it typically entails, and what the current state-of-the-art is in this field. Let me just say that what scientists are able to do now in this respect surprised me immensely – I’m sure it’ll surprise you too!

Gait Recognition

In a nutshell, gait recognition aims to identify individuals by the way they walk. It turns out that our walking movements are quite unique, a little like our fingerprints and irises. Who knew, right!? Hence, there has been a lot of research in this field in the past two decades.

There are significant advantages of this form of identity verification. These include the fact that it can be performed from a distance (e.g. using CCTV footage), it is non-invasive (i.e. the person may not even know that he is being analysed), and it does not necessarily require high-resolution images for it to obtain good results.

The Framework for Automatic Gait Recognition

Trawling through the literature on the subject, I found that scientists have used various ways to capture people’s movements for analysis, e.g. using 3D depth sensors or even using pressure sensors on the floor. I want to focus on the use case shown in The Punisher where recognition was performed from a single, stationary security camera. I want to do this simply because CCTV footage is so ubiquitous today and because pure and neat Computer Vision techniques can be used on such footage.

In this context, gait recognition algorithms are typically composed of three steps:

Pre-processing to extract silhouettes
Feature extraction
Classification

Let’s take a look at these steps individually.

1. Silhouette extraction

Silhouette extraction of subjects is generally performed by subtracting the background image from each frame. Once the background is subtracted, you’re left with foreground objects. The pixels associated with these objects can be coloured white and then extracted.

Background subtraction is a heavily studied field and is by no means a solved problem in Computer Vision. OpenCV provides a few interesting implementations of background subtraction. For example, a background can be learned over time (i.e. you don’t have to manually provide it). Some implementations also allow for things like illumination changes (especially useful for outdoor scenes) and some can also deal with shadows. Which technique is used to subtract the background from frames is irrelevant as long as reasonable accuracy is obtained.

2. Feature extraction

Various features can be extracted once we have the silhouettes of our subjects. Typically, a single gait period (a gait cycle) is first detected, which is the sequence of video showing you take one step with each of your feet. This is useful to do because your gait pattern repeats itself, so there’s no need to analyse anything more than one cycle.

Features from this gait cycle are then extracted. In this respect, algorithms can be divided into two groups: model-based and model-free.

Model-based methods of gait recognition take your gait period and attempt to build a model of your movements. These models, for example, can be constructed by representing the person as a stick-figure skeleton with joints or as being composed of cylinders. Then, numerous parameters are calculated to describe the model. For example, the method proposed in this publication from 2001 calculates distance between the head and feet, the head and pelvis, the feet and pelvis, and the step length of a subject to describe a simple model. Another model is depicted in the image below:

Example-of-gait-model — An example of a biped model with 5 different parameters as proposed in this solution from 2012

Model-free methods work on extracted features directly. Here, undoubtedly the most interesting and most widely used feature extracted from silhouettes is that of the Gait Energy Image (GEI). It was first proposed in 2006 in a paper entitled “Individual Recognition Using Gait Energy Image” (IEEE transactions on pattern analysis and machine intelligence 28, no. 2 (2006): 316-322).

Note: the Pattern Analysis and Machine Intelligence (PAMI) journal is one of the best in the world in the field. Publishing there is a feat worthy of praise.

The GEI is used in almost all of the top gait recognition algorithms because it is (perhaps surprisingly) intuitive, not too prone to noise, and simple to grasp and implement. To calculate it, frames from one gait cycle are superimposed on top of each other to give an “average” image of your gait. This calculation is depicted in the image below where the GEI for two people is shown in the last column.

The GEI can be regarded as a unique signature of your gait. And although it was first proposed way back in 2006, it is still widely used in state-of-the-art solutions today.

gait-energy-image — Examples of two calculated GEIs for two different people shown in the far right column. *(image taken from the original publication)*

3. Classification

Once step 2 is complete, identification of subjects can take place. Standard classification techniques can be used here, such as k-nearest neighbour (KNN) and the support vector machine (SVM). These are common techniques that are used when one is dealing with features. They are not constrained to the use case of computer vision. Indeed, any other field that uses features to describe their data will also utilise these techniques to classify/identify their data. Hence, I will not dwell on this step any longer. I will, however, will refer you to a state-of-the-art review of gait recognition from 2010 that lists some more of these common classification techniques.

So, how good is gait recognition then?

We’ve briefly taken a look at how gait recognition algorithms work. Let’s now take a peek at how good they are at recognising people.

We’ll first turn to some recent news. Only 2 months ago (October, 2017) Chinese researchers announced that they have developed the best gait recognition algorithm to date. They claim that their system works with the subject being up to 50 metres away and that detection times have been reduced to just 200 milliseconds. If you read the article, you will notice that no data/results are presented so we can’t really investigate their claims. We have to turn to academia for hard evidence of what we’re seeking.

“Gaitgan: invariant gait feature extraction using generative adversarial networks” (Yu et al., IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 30-37. 2017) is the latest top publication on this topic. I won’t go through their proposed algorithm (it is model-based and uses the GEI), I will just present their results – which are in fact quite impressive.

To test their algorithm, the authors used the CASIA-B dataset. This is one of the largest publicly available datasets for gait recognition. It contains video footage of 124 subjects walking across a room captured at various angles ranging from front on, side view, and top down. Not only this, but walking is repeated by the same people while wearing a coat and then while wearing a backpack, which adds additional elements of difficulty to gait recognition. And the low resolution of the videos (320×240 – a decent resolution in 2005 when the dataset was released) makes them ideal to test gait recognition algorithms on considering how CCTV footage has generally low quality also.

Three example screenshots from the dataset is shown below. The frames are of the same person with a side-on view. The second and third image shows the subject wearing a coat and a bag, respectively.

casia-example — Example screenshots from the CASIA B dataset of the same person walking.

Recognition rates with front-on views with no bag or coat linger around 20%-40% (depending on the height of the camera). Rates then gradually increase as the angle nears the side-on view (that gives a clear silhouette). At the side-on view with no bag or coat, recognition rates reach an astounding 98.75%! Impressive and surprising.

When it comes to analysing the clips with the people carrying a bag and wearing a coat, results are summarised in one small table that shows only a few indicative averages. Here, recognition rates obviously drop but the top rates (obtained with side-on views) persist at around the 60% mark.

What can be deduced from these results is that if the camera distance and angle and other parameters are ideal (e.g. the subject is not wearing/carrying anything concealing), gait recognition works amazingly well for a reasonably sized subset of people. But once ideal conditions start to change, accuracy gradually decreases to (probably) inadequate levels.

And I will also mention (perhaps something you may have already garnered) that these algorithms also only work if the subject is acting normally. That is, the algorithms work if the subject is not changing the way he usually walks, for example by walking faster (maybe as a result of stress) or by consciously trying to forestall gait recognition algorithms (like we saw in The Punisher!).

However, an accuracy rate of 98.75% with side-on views shows great potential for this form of identification and because of this, I am certain that more and more research will be devoted to this field. In this respect, I will keep you posted if I find anything new and interesting on this topic in the future!

Summary

Gait recognition is another form of biometric identification – a little like iris scanning and fingerprints. Interesting computer vision techniques are utilised on single-camera footage to obtain sometimes 99% recognition results. These results depend on such things as camera angles and whether subjects are wearing concealing clothes or not. But much like other recognition techniques (e.g. face recognition), this is undoubtedly a field that will be further researched and improved in the future. Watch this space.

To be informed when new content like this is posted, subscribe to the mailing list (or subscribe to my YouTube channel!):

Lie Detection with Thermal Imaging – A Task for Computer Vision

Posted on November 23, 2017January 16, 2025 by Zbigatron

Can thermal imaging detect if you’re lying or not? It sure can! And are there frightening prospects with respect to this technology? Yes, there are! Read on to find out what scientists have recently done in this area – and all using image processing techniques.

Thermal Imaging

Thermal imaging (aka infrared thermography, thermographic imaging, and infrared imaging) is the science of analysing images captured from thermal (infrared) cameras. The images returned by these cameras capture infrared radiation not visible to the naked eye that are emitted by objects. All objects above absolute zero (-273.15 °C or −459.67°F) emit such radiation. And the general rule is that the hotter an object is, the more infrared radiation it emits.

There has been some amazing work done recently by scientists with respect to thermal imaging and deception detection. These scientists have managed to construct sophisticated lie detectors with their thermal cameras. And what is interesting for us is that these lie detectors work by using many standard Computer Vision techniques. In fact, thermal imaging is a beautiful example of where Computer Vision techniques can be used on images that do not come from “traditional” cameras.

This post is going to analyse how these lie detectors work and underline the Computer Vision techniques that are being used by them. The latter part of the post will extrapolate how thermal imaging might impact us in the future – and these predictions are quite frightening/exciting, to say the least.

Lie Detectors

The idea behind lie detectors is to detect minor physiological changes, such as an increase in blood pressure, pulse or respiration, that can occur when we experience a certain anxiety, shame or nervousness when dropping a fib (aka telling a porky, taking artistic license, being Tony Blair, etc.). These are such slight physiological changes in us that instruments that measure them need to be very precise.

We’ve all seen polygraphs being used in films to detect deception. An expert sits behind a machine and looks at readings from sensors that are connected to the person being interrogated. Although accuracy is said to be at around 90% in detecting lies (according to a few academic papers I studied), the problem is that highly trained experts are required to interpret results – and this interpretation can take hours in post-interview analysis. Moreover, polygraph tests require participants’ cooperation in that they need to be physically connected to these sensors. They’re what’s called ‘invasive’ procedures.

lie-detecting-Keeler — A polygraph test being conducted in 1935 *(Image source: Wikipedia)*

Thermal imagery attempts to alleviate these problems. The idea behind them is to detect changes in the surface temperature of the skin caused by the effects of lying. Since all one needs is a thermal camera to observe the participant, they’re non-invasive procedures. And because of this, the person being interrogated can be oblivious to the fact that he’s being scrutinised for lying. Moreover, the process can be automated with image/video processing algorithms – no experts required for analysis!

Computer Vision in Thermal Imaging

Note: Although I gloss over a lot of technical details in this section, I still assume here that you have a little bit of computer vision knowledge. If you’re here for the discussion on how thermal imagery could be used in the future, skip to the next section.

There are some interesting journal papers on deception detection using thermal imagery and computer vision algorithms. For example, “Thermal Facial Analysis for Deception Detection” (Rajoub, Bashar A., and Reyer Zwiggelaar. IEEE transactions on information forensics and security 9, no. 6 (2014): 1015-1023) reports results of 87% on 492 responses (249 lies and 243 truths). Techniques such as machine learning were used to build a model of deceptive/non-deceptive responses. The models were then utilised to classify responses.

But I want to look at a paper published internally by the Faculty of Engineering at the Pedagogical and Technological University of Colombia in South America. It’s not a very “sophisticated” publication (e.g. axes are not labelled, statistical significance of results is not presented, the face detection algorithm is a little dubious, etc.) but the computer vision techniques used are much more interesting to analyse.

The paper in question is entitled “Detection of lies by facial thermal imagery analysis” published this year (2017) by Bedoya-Echeverry et al. The authors used a fairly low-resolution thermal camera (320×240 pixels) to obtain comparable results to polygraph tests: 75% success rate in detecting lies and 100% success rate in detecting truths.

I am going to work through a simplified version of the algorithm presented by the authors. The full version involves a few extra calculations but what I present here is the general framework of what was done. I’ll give you enough to show you that implementing a thermal-based lie detector is a trivial task (once you can afford to purchase the $3,000 thermal camera).

This simplified algorithm can be divided into two stages:

Face detection and segmentation
Periorbital area detection and tracking

Let’s work through these two stages one-by-one and see what a fully-automated lie detector based on Computer Vision techniques can look like.

1. Face detection and segmentation

For face detection the authors first used Otsu’s method on their black and white thermal images. Otsu’s method takes a greylevel (intensity) image and reduces it to a binary image, i.e. an image containing only two colours: white and black. A pixel is coloured either white or black depending on whether it falls below or above a certain dynamically calculated threshold. The threshold is chosen such that it minimises the intra-class variance of the intensity values. See this page for a clear explanation on how this is done exactly.

When the binary image has been produced, the face is detected by calculating the largest connected region in the image.

face-detection-from-IR-image — *(Image adapted from original publication)*

Note: functions for Otsu’s method and finding the largest connected components are all available in OpenCV and are easy to use.

2. Periorbital area detection and tracking

Once the face has been detected and segmented out, the next step is to locate the periorbital region, which is the horizontal, rectangular region containing your eyes and top of your nose. This region has a high concentration of veins and arteries and is therefore ideal for scrutinising for micro-temperature changes.

The periorbital region can be found by dividing the face (detected in step 1) into 4 equally-spaced horizontal strips and then selecting the second region from the top. To save having to perform steps one and two for each frame, the KLT algorithm is used to track the periorbital area between frames. See this OpenCV tutorial page for a decent explanation of how this tracking algorithm works. It’s a little maths intensive – sorry! But you can at least see that it’s also easy to implement.

Temperature readings are then made from the detected region and an average calculated per frame. When the average temperature (i.e. pixel intensity) peaks during the answering of a question, the algorithm can deduce that the person is lying through their teeth!

That’s not that complicated, right? Even though I simplified the algorithm a little (a few additional calculations are performed to assist tracking in step 2), the gist of it is there! Here you have a fully-automated, non-invasive lie detector that uses Computer Vision techniques to get results comparable to polygraph tests.

What the Future Holds in Lie Detection and Thermal Imagery

Now, let’s have a think about what a non-invasive lie detector could potentially achieve in the future.

Lie detectors are all about noticing micro-changes in blood flow, right? Can you imagine such a system tracking you around the store to gauge which products get you a little bit excited? Results can be instantly forwarded to a seller who now has the upper hand over you. Nobody will be safe from second-hand car dealers any more. To confirm anything, all he has to do is ask if you like a certain car or not.

What about business meetings? You can put in a cheeky thermal camera in the corner of a meeting room and get live reports on how clients are really feeling about your pitch. Haggling will be a lot easier to deal with in this way if you know what your opponent is truly thinking.

And what about poker? You will be able to beat (unethically?) that one friend who always cleans up at your “friendly” weekend poker nights.

The potential is endless, really. And who knows?! Maybe we’ll have thermal cameras in our phones one day, too? Computer vision will definitely be a powerful tool in the future 🙂

What other uses of deception detection using thermography can you think of?

Summary

Traditionally, lie detection has been performed using a polygraph test. This test, however, is invasive and needs an expert to painstakingly analyse results from the various sensors that are used. Digital thermography is looking like a viable alternative. Scientists have shown that using standard computer vision techniques, deception detection can be non-invasive, automated, and get results comparable to polygraph tests. Non-invasive lie detectors are a scary prospect considering that they could track our every move and analyse all our emotions in real time.

To be informed when new content like this is posted, subscribe to the mailing list (or subscribe to my YouTube channel!):

How Can I Start a Career in Computer Vision?

Posted on November 17, 2017January 16, 2025 by Zbigatron

This post has been inspired by a question someone asked in a forum. This person was a new but competent programmer who was trying to move into Computer Vision (CV) in the industry. However, he rightly noticed that “most of the job requirements [in computer vision] are asking for a PhD”.

Indeed, this is true. And because of this, finding a job in computer vision in the industry is very difficult. Mainly because computer vision (e.g. video analytics) hasn’t caught on yet as a meaningful source of data or income for companies. I think it will sooner or later (more on this in a future post) but the reality is that such jobs are rare and hence why companies can afford to advertise and be picky for people with a PhD.

So, if you’re in a situation where you can’t land a job in CV and especially if you’re without a PhD, don’t give up. Computer vision is a fascinating field to work in and it will only get bigger with time. It’s worth fighting on.

Here are three things you can do to improve your chances of landing that job that you really want.

Read Books, Tutorials, Publications, and Blogs

This is obvious but needs to be said. Keep reading up on the field, keep fine-tuning your skills and knowledge in CV. You need to show your potential employer that you know the field of CV exceptionally well. Read important books that are or have been published. Some you can find in your library, some come in PDF format. For example, Neural Networks and Deep Learning is a great book on the hot topic of neural networks that is available online free of charge.

Work through the tutorials available on the OpenCV page. There’s plenty there on machine learning, photo processing, object detection, etc. to keep you busy for months! The idea is to get so good at CV to be able to instantly see a solution to an image/video processing problem. You need to shine at those job interviews.

Follow blogs on Computer Vision. Two of my favourite are PyImageSearch and Learn OpenCV. These guys are regularly posting stuff that will fascinate anybody with a passion for Computer Vision. In fact, PyImageSearch is so well-written that it put me off from starting this blog for a while.

Consider also looking into academic publications. These can be daunting, especially if you don’t have a background in research. But focus initially on the seminal papers (more on this in a future post) and try to get the gist of what the scientists are saying. You can usually pick up small bits and pieces here and there and implement simplified versions of them.

Side Projects

One thing that did get me a lot of attention were my side projects. Side projects show people where your passions lie. And passion is something that a lot of companies are looking for. Believe me, if you came to my company and I was asked to interview you to join the Computer Vision team, your side projects would be one of the first things I’d be looking at.

So, get stuck into a few of these to show that you love the area and you do this kind of stuff for fun. Get a Raspberry PI going with a camera and build your own security system via motion detection, for example. Or get a drone for your Raspberry Pi and camera and be creative with it. Then list these side projects at the end of your CV. If you’re truly passionate about Computer Vision, you will get noticed sooner or later.

Branch out into other areas of Artificial Intelligence

What I decided to do when my job hunting wasn’t going too well was to aim for jobs in other areas of AI rather than just Computer Vision. It involved me having to pick up additional knowledge in fields I wasn’t too familiar with (e.g. Robotic Process Automation) but the amount of jobs in these areas is much larger. This tactic proved successful for me. I ended up picking a company (yes, I was spoilt for choice in the end!) that had interesting clients and it was only a matter of time before opportunities for computer vision projects came along that we were all pushing for.

An Inspirational Story

If you are feeling down about your job searching or if you’re wondering whether CV is a viable place to aim for in the job market, here is a truly inspirational story out of India (from PyImageSearch – I told you it was a well-written blog!). It’s about a fellow who really wanted to work in CV but was at a disadvantage because he came from a low-income family. But he didn’t give up and put the hard work in. Today he is working on AI solutions for drones for a company in India. Moreover, with his salary he can support his family, has paid off all his debts, and is working in a field he absolutely loves!

Summary

Finding a job in Computer Vision is difficult. Most companies are advertising for people with a PhD. But there are things you can do to boost your chances of landing that job you really want. For example, you can keep your CV skills sharp by continually reading up on the subject. You can also work on side projects in CV and you can try branching out into other areas of AI to broaden the scope of projects you are qualified for. Don’t give up on your quest because CV is a field that is going to grow – it’s only a matter of time before the industry catches on to the amazing things CV can do with their video data.

To be informed when new content like this is posted, subscribe to the mailing list (or subscribe to my YouTube channel!):

Samsung’s and Apple’s Face Recognition Technologies and How To Fool Them

Posted on November 12, 2017January 16, 2025 by Zbigatron

In September 2017 Apple announced iPhone X with a very neat feature called Face ID. This feature is used to recognise your face to allow you to unlock your phone. Samsung, however, has had facial recognition since the release of Android Ice Cream Sandwich way back in 2011. What is the difference between the two technologies? And how can either of them be fooled? Read on to find out.

Samsung’s Face Recognition

Samsung’s Face Unlock feature works by using the regular front camera of your phone to take a picture of your face. It analyses this picture for facial features such as the distance between the eyes, facial contours, iris colour, iris size, etc. This information is stored on your phone so that next time you try to unlock it, the phone takes a picture of you, processes it for the aforementioned data and then compares it to the information it has stored on your phone. If everything matches, your phone is unlocked.

The only problem is that all processing is done using 2D images. So, as you may have guessed, a simple printed photo of your face or even one displayed on another phone will fool the system. Need proof? Here’s a video of someone unlocking a Galaxy Note 8, which was released in April 2017, with a photo shown on another phone. It’s quite amusing.

There was a “liveness check” added to Face Unlock with the release of Android Jelly Bean in 2012. This works by attempting to detect blinking. I haven’t tried this feature but from what I’ve read on forums, it isn’t very accurate and requires a longer time to process your face – hence probably why the feature isn’t turned on by default. And yes, it could also be fooled by a close-up video of you, though this would be much harder to acquire.

Note: Samsung is aware of the security flaws of Face Unlock, which is why it does not allow identity verification for Samsung Pay to be made using it. Instead it advocates for the use of its iris recognition technology. But is that technology free from flaws? No chance, as a security researcher from Berlin has shown. He took a photo of his friend’s eye from a few metres away (!) in infrared mode (i.e. night mode), printed it out on paper, and then stuck a contact lens on the printed eye. Clever.

Apple’s Face ID

This is where the fun begins. Apple really took this feature seriously. In a nutshell, Face ID works by firstly illuminating your face with IR light (IR = infrared light that is not visible to the naked eye) and then projecting a further 30,000 (!) IR points onto your face to build a super-detailed 3D map of your facial features. Quite impressive.

This technology, however, has been in use for a very long time. If you’re familiar with the Kinect camera/sensor (initially released in 2010), it uses the same concept of infrared point projection to capture and analyse 3D motion.

So, how do you fool the ‘TrueDepth camera system’, as Apple calls it? It’s not easy because this technology is quite sophisticated. But successful attempts have already been documented in 2017.

To start off with, here’s a video showing identical twins unlocking each other’s phones. Also quite amusing. How about relatives that look similar? It’s been done! Here’s a video showing a 10-year-old boy unlocking his mother’s phone. Now that’s a little more worrisome. However, it shows that iPhone Xs can be an alternative to DNA paternity/maternity tests 🙂 Finally, in November 2017, Vietnamese hackers posted a video documenting how their 3D-printed face mask fooled Apple’s technology. Some elements, like the eyes, on this mask were printed on a standard colour printer. The model of the face was acquired in 5 minutes using a hand-held scanner.

Summary

Apple in September 2017 released a new facial recognition feature with their iPhone X called ‘FaceID’. It works by projecting IR light onto your face to build a detailed 3D map of it. It is hard to fool but successful attempts have been documented in 2017. Samsung’s facial recognition system called Face Unlock has been around since 2011. It, however, only analyses 2D images and hence can be duped easily with printed photos or another phone showing the phone owner’s face.

To be informed when new content like this is posted, subscribe to the mailing list (or subscribe to my YouTube channel!):

Extracting Images from Reflections in the Eye

Posted on November 10, 2017January 16, 2025 by Zbigatron

Ever thought about whether you could zoom in on someone’s eye in a photo and analyse the reflection on it? Read on to find out what research has done in this respect!

A previous post of mine discussed the idea of enhancing regions in an image for better clarity much like we often see in Hollywood films. While researching for that post I stumbled upon an absolutely amazing academic publication from 2013.

The publication in question is entitled “Identifiable Images of Bystanders Extracted from Corneal Reflections” (R. Jenkins & C. Kerr, PloS one 8, no. 12, 2013). In the experiments that Jenkins & Kerr performed, passport-style photographs were taken of volunteers while a group of bystanders stood behind the camera watching. The volunteers’ eyes were then zoomed in on and the faces of the onlookers reflected in the eyes were extracted, as shown in the figure below:

Zooming-face — *(Image adapted from the original publication)*

Freaky stuff, right!? Despite the fact that these reflections comprised only 0.5% of the initial image size, you can quite clearly make out what is reflected in the eye. The experiments that were performed also showed that the bystanders were not only visible but identifiable. Unfortunately, with a small population size for the experiments, this technically makes the results statistically insignificant (the impact factor of the journal in 2016 was 2.8, which speaks for itself) – but who cares?! The coolness factor of what they did is through the roof! Just take a look at a row of faces that they managed to extract from a few reflections captured by the cameras. Remember, these are reflections located on eyeballs:

reflections-eye-row-faces — *(Image taken from the original publication)*

With respect to interesting uses for this research the authors state the following:

our findings suggest a novel application of high-resolution photography: for crimes in which victims are photographed, corneal image analysis could be useful for identifying perpetrators.

Imagine a hostage taking a photo of their victim and then being recognised from the reflection in the victim’s eye!

But it gets better. When discussing future work, they mention that 3D reconstruction of the reflected scene could be possible if stereo images are combined from reflections from both eyes. This is technically possible (we’re venturing into work I did for my PhD) but you would need much higher resolution and detailed data of the outer shape of a person’s eye because, believe it or not, we each have a differently shaped eyeball.

Is there a catch? Yes, unfortunately so. I’ve purposely left this part to the very end because most people don’t read this far down a page and I didn’t want to spoil the fun for anyone 🙂 But the catch is this: the Hasselblad H2D camera used in this research produces images at super-high resolution: 5,412 x 7,216 pixels. That’s a whopping 39 megapixels! In comparison, the iPhone X camera takes pictures at 12 megapixels. And the Hasselblad camera is ridiculously expensive at US$25,000 for a single unit. However, as the authors state, the “pixel count per dollar for digital cameras has been doubling approximately every twelve months”, which means that sooner or later, if this trend continues, we will be sporting such 39 megapixel cameras on our standard phones. Nice!

Summary

Jenkins and Kerr showed in 2013 that extracting reflections on eyeballs from photographs is not only possible but faces on these reflections can be identifiable. This can prove useful in the future for police trying to capture kidnappers or child sex abusers who frequently take photos of their victims. The only caveat is that for this to work, images need to be of super-high resolution. But considering how our phone cameras are improving at a regular rate, we may not be too far away from the ubiquitousness of such technology. To conclude, Jenkins and Kerr get the Noble Prize for Awesomeness from me for 2013 – hands down winners.

To be informed when new content like this is posted, subscribe to the mailing list (or subscribe to my YouTube channel!):

How to Find a Good Thesis Topic in Computer Vision

Posted on November 8, 2017January 16, 2025 by Zbigatron

“What are some good thesis topics in Computer Vision?”

This is a common question that people ask in forums – and it’s an important question to ask for two reasons:

There’s nothing worse than starting over in research because the path you decided to take turned out to be a dead end.
There’s also nothing worse than being stuck with a generally good topic but one that doesn’t interest you at all. A “good” thesis topic has to be one that interests you and will keep you involved and stimulated for as long as possible.

For these reasons, it’s best to do as much research as you can to avoid the above pitfalls or your days of research will slowly become torturous for you – and that would be a shame because computer vision can truly be a lot of fun 🙂

So, down to business.

The purpose of this post is to propose ways to find that one perfect topic that will keep you engaged for months (or years) to come – and something you’ll be proud to talk about amongst friends and family.

I’ll start the discussion off by saying that your search strategy for topics depends entirely on whether you’re preparing for a Master’s thesis or a PhD. The former can be more general, the latter is (nearly always) very fine-grained specific. Let’s start with undergraduate topics first.

Undergraduate Studies

I’ll propose here three steps you can take to assist in your search: looking at the applications of computer vision, examining the OpenCV library, and talking to potential supervisors.

Applications of Computer Vision

Computer Vision has so many uses in the world. Why not look through a comprehensive list of them and see if anything on that list draws you in? Here’s one such list I collected from the British Machine Vision Association:

agriculture
augmented reality
autonomous vehicles (big one nowadays!)
biometrics
character recognition
forensics
industrial quality inspection
face recognition
gesture analysis
geoscience
image restoration
medical image analysis
pollution monitoring
process control
remote sensing
robotics (e.g. navigation)
security and surveillance
transport

Go through this list and work out if something stands out for you. Perhaps your family is involved in agriculture? Look up how computer vision is helping in this field! The Economist wrote a fascinating article entitled “The Future of Agriculture” in which they discuss, among other things, the use of drones to monitor crops, create contour maps of fields, etc. Perhaps Computer Vision can assist with some of these tasks? Look into this!

OpenCV

OpenCV is the best library out there for image and video processing (I’ll be writing a lot more about it on this blog). Other libraries do exist that do certain specific things a little better, e.g. Tracking.js, which performs things like tracking inside the browser, but generally speaking, there’s nothing better than OpenCV.

On the topic of searching for thesis topics, I recall once reading a suggestion of going through the functions that OpenCV has to offer and seeing if anything sticks out at you there. A brilliant idea. Work down the list of the OpenCV documentation. Perhaps face recognition interests you? There are so many interesting projects where this can be utilised!

Talk to potential supervisors

You can’t go past this suggestion. Every academic has ideas constantly buzzing around his head. Academics are immersed in their field of research and are always to talking to people in the industry to look for interesting projects that they could get funding for. Go and talk to the academics at your university that are involved in Computer Vision. I’m sure they’ll have at least one project proposal ready to go for you.

You should also run any ideas of yours past them that may have emerged from the two previous steps. Or at least mention things that stood out for you (e.g. agriculture). They may be able to come up with something themselves.

PhD Studies

Well, if you’ve reached this far in your studies then chances are you have a fairly good idea of how this all works now. I won’t patronise you too much, then. But I will mention three points that I wish someone had told me prior to starting my PhD adventure:

You should be building your research topic around a supervisor. They’ve been in the field for a long time and know where the niches and dead ends are. Use their experience! If there’s a supervisor who is constantly publishing in object tracking, then doing research with them in this area makes sense.
If your supervisor has a ready-made topic for you, CONSIDER TAKING IT. I can’t stress this enough. Usually the first year of your PhD involves you searching (often blindly) around various fields in Computer Vision and then just going deeper and deeper into one specific area to find a niche. If your supervisor has a topic on hand for you, this means that you are already one year ahead of the crowd. And that means one year saved of frustration because searching around in a vast realm of publications can be daunting – believe me, I’ve been there.
Avoid going into trending topics. For example, object recognition using Convolutional Neural Networks is a topic that currently everyone is going crazy about in the world of Computer Vision. This means that in your studies, you will be competing for publications with big players (e.g. Google) who have money, manpower, and computer power at their disposal. You don’t want to enter into this war unless you are confident that your supervisor knows what they’re doing and/or your university has the capabilities to play in this big league also.

Summary

Spending time looking for a thesis topic is time worth spending. It could save you from future pitfalls. With respect to undergraduate thesis topics looking at Computer Vision applications is one place to start. The OpenCV library is another. And talking to potential supervisors at your university is also a good idea.

With respect to PhD thesis topics, it’s important to take into consideration what the fields of expertise of your potential supervisors are and then searching for topics in these areas. If these supervisors have ready-made topics for you, it is worth considering them to save you a lot of time and stress in the first year or so of your studies. Finally, it’s usually good to avoid trending topics because of the people you will be competing against for publications.

But the bottom line is, devote time to finding a topic that truly interests you. It’ll be the difference between wanting to get out of bed to do more and more research in your field or dreading each time you have to walk into your Computer Science building in the morning.

To be informed when new content like this is posted, subscribe to the mailing list (or subscribe to my YouTube channel!):

Is image enhancing possible? Yes, in a way…

Posted on November 5, 2017January 16, 2025 by Zbigatron

I was rewatching “Bourne Identity” the other day. Love that flick! Heck, the scene at the end is one of my favourites. Jason Bourne grabs a dead guy, jumps off the top floor landing, and while falling shoots a guy square in the middle of the forehead. He then breaks his fall on the dead body he took down with him. That has to be one of the best scenes of all time in the action genre.

But there’s one scene in the film that always causes me to throw up a little in my mouth. It’s the old “Just enhance it!” scene (minute 31 of the movie) and something we see so often in cinemas: people scanning security footage and zooming in on a face; when the image becomes blurry they request for the blur to dissipate. The IT guy waves his wand and presto!, we see a full resolution image on the screen. No one stands a chance against magic like that.

But why is enhancing images as shown in movies so ridiculous? Because you are requesting the computer to create new information for the extra pixels that you are generating. Let’s say you zoom in on a 4×4 region of pixels and want to perform facial recognition on it. You then request for this region to enhance. This means you are requesting more resolution, say 640×480. How on earth is the computer supposed to infer what the additional 307,184 pixels are to contain?

The other side to the story

However! Something happened at work that made me realise that the common “Enhance” scenario may not be as far-fetched as one would initially think. A client came to us a few weeks ago requesting that we perform some detailed video analytics of their security footage. They had terabytes of the stuff – but, as is so often the case, the sample video provided to us wasn’t of the best quality. So, we wrote back to the client stating the dilemma and requested that they send us better quality footage. We haven’t heard back from them yet, but you know what? It’s well possible that they will provide us with what we need!

You see, they compressed the video footage in order for it to be sent over the Internet quickly. And here is where the weak link surfaces: transferring of data. If they could have sent the full uncompressed video easily, they would have.

Quality vs transmission restraints

So, back to Hollywood. Let’s say your security footage is recording at some mega resolution. NASA has released images from its Hubble Space Telescope at resolutions of up to 18,000 x 18,000. That’s astronomical! (apologies for the pun). At that resolution, each image is a whopping 400MB (rounded up) in size. This, however, means that you can keep zooming in on their images until the cows come home. Try it out. It’s amazing.

But let’s assume the CIA, those bad guys chasing Bourne, have similar means at their disposal (I mean, who knows what those people are capable of, right!?). Now, let’s say their cameras have a frame rate of 30 frames/sec, which is relatively poor for the CIA. That means that for each second of video you need 12GB of storage space. A full day of recording would require you to have 1 petabyte of space. And that’s just footage from one camera!

It’s possible to store video footage of that size – Google cloud storage capacities are through the roof. But, the bottleneck is the transferring of such data. Imagine if half a building was trying to trawl through security footage in its original form from across the other side of the globe.

The possible scenario

See where I’m going with this? Here is a possible scenario: initially, security footage is sent across the network in compressed form. People scan this footage and then when they see something interesting, they zoom in and request the higher resolution form of the zoomed in region. The IT guy presses a few keys, waits 3 seconds, and the image on the screen is refreshed with NASA quality resolution.

Boom!

Of course, additional infrastructure would be necessary to deal with various video resolutions but that is no biggie. In fact, we see this idea being utilised in a product all of us use on a daily basis: Google Maps. Each time you zoom in, the image is blurry and you need to wait for more pixels to be downloaded. But initially, low resolution images are transferred to your device to save on bandwidth.

So, is that what’s been happening all these years in our films? No way. Hollywood isn’t that smart. The CIA might be, though. (If not, and they’re reading this: Yes, I will consider being hired by you – get your people to contact my people).

Summary

The old “enhance image” scene from movies may be annoying as hell. But it may not be as far-fetched as things may initially seem. Compressed forms of videos could be sent initially to save on bandwidth. Then, when more resolution is needed, a request can be sent for better quality.

To be informed when new content like this is posted, subscribe to the mailing list (or subscribe to my YouTube channel!):

Be an Optimist Prime in the world of Computer Vision and AI

Author: Zbigatron

Gait Recognition

The Framework for Automatic Gait Recognition

1. Silhouette extraction

2. Feature extraction

3. Classification

So, how good is gait recognition then?

Summary

Thermal Imaging

Lie Detectors

Computer Vision in Thermal Imaging

1. Face detection and segmentation

2. Periorbital area detection and tracking

What the Future Holds in Lie Detection and Thermal Imagery

Summary

Read Books, Tutorials, Publications, and Blogs

Side Projects

Branch out into other areas of Artificial Intelligence

An Inspirational Story

Summary

Samsung’s Face Recognition

Apple’s Face ID

Summary

Summary

Undergraduate Studies

Applications of Computer Vision

OpenCV

Talk to potential supervisors

PhD Studies

Summary

The other side to the story

Quality vs transmission restraints

The possible scenario

Summary