Computer Vision in the Fashion Industry – Part 1


Computer vision has a plethora of applications in the industry: cashier-less stores, autonomous vehicles (including those loitering on Mars), security (e.g. face recognition) – the list goes on endlessly. I’ve already written about the incredible growth of this field in the industry and, in a separate post, the reasons behind it.

In today’s post I would like to discuss computer vision in a field that I haven’t touched upon yet: the fashion industry. In fact, I would like to devote my next few posts to this topic because of how ingeniously computer vision is being utilised in it.

In this post I will introduce the fashion industry and then present something that Microsoft recently did in the field with computer vision.

In my next few posts I would like to present what the academic world (read: cutting-edge research) is doing in this respect. You will see quite amazing things there, so stay tuned for that!

The Fashion Industry

The fashion industry is huge. And that’s probably an understatement. At present it is estimated to be worth US$2.4 trillion. How big is that? If the fashion industry were a country, it would be ranked as the 7th largest economy in the world – above my beloved Australia and other countries like Russia and Spain. Utterly huge.

Moreover, it is reported to be growing at a steady rate of 5.5% each year.

In the e-commerce market, the clothing and fashion sectors dominate. In the EU, for example, this industry makes up the majority of the 530-billion-euro e-commerce market. Moreover, The Economic Times predicts that the online fashion market will grow three-fold in the next few years. The industry appears to agree with this forecast, judging by some of the major takeovers currently being discussed. The largest one on the table at the moment involves Flipkart, India’s biggest online store, which attributes 50% of its transactions to fashion. Walmart is expected to win the bidding war by purchasing 73% of the company, which it has valued at US$22 billion. Google is also expected to invest a “measly” US$3 billion. Ridiculously large amounts of money!

So, if the industry is so huge, especially online, then it only makes sense to bring artificial intelligence into play. And since fashion is a visual thing, this is a perfect application for computer vision!

(I’ve always said it: now is a great time to get into computer vision)

Microsoft and the Fashion Industry

3 weeks ago, Microsoft published an interesting article on its Developer Blog detailing how it used deep learning to build an e-commerce catalogue visual search system for “a successful international online fashion retailer” (which one has not been disclosed). I would like to present a summary of this article here because I think it is a perfect introduction to what computer vision can do in the fashion industry. (In my next post you will see how what Microsoft did is just a drop in the ocean compared to what researchers are currently able to do.)

The motivation behind this search system was to save the retailer time in determining whether each newly arriving item matches a merchandise item already in stock. Currently, employees have to look through catalogues manually and perform the search and retrieval themselves. For a large retailer, sifting through a sizable catalogue can be a time-consuming and tedious process.

So, the idea was to be able to take a photo of a piece of clothing or footwear with a mobile phone and search a database for matches.

You may know that Google already offers image search. Microsoft realised, however, that for its fashion application to work, it needed to construct its own algorithm that included some initial pre-processing of images. The reason is that the images in the database had clean backgrounds, whereas a photo taken on a phone in a warehouse will capture a noisy background. The images below (taken from the original blog post) show this well. The first column shows a query image (taken with a mobile phone) and the second column the matching image in the database.

Microsoft, hence, worked on a background subtraction algorithm that would remove the background of an image and only leave the foreground (i.e. salient fashion item) behind.

Background subtraction is a well-known technique in computer vision, and it is still very much an open area of research. OpenCV, in fact, provides a few very interesting implementations of background subtraction. See this OpenCV tutorial for more information on these.


Microsoft decided not to use these but instead to try out other methods for this task. It first tried GrabCut, a very popular foreground segmentation algorithm first introduced in 2004. In fact, this algorithm was developed by Microsoft researchers, and Microsoft still owns the patent rights to it (OpenCV nonetheless ships an implementation, grabCut, in its main imgproc module).

I won’t go into too much detail on how GrabCut works but basically, for each image, you first need to manually provide a bounding box of the salient object in the foreground. After that, GrabCut builds a model (i.e. a mathematical description) of the background (area outside of the bounding box) and foreground (area inside the bounding box) and using these models iteratively trims inside the rectangle until it deduces where the foreground object lies. This process can be repeated by then manually indicating where the algorithm went wrong inside the bounding box.

The image below (from the original 2004 publication) illustrates this process. Note that the red rectangle was provided manually, as were the white and red strokes in the bottom-left image.

The images below show some examples provided by Microsoft from their application. The first column shows raw images from a mobile phone taken inside a warehouse, the second column shows initial results using GrabCut, and the third column shows images using GrabCut after additional human interaction. These results are pretty good.


But Microsoft wasn’t happy with GrabCut for one important reason: it requires human interaction. It wanted a solution that would work given nothing more than a photo of a product. So, it decided to move to a deep learning solution: Tiramisu (Yum, I love that cake…)

Tiramisu is a type of DenseNet, which in turn is a specific type of Convolutional Neural Network (CNN). Once again, I’m not going to go into detail on how this network works. For more information see this publication that introduced DenseNets and this paper that introduced Tiramisu. But basically, a DenseNet connects each layer to every later layer (in a feed-forward fashion), whereas in a traditional CNN each layer is connected only to the next one.
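The connectivity pattern is easier to see in code. Below is a toy NumPy sketch of a single dense block, not the Tiramisu implementation: each “layer” is a randomly initialised 1×1 convolution followed by a ReLU (an assumption standing in for the batch-norm/ReLU/convolution layers used in the actual papers), and nothing is trained. What it does show faithfully is the bookkeeping: every layer receives the concatenation of the block input and all earlier layers’ outputs, so with k0 input channels, L layers and growth rate k the block emits k0 + L·k channels, and an L-layer block has L(L+1)/2 direct connections rather than the L of a plain chain.

```python
import numpy as np


def dense_block(x, num_layers, growth_rate, rng):
    """Toy dense block on a (channels, height, width) feature map.

    Each layer sees the concatenation of the block input and ALL previous
    layers' outputs, and contributes `growth_rate` new channels.
    """
    features = [x]
    for _ in range(num_layers):
        layer_input = np.concatenate(features, axis=0)  # dense connectivity
        # Random 1x1 convolution: a per-pixel linear map from C_in to growth_rate channels
        weights = rng.standard_normal((growth_rate, layer_input.shape[0])) * 0.1
        out = np.einsum("oc,chw->ohw", weights, layer_input)
        features.append(np.maximum(out, 0.0))  # ReLU
    return np.concatenate(features, axis=0)


def num_direct_connections(num_layers):
    """Direct connections in an L-layer dense block (a plain chain has just L)."""
    return num_layers * (num_layers + 1) // 2


rng = np.random.default_rng(0)
x = np.ones((16, 8, 8))  # 16 input channels on an 8x8 grid
y = dense_block(x, num_layers=4, growth_rate=12, rng=rng)
print(y.shape)  # (64, 8, 8): 16 + 4 * 12 channels
print(num_direct_connections(4))  # 10
```

Because later layers can reuse earlier feature maps directly, each layer only needs to add a few new channels, which is one intuition offered for why these networks stay compact.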

DenseNets work (surprisingly?) well on relatively small datasets. For specific tasks using deep neural networks, you usually need a few thousand example images for each class you are trying to classify. DenseNets can get remarkable results with around 600 images (which is still a lot, but at least a bit more manageable).

So, Microsoft trained a Tiramisu model from scratch with two classes: foreground and background. Only 249 images were provided for each class! The foreground and background training images were segmented using GrabCut with human interaction. The model achieved an accuracy rate of 93.7% at the training stage. The example image below shows an original image, the corresponding labelled training image (white is foreground and black is background), and the predicted Tiramisu result. Pretty good!
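The blog post does not say how the 93.7% figure was computed. For a two-class foreground/background segmentation, one common choice is per-pixel accuracy, sketched below in NumPy as an assumption; the toy masks are made up. Intersection-over-union (IoU) of the foreground is shown alongside because pixel accuracy can look flattering when most of the image is background.

```python
import numpy as np


def pixel_accuracy(pred, truth):
    """Fraction of pixels whose predicted label (1 = foreground, 0 = background) is correct."""
    pred, truth = np.asarray(pred, bool), np.asarray(truth, bool)
    return float((pred == truth).mean())


def foreground_iou(pred, truth):
    """Intersection-over-union of the foreground class (1.0 if both masks are empty)."""
    pred, truth = np.asarray(pred, bool), np.asarray(truth, bool)
    union = np.logical_or(pred, truth).sum()
    if union == 0:
        return 1.0
    return float(np.logical_and(pred, truth).sum() / union)


truth = np.zeros((10, 10), np.uint8)
truth[3:7, 3:7] = 1  # 16-pixel "garment"
pred = np.zeros_like(truth)
pred[3:7, 3:8] = 1  # prediction spills one column too far (20 pixels)

print(pixel_accuracy(pred, truth))  # 0.96: only 4 of 100 pixels are wrong
print(foreground_iou(pred, truth))  # 0.8: 16 / 20
```

The same 4-pixel mistake costs 4 points of pixel accuracy but 20 points of IoU, which is why segmentation results are often reported with both.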

How did it fare in the real world? Apparently quite well. Here are some example images. The top row shows the automatically segmented image (i.e. with the background subtracted out) and the bottom row shows the original input images. Very neat 🙂

The segmented images (e.g. top row in the above image) were then used to query a database. How this querying took place and what algorithm was used to detect potential matches is, however, not described in the blog post.

Microsoft has released all their code from this project so feel free to take a look yourselves.


In this post I introduced the topic of computer vision in the fashion industry. I described how the fashion industry is a huge business, currently worth approximately US$2.4 trillion, and how it dominates the online market. Since fashion is a visual trade, this is a perfect application for computer vision.

In the second part of this post I looked at what Microsoft did recently to develop a catalogue visual search system. They performed background subtraction on photos of fashion items using a DenseNet solution and these segmented images were used to query an already-existing catalogue.

Stay tuned for my next post which will look at what academia has been doing with respect to computer vision and the fashion industry.



Gait Recognition – Another Form of Biometric Identification

I was watching The Punisher on Netflix last week and there was a scene (no spoilers, promise) in which someone was recognised from CCTV footage by the way they were walking. “Surely, that’s another example of Hollywood BS”, I thought to myself – “there’s no way that’s even remotely possible”. So, I spent the last week researching this – and to my surprise it turns out that this is not a load of garbage after all! Gait recognition is another legitimate form of biometric identification/verification.

In this post I’m going to present to you my past week’s research into gait recognition: what it is, what it typically entails, and what the current state-of-the-art is in this field. Let me just say that what scientists are able to do now in this respect surprised me immensely – I’m sure it’ll surprise you too!

Gait Recognition

In a nutshell, gait recognition aims to identify individuals by the way they walk. It turns out that our walking movements are quite unique, a little like our fingerprints and irises. Who knew, right!? Hence, there has been a lot of research in this field in the past two decades.

There are significant advantages to this form of identity verification. These include the fact that it can be performed from a distance (e.g. using CCTV footage), it is non-invasive (i.e. the person may not even know that he is being analysed), and it does not necessarily require high-resolution images to obtain good results.

The Framework for Automatic Gait Recognition

Trawling through the literature on the subject, I found that scientists have used various ways to capture people’s movements for analysis, e.g. using 3D depth sensors or even using pressure sensors on the floor. I want to focus on the use case shown in The Punisher where recognition was performed from a single, stationary security camera. I want to do this simply because CCTV footage is so ubiquitous today and because pure and neat Computer Vision techniques can be used on such footage.

In this context, gait recognition algorithms are typically composed of three steps:

  1. Pre-processing to extract silhouettes
  2. Feature extraction
  3. Classification

Let’s take a look at these steps individually.

1. Silhouette extraction

Silhouette extraction of subjects is generally performed by subtracting the background image from each frame. Once the background is subtracted, you’re left with foreground objects. The pixels associated with these objects can be coloured white and then extracted.

Background subtraction is a heavily studied field and is by no means a solved problem in Computer Vision. OpenCV provides a few interesting implementations of background subtraction. For example, a background can be learned over time (i.e. you don’t have to manually provide it). Some implementations also allow for things like illumination changes (especially useful for outdoor scenes) and some can also deal with shadows. Which technique is used to subtract the background from frames is irrelevant as long as reasonable accuracy is obtained.

Example of silhouette extraction
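As a minimal illustration of a background that is learned over time rather than supplied by hand, the NumPy sketch below takes the per-pixel median of a stack of frames and thresholds each frame’s absolute difference from it. This is a simpler technique than OpenCV’s own subtractors (cv2.createBackgroundSubtractorMOG2 and cv2.createBackgroundSubtractorKNN, which also cope with gradual illumination changes and, optionally, shadows); the synthetic “walker” and the threshold of 30 grey levels are assumptions.

```python
import numpy as np


def learn_background(frames):
    """Per-pixel temporal median: a moving subject covers each pixel only briefly."""
    return np.median(np.stack(frames).astype(np.float32), axis=0)


def extract_silhouette(frame, background, threshold=30):
    """White (255) where the frame differs from the background by more than `threshold`."""
    diff = np.abs(frame.astype(np.float32) - background)
    return np.where(diff > threshold, 255, 0).astype(np.uint8)


# Synthetic footage: a bright 20x6 "person" walking right across a grey scene
frames = []
for t in range(10):
    frame = np.full((40, 60), 100, np.uint8)
    frame[10:30, 5 * t:5 * t + 6] = 200
    frames.append(frame)

background = learn_background(frames)  # recovers the empty grey scene
silhouette = extract_silhouette(frames[3], background)
print(int((silhouette == 255).sum()))  # 120 pixels = 20 x 6
```

The median only works if the subject keeps moving; someone standing still for more than half of the frames would be absorbed into the background, one of the reasons more adaptive methods exist.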

2. Feature extraction

Various features can be extracted once we have the silhouettes of our subjects. Typically, a single gait period (a gait cycle) is first detected: the stretch of video in which you take one step with each foot. This is useful because your gait pattern repeats itself, so there’s no need to analyse more than one cycle.
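The post doesn’t specify how the cycle is detected. One common heuristic, assumed here, is to track the number of foreground pixels in the leg region of each silhouette: the legs are spread widest once per step, so this signal repeats every half cycle, and a full gait cycle spans two of its periods. The NumPy sketch below estimates that period by autocorrelation on a synthetic, perfectly periodic signal; real signals are noisier and are usually smoothed first.

```python
import numpy as np


def estimate_period(signal):
    """Dominant period (in frames) of a roughly periodic signal, via autocorrelation."""
    x = np.asarray(signal, float)
    x = x - x.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]  # lags 0, 1, 2, ...
    negative = np.flatnonzero(ac < 0)
    if negative.size == 0:
        raise ValueError("signal does not look periodic")
    start = negative[0]  # skip the central peak around lag 0
    return int(start + np.argmax(ac[start:]))


# Synthetic leg-region pixel count: peaks once per step, every 15 frames
frames = np.arange(90)
leg_pixels = 400 + 100 * np.sin(2 * np.pi * frames / 15)

step_period = estimate_period(leg_pixels)
gait_cycle = 2 * step_period  # one step with each foot
print(step_period, gait_cycle)  # 15 30
```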

Features from this gait cycle are then extracted. In this respect, algorithms can be divided into two groups: model-based and model-free.

Model-based methods of gait recognition take your gait period and attempt to build a model of your movements. These models, for example, can be constructed by representing the person as a stick-figure skeleton with joints or as being composed of cylinders. Then, numerous parameters are calculated to describe the model. For example, the method proposed in this publication from 2001 calculates the distances between the head and feet, the head and pelvis, and the feet and pelvis, as well as the step length of a subject, to describe a simple model. Another model is depicted in the image below:

An example of a biped model with 5 different parameters as proposed in this solution from 2012
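As a rough sketch of the model-based idea, the function below turns 2D joint positions for a single frame into the kind of distance parameters described above. It assumes the joints have already been located (for instance by manual annotation or a pose estimator), measures “the feet” at the midpoint between them, and uses made-up coordinates; the exact definitions in the 2001 publication may differ, and in practice such per-frame values would be aggregated over the gait cycle.

```python
import numpy as np


def gait_model_parameters(head, pelvis, left_foot, right_foot):
    """Distance-based body-model parameters from 2D joint positions (x, y) in pixels."""
    head, pelvis, left_foot, right_foot = (
        np.asarray(p, float) for p in (head, pelvis, left_foot, right_foot)
    )
    feet = (left_foot + right_foot) / 2.0  # assumption: midpoint between the feet
    return {
        "head_feet": float(np.linalg.norm(head - feet)),
        "head_pelvis": float(np.linalg.norm(head - pelvis)),
        "feet_pelvis": float(np.linalg.norm(feet - pelvis)),
        "step_length": float(np.linalg.norm(left_foot - right_foot)),
    }


params = gait_model_parameters(head=(50, 0), pelvis=(50, 80),
                               left_foot=(30, 170), right_foot=(70, 170))
print(params)
# {'head_feet': 170.0, 'head_pelvis': 80.0, 'feet_pelvis': 90.0, 'step_length': 40.0}
```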

Model-free methods extract features directly from the silhouettes, without fitting a body model. Here, undoubtedly the most interesting and most widely used feature is the Gait Energy Image (GEI). It was first proposed in 2006 in a paper entitled “Individual Recognition Using Gait Energy Image” (IEEE transactions on pattern analysis and machine intelligence 28, no. 2 (2006): 316-322).

Note: the Pattern Analysis and Machine Intelligence (PAMI) journal is one of the best in the world in the field. Publishing there is a feat worthy of praise. 

The GEI is used in almost all of the top gait recognition algorithms because it is (perhaps surprisingly) intuitive, not too prone to noise, and simple to grasp and implement. To calculate it, frames from one gait cycle are superimposed on top of each other to give an “average” image of your gait. This calculation is depicted in the image below where the GEI for two people is shown in the last column.

The GEI can be regarded as a unique signature of your gait. And although it was first proposed way back in 2006, it is still widely used in state-of-the-art solutions today.

Examples of two calculated GEIs for two different people shown in the far right column. (image taken from the original publication)
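In the original paper the GEI is defined as the pixel-wise average of the binary silhouettes over a cycle, G(x, y) = (1/N) Σ_t B_t(x, y). A minimal NumPy version follows. It assumes the silhouettes have already been cropped, size-normalised and horizontally aligned (a preprocessing step it does not perform), and the tiny 2×2 “silhouettes” are made up so the result can be checked by hand.

```python
import numpy as np


def gait_energy_image(silhouettes):
    """Pixel-wise mean of aligned binary silhouettes over one gait cycle.

    Pixels that are foreground in every frame (torso, head) approach 1.0;
    pixels the moving limbs only sometimes cover get intermediate values.
    """
    stack = np.stack([np.asarray(s) > 0 for s in silhouettes]).astype(np.float64)
    return stack.mean(axis=0)


cycle = [
    [[1, 0], [1, 1]],
    [[1, 0], [0, 1]],
    [[1, 1], [0, 1]],
    [[1, 0], [0, 1]],
]
print(gait_energy_image(cycle).tolist())  # [[1.0, 0.25], [0.25, 1.0]]
```

Averaging is also what makes the GEI fairly tolerant of noise: a spurious pixel in a single frame contributes only 1/N to the final image.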

3. Classification

Once step 2 is complete, identification of subjects can take place. Standard classification techniques can be used here, such as k-nearest neighbours (KNN) and support vector machines (SVM). These are common techniques used whenever one is dealing with features, and they are not specific to computer vision. Indeed, any field that describes its data with features will also use these techniques to classify/identify that data. Hence, I will not dwell on this step any longer. I will, however, refer you to a state-of-the-art review of gait recognition from 2010 that lists more of these common classification techniques.
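For a feel of how small this step can be, here is a k-nearest-neighbour identifier in NumPy that compares flattened GEIs by Euclidean distance and takes a majority vote. The 2×2 “GEIs” and the names are invented for illustration; published systems typically reduce the dimensionality of the features before classification rather than comparing raw pixels.

```python
import numpy as np


def knn_identify(gallery, labels, probe, k=1):
    """Label of the probe by majority vote among its k nearest gallery samples."""
    labels = np.asarray(labels)
    feats = np.asarray(gallery, float).reshape(len(labels), -1)  # flatten each GEI
    dists = np.linalg.norm(feats - np.asarray(probe, float).ravel(), axis=1)
    nearest = labels[np.argsort(dists)[:k]]
    names, votes = np.unique(nearest, return_counts=True)
    return str(names[np.argmax(votes)])


gallery = [
    [[1.0, 0.2], [0.3, 1.0]],  # alice, walk 1
    [[0.9, 0.25], [0.3, 0.95]],  # alice, walk 2
    [[0.5, 0.8], [0.7, 0.4]],  # bob
]
labels = ["alice", "alice", "bob"]
probe = [[0.95, 0.2], [0.35, 1.0]]
print(knn_identify(gallery, labels, probe, k=1))  # alice
print(knn_identify(gallery, labels, probe, k=3))  # alice (2 votes to 1)
```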

So, how good is gait recognition then?

We’ve briefly taken a look at how gait recognition algorithms work. Let’s now take a peek at how good they are at recognising people.

We’ll first turn to some recent news. Only 2 months ago (October, 2017) Chinese researchers announced that they have developed the best gait recognition algorithm to date. They claim that their system works with the subject being up to 50 metres away and that detection times have been reduced to just 200 milliseconds. If you read the article, you will notice that no data/results are presented so we can’t really investigate their claims. We have to turn to academia for hard evidence of what we’re seeking.

“GaitGAN: invariant gait feature extraction using generative adversarial networks” (Yu et al., IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 30-37, 2017) is the latest top publication on this topic. I won’t go through their proposed algorithm (it is model-free and uses the GEI); I will just present their results, which are in fact quite impressive.

To test their algorithm, the authors used the CASIA-B dataset, one of the largest publicly available datasets for gait recognition. It contains video footage of 124 subjects walking across a room, captured from 11 viewing angles ranging from front-on, through side-on, to rear views. Not only this, but the walks are repeated by the same people while wearing a coat and then while carrying a backpack, which adds further difficulty to gait recognition. And the low resolution of the videos (320×240 – a decent resolution in 2005 when the dataset was released) makes them ideal for testing gait recognition algorithms, considering that CCTV footage is also generally of low quality.

Three example screenshots from the dataset are shown below. The frames are of the same person from a side-on view. The second and third images show the subject wearing a coat and carrying a bag, respectively.

Example screenshots from the CASIA-B dataset of the same person walking.

Recognition rates with front-on views with no bag or coat linger around 20%-40% (depending on the height of the camera). Rates then gradually increase as the angle nears the side-on view (that gives a clear silhouette). At the side-on view with no bag or coat, recognition rates reach an astounding 98.75%! Impressive and surprising.

When it comes to analysing the clips with the people carrying a bag and wearing a coat, results are summarised in one small table that shows only a few indicative averages. Here, recognition rates obviously drop but the top rates (obtained with side-on views) persist at around the 60% mark.

What can be deduced from these results is that if the camera distance and angle and other parameters are ideal (e.g. the subject is not wearing/carrying anything concealing), gait recognition works amazingly well for a reasonably sized subset of people. But once ideal conditions start to change, accuracy gradually decreases to (probably) inadequate levels.

I will also mention (as you may have already guessed) that these algorithms only work if the subject is walking normally. That is, they work if the subject is not changing the way he usually walks, for example by walking faster (maybe as a result of stress) or by consciously trying to thwart gait recognition algorithms (like we saw in The Punisher!).

However, an accuracy rate of 98.75% with side-on views shows great potential for this form of identification and because of this, I am certain that more and more research will be devoted to this field. In this respect, I will keep you posted if I find anything new and interesting on this topic in the future!


Gait recognition is another form of biometric identification – a little like iris scanning and fingerprints. Interesting computer vision techniques are applied to single-camera footage to obtain recognition rates of up to 98.75%. These results depend on such things as camera angles and whether subjects are wearing concealing clothes or not. But much like other recognition techniques (e.g. face recognition), this is undoubtedly a field that will be further researched and improved in the future. Watch this space.



Thermal Imaging and Lie Detection – A Task for Computer Vision

Can thermal imaging detect if you’re lying or not? It sure can! And are there frightening prospects with respect to this technology? Yes, there are! Read on to find out what scientists have recently done in this area – and all using image processing techniques.

Thermal Imaging

Thermal imaging (aka infrared thermography, thermographic imaging, and infrared imaging) is the science of analysing images captured by thermal (infrared) cameras. These cameras capture infrared radiation, invisible to the naked eye, that is emitted by objects. All objects above absolute zero (-273.15 °C or −459.67 °F) emit such radiation. And the general rule is that the hotter an object is, the more infrared radiation it emits.
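The “hotter means more radiation” rule is quantified by the Stefan-Boltzmann law: a surface at absolute temperature T radiates power per unit area M = εσT⁴, where σ ≈ 5.67 × 10⁻⁸ W m⁻² K⁻⁴ and ε is the emissivity. The short calculation below uses an assumed emissivity of 0.98 for skin (it cancels out in the ratio anyway) to show that a 1 °C rise in skin temperature increases the radiated power by only about 1.3%. Real thermal cameras measure only a band of infrared wavelengths, so this illustrates the trend rather than what a camera actually reports.

```python
SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4


def radiant_exitance(temp_celsius, emissivity=0.98):
    """Power radiated per unit area (W/m^2) by a grey body: M = emissivity * sigma * T^4."""
    t_kelvin = temp_celsius + 273.15
    return emissivity * SIGMA * t_kelvin ** 4


skin = radiant_exitance(34.0)
warmer = radiant_exitance(35.0)
print(f"{skin:.0f} W/m^2 -> {warmer:.0f} W/m^2 ({(warmer / skin - 1) * 100:.1f}% more)")
```

Differences this small are why the cameras, and the instruments discussed below, need to be so sensitive.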

There has been some amazing work done recently by scientists with respect to thermal imaging and deception detection. These scientists have managed to construct sophisticated lie detectors with their thermal cameras. And what is interesting for us is that these lie detectors work by using many standard Computer Vision techniques. In fact, thermal imaging is a beautiful example of where Computer Vision techniques can be used on images that do not come from “traditional” cameras.

This post is going to analyse how these lie detectors work and underline the Computer Vision techniques that are being used by them. The latter part of the post will extrapolate how thermal imaging might impact us in the future – and these predictions are quite frightening/exciting, to say the least.

Lie Detectors

The idea behind lie detectors is to detect minor physiological changes, such as an increase in blood pressure, pulse or respiration, that can occur when we experience a certain anxiety, shame or nervousness when dropping a fib (aka telling a porky, taking artistic license, being Tony Blair, etc.). These physiological changes are so slight that the instruments measuring them need to be very precise.

We’ve all seen polygraphs being used in films to detect deception. An expert sits behind a machine and looks at readings from sensors that are connected to the person being interrogated. Although accuracy is said to be around 90% in detecting lies (according to a few academic papers I studied), the problem is that highly trained experts are required to interpret the results – and this interpretation can take hours of post-interview analysis. Moreover, polygraph tests require participants’ cooperation, since they need to be physically connected to the sensors. They’re what’s called ‘invasive’ procedures.

A polygraph test being conducted in 1935 (Image source: Wikipedia)

Thermal imagery attempts to alleviate these problems. The idea behind it is to detect changes in the surface temperature of the skin caused by the effects of lying. Since all one needs is a thermal camera to observe the participant, the procedure is non-invasive. And because of this, the person being interrogated can be oblivious to the fact that he’s being scrutinised for lying. Moreover, the process can be automated with image/video processing algorithms – no experts required for analysis!

Computer Vision in Thermal Imaging

Note: Although I gloss over a lot of technical details in this section, I still assume here that you have a little bit of computer vision knowledge. If you’re here for the discussion on how thermal imagery could be used in the future, skip to the next section. 

There are some interesting journal papers on deception detection using thermal imagery and computer vision algorithms. For example, “Thermal Facial Analysis for Deception Detection” (Rajoub, Bashar A., and Reyer Zwiggelaar. IEEE transactions on information forensics and security 9, no. 6 (2014): 1015-1023) reports results of 87% on 492 responses (249 lies and 243 truths). Techniques such as machine learning were used to build a model of deceptive/non-deceptive responses. The models were then utilised to classify responses.

But I want to look at a paper published internally by the Faculty of Engineering at the Pedagogical and Technological University of Colombia in South America. It’s not a very “sophisticated” publication (e.g. axes are not labelled, statistical significance of results is not presented, the face detection algorithm is a little dubious, etc.) but the computer vision techniques used are much more interesting to analyse.

The paper in question is entitled “Detection of lies by facial thermal imagery analysis” published this year (2017) by Bedoya-Echeverry et al. The authors used a fairly low-resolution thermal camera (320×240 pixels) to obtain comparable results to polygraph tests: 75% success rate in detecting lies and 100% success rate in detecting truths.

I am going to work through a simplified version of the algorithm presented by the authors. The full version involves a few extra calculations but what I present here is the general framework of what was done. I’ll give you enough to show you that implementing a thermal-based lie detector is a trivial task (once you can afford to purchase the $3,000 thermal camera).

This simplified algorithm can be divided into two stages:

  1. Face detection and segmentation
  2. Periorbital area detection and tracking

Let’s work through these two stages one-by-one and see what a fully-automated lie detector based on Computer Vision techniques can look like.

1. Face detection and segmentation

For face detection, the authors first applied Otsu’s method to their greyscale thermal images. Otsu’s method takes a greylevel (intensity) image and reduces it to a binary image, i.e. an image containing only two colours: white and black. A pixel is coloured white or black depending on whether it falls above or below a dynamically calculated threshold. The threshold is chosen so that it minimises the intra-class variance of the intensity values. See this page for a clear explanation of exactly how this is done.
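To show what “minimising the intra-class variance” means in practice, here is a from-scratch NumPy sketch. It uses the standard equivalent formulation (maximising the between-class variance, which is the same thing because the total variance is fixed) and scans every possible threshold of an 8-bit histogram. The bimodal “thermal” image is synthetic; in real code, OpenCV’s cv2.threshold with the cv2.THRESH_OTSU flag does this for you.

```python
import numpy as np


def otsu_threshold(gray):
    """Otsu threshold t for an 8-bit image: pixels > t form one class, pixels <= t the other."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    levels = np.arange(256)
    omega = np.cumsum(prob)  # weight of the "<= t" class for each candidate t
    mu = np.cumsum(prob * levels)  # cumulative mean up to t
    mu_total = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (mu_total * omega - mu) ** 2 / (omega * (1.0 - omega))
    # Thresholds that leave one class empty give 0/0; treat them as useless
    between = np.nan_to_num(between, nan=0.0, posinf=0.0, neginf=0.0)
    return int(np.argmax(between))


# Synthetic thermal frame: cool background (~60) and a warm face (~190)
rng = np.random.default_rng(1)
pixels = np.concatenate([rng.normal(60, 10, 5000), rng.normal(190, 10, 5000)])
gray = np.clip(pixels, 0, 255).astype(np.uint8).reshape(100, 100)

t = otsu_threshold(gray)
binary = np.where(gray > t, 255, 0).astype(np.uint8)  # white = warm
print(t, (binary == 255).mean())
```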

Once the binary image has been produced, the face is detected by finding the largest connected region in the image.

(Image adapted from original publication)

Note: functions for Otsu’s method and finding the largest connected components are all available in OpenCV and are easy to use.

2. Periorbital area detection and tracking

Once the face has been detected and segmented out, the next step is to locate the periorbital region, which is the horizontal, rectangular region containing your eyes and top of your nose. This region has a high concentration of veins and arteries and is therefore ideal for scrutinising for micro-temperature changes.

The periorbital region can be found by dividing the face (detected in step 1) into 4 equally-spaced horizontal strips and then selecting the second region from the top. To save having to perform steps one and two for each frame, the KLT algorithm is used to track the periorbital area between frames. See this OpenCV tutorial page for a decent explanation of how this tracking algorithm works. It’s a little maths intensive – sorry! But you can at least see that it’s also easy to implement.

Temperature readings are then made from the detected region and an average calculated per frame. When the average temperature (i.e. pixel intensity) peaks during the answering of a question, the algorithm can deduce that the person is lying through their teeth!
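Continuing the sketch, the per-frame averaging and a peak test could look like the NumPy code below. The decision rule (flag the answer if its peak exceeds the pre-question baseline by at least min_rise intensity levels), the frame windows and the threshold of 2.0 are hypothetical placeholders; the paper’s actual criterion involves further calculations not reproduced here, and turning pixel intensity into degrees requires the camera’s calibration.

```python
import numpy as np


def roi_mean_series(frames, box):
    """Mean pixel intensity (a temperature proxy) inside box (x, y, w, h) for each frame."""
    x, y, w, h = box
    return np.array([frame[y:y + h, x:x + w].mean() for frame in frames], dtype=float)


def flags_deception(series, baseline, answer, min_rise=2.0):
    """Hypothetical rule: did the ROI peak during the answer rise min_rise above the baseline mean?"""
    base = series[baseline[0]:baseline[1]].mean()
    peak = series[answer[0]:answer[1]].max()
    return bool(peak - base >= min_rise)


# Synthetic session: 20 frames, periorbital region warms up in frames 12-15
frames = [np.full((50, 50), 120.0) for _ in range(20)]
for t in range(12, 16):
    frames[t][10:20, 10:30] = 125.0

series = roi_mean_series(frames, box=(10, 10, 20, 10))
print(flags_deception(series, baseline=(0, 10), answer=(10, 20)))  # True
print(flags_deception(np.full(20, 120.0), baseline=(0, 10), answer=(10, 20)))  # False
```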

That’s not that complicated, right? Even though I simplified the algorithm a little (a few additional calculations are performed to assist tracking in step 2), the gist of it is there! Here you have a fully-automated, non-invasive lie detector that uses Computer Vision techniques to get results comparable to polygraph tests.

What the Future Holds in Lie Detection and Thermal Imagery

Now, let’s have a think about what a non-invasive lie detector could potentially achieve in the future.

Lie detectors are all about noticing micro-changes in blood flow, right? Can you imagine such a system tracking you around the store to gauge which products get you a little bit excited? Results can be instantly forwarded to a seller who now has the upper hand over you. Nobody will be safe from second-hand car dealers any more. To confirm anything, all he has to do is ask if you like a certain car or not.

What about business meetings? You could put a cheeky thermal camera in the corner of a meeting room and get live reports on how clients really feel about your pitch. Haggling will be a lot easier if you know what your opponent is truly thinking.

And what about poker? You will be able to beat (unethically?) that one friend who always cleans up at your “friendly” weekend poker nights.

The potential is endless, really. And who knows?! Maybe we’ll have thermal cameras in our phones one day, too? Computer vision will definitely be a powerful tool in the future 🙂

What other uses of deception detection using thermography can you think of?


Traditionally, lie detection has been performed using a polygraph test. This test, however, is invasive and needs an expert to painstakingly analyse results from the various sensors that are used. Digital thermography is looking like a viable alternative. Scientists have shown that using standard computer vision techniques, deception detection can be non-invasive, automated, and get results comparable to polygraph tests. Non-invasive lie detectors are a scary prospect considering that they could track our every move and analyse all our emotions in real time.


Finding a Good Thesis Topic in Computer Vision

“What are some good thesis topics in Computer Vision?”

This is a common question that people ask in forums – and it’s an important question to ask for two reasons:

  1. There’s nothing worse than starting over in research because the path you decided to take turned out to be a dead end.
  2. There’s also nothing worse than being stuck with a generally good topic but one that doesn’t interest you at all. A “good” thesis topic has to be one that interests you and will keep you involved and stimulated for as long as possible.

For these reasons, it’s best to do as much research as you can to avoid the above pitfalls or your days of research will slowly become torturous for you – and that would be a shame because computer vision can truly be a lot of fun 🙂

So, down to business.

The purpose of this post is to propose ways to find that one perfect topic that will keep you engaged for months (or years) to come – and something you’ll be proud to talk about amongst friends and family.

I’ll start the discussion off by saying that your search strategy for topics depends entirely on whether you’re preparing for an undergraduate or Master’s thesis or for a PhD. The former can be more general; the latter is (nearly always) very specific. Let’s start with undergraduate topics first.

Undergraduate Studies

I’ll propose here three steps you can take to assist in your search: looking at the applications of computer vision, examining the OpenCV library, and talking to potential supervisors.

Applications of Computer Vision

Computer Vision has so many uses in the world. Why not look through a comprehensive list of them and see if anything on that list draws you in? Here’s one such list I collected from the British Machine Vision Association:

  • agriculture
  • augmented reality
  • autonomous vehicles (big one nowadays!)
  • biometrics
  • character recognition
  • forensics
  • industrial quality inspection
  • face recognition
  • gesture analysis
  • geoscience
  • image restoration
  • medical image analysis
  • pollution monitoring
  • process control
  • remote sensing
  • robotics (e.g. navigation)
  • security and surveillance
  • transport

Go through this list and work out if something stands out for you. Perhaps your family is involved in agriculture? Look up how computer vision is helping in this field! The Economist wrote a fascinating article entitled The Future of Agriculture, in which they discuss, among other things, the use of drones to monitor crops, create contour maps of fields, etc. Perhaps Computer Vision can assist with some of these tasks? Look into this!

The OpenCV Library

OpenCV is the best library out there for image and video processing (I’ll be writing a lot more about it on this blog). Other libraries exist that do certain specific things a little better, e.g. Tracking.js, which performs tasks like object tracking inside the browser, but generally speaking, there’s nothing better than OpenCV.

On the topic of searching for thesis topics, I recall once reading a suggestion to go through the functions that OpenCV has to offer and see if anything sticks out at you. A brilliant idea. Work your way down the list of modules in the OpenCV documentation. Perhaps face recognition interests you? There are so many interesting projects where this can be utilised!
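If you’d like to try this hands-on, OpenCV’s Python bindings let you browse what the library offers straight from an interpreter. The snippet below is a minimal sketch, assuming the opencv-python package (imported as cv2) is installed; the helper list_functions is a hypothetical name of my own, not part of OpenCV. It simply filters the names that Python’s built-in dir() reports for a module by a keyword such as "face" or "track".

```python
import types


def list_functions(module: types.ModuleType, keyword: str = "") -> list[str]:
    """Return the sorted names of public callables in `module`.

    Only names containing `keyword` (case-insensitive) are kept; an empty
    keyword keeps everything. Classes count as callables too.
    """
    keyword = keyword.lower()
    matches = []
    for name in dir(module):
        if name.startswith("_"):  # skip private and dunder names
            continue
        if keyword in name.lower() and callable(getattr(module, name)):
            matches.append(name)
    return sorted(matches)


if __name__ == "__main__":
    try:
        import cv2  # assumption: installed via `pip install opencv-python`
    except ImportError:
        print("OpenCV not found; install it with: pip install opencv-python")
    else:
        # Print every top-level OpenCV callable with "track" in its name.
        for name in list_functions(cv2, "track"):
            print(name)
```

Keep in mind that dir() only lists top-level names: submodules (for example cv2.face, which comes with the separate opencv-contrib-python package) have to be passed in separately. The official documentation remains the better guide, since it groups functions by module and explains what each one does.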

Talk to Potential Supervisors

You can’t go past this suggestion. Every academic has ideas constantly buzzing around their head. Academics are immersed in their field of research and are always talking to people in the industry to look for interesting projects that they could get funding for. Go and talk to the academics at your university who are involved in Computer Vision. I’m sure they’ll have at least one project proposal ready to go for you.

You should also run past them any ideas of your own that emerged from the two previous steps, or at least mention things that stood out for you (e.g. agriculture). They may be able to come up with something themselves.

PhD Studies

Well, if you’ve come this far in your studies, then chances are you have a fairly good idea of how this all works by now. I won’t patronise you too much, then. But I will mention three points that I wish someone had told me prior to starting my PhD adventure:

  • You should be building your research topic around a supervisor. They’ve been in the field for a long time and know where the niches and dead ends are. Use their experience! If there’s a supervisor who is constantly publishing in object tracking, then doing research with them in this area makes sense.
  • If your supervisor has a ready-made topic for you, CONSIDER TAKING IT. I can’t stress this enough. Usually the first year of your PhD involves you searching (often blindly) around various fields in Computer Vision and then going deeper and deeper into one specific area to find a niche. If your supervisor has a topic on hand for you, you are already one year ahead of the crowd. That also means a year of frustration saved, because searching around in a vast realm of publications can be daunting – believe me, I’ve been there.
  • Avoid going into trending topics. For example, object recognition using Convolutional Neural Networks is a topic that everyone in the world of Computer Vision is currently going crazy about. This means that in your studies you will be competing for publications with big players (e.g. Google) who have money, manpower, and computing power at their disposal. You don’t want to enter into this war unless you are confident that your supervisor knows what they’re doing and/or your university has the capabilities to play in this big league too.

Summary

Spending time looking for a thesis topic is time worth spending. It could save you from future pitfalls. With respect to undergraduate thesis topics, looking at Computer Vision applications is one place to start. The OpenCV library is another. And talking to potential supervisors at your university is also a good idea.

With respect to PhD thesis topics, it’s important to consider your potential supervisors’ fields of expertise and then to search for topics in these areas. If these supervisors have ready-made topics for you, it is worth considering them to save yourself a lot of time and stress in the first year or so of your studies. Finally, it’s usually good to avoid trending topics because of the people you will be competing against for publications.

But the bottom line is: devote time to finding a topic that truly interests you. It’ll be the difference between wanting to get out of bed to do more and more research in your field and dreading each time you have to walk into your Computer Science building in the morning.

