My Top 5 Posts So Far

It’s been nearly 18 months since I started this blog. I did it to share my journey in computer vision with you. I love this field and I’m always stumbling across such fascinating things that I feel as though more people should know about them.

I’ve seen this blog grow in popularity – much, much more than I had anticipated when I first started it. In this little “bonus” post, I thought I’d list my top posts thus far with additional comments about them.

I also thought I’d compile a second list with my personal favourite posts. These have not been as popular but I sure as hell had fun writing them!

Enjoy! And thanks for the support over the last 18 months.

My top 5 posts thus far:

  1. Why Deep Learning Has Not Superseded Traditional Computer Vision – I wrote this post on a Friday evening directly after work with a beer bottle in one hand and people playing pool or foosball around me. I wrote it up in an hour or so and didn’t think much of it, to be honest. The next day I woke up and saw, to my extreme surprise, that it had gone slightly viral! It was featured in Deep Learning Weekly (Issue #76), was being reposted by people such as Dr Adrian Rosebrock from PyImageSearch, and was getting about 1000 hits/day. Not bad, hey!?
  2. The Top Image Datasets and Their Challenges
  3. Finding a Good Thesis Topic in Computer Vision – I wrote this post after constantly seeing people asking this question on forums. Considering it’s consistently in my top 3 posts every week, I guess people are still searching for inspiration.
  4. Mapping Camera Coordinates to a 2D Floor Plan – This post came about after I had to work on security footage from a bank for a project at work. The boss was very pleased with what I had done and writing about my experiences in a post was a no-brainer after that.
  5. The Early History of Computer Vision – History is something that really interests me so it was only a matter of time before I was going to read up on the history of computer vision. Once I did and saw how fascinating it was, I just had to write a post about it.

My favourite posts thus far. Like I said, these are not popular (some barely get a single hit in a week) but I really enjoyed researching for and writing them:

  1. Soccer on Your Tabletop – The coolest thing going around in computer vision.
  2. Amazon Go – Computer Vision at the Forefront of Innovation – This to me is something amazing.
  3. The Baidu and ImageNet Controversy – Nothing like a good controversy!
  4. Computer Vision on Mars – Computer vision in space. Imagine working on that project!
  5. The Reasons Behind the Recent Growth of Computer Vision – I’m proud of how far computer vision has come over the years. It’s been a pleasure to be a part of the adventure.

Enjoy looking back over my posts. Thanks once again for your support over the last 18 months.



The Early History of Computer Vision

As I’ve discussed in previous posts of mine, computer vision is growing faster than ever. In 2016, investments into US-based computer vision companies more than tripled since 2014 – from $100 million to $300 million. And it looks like this upward trend is going to continue worldwide, especially if you consider all the major acquisitions that were made in 2017 in the field, the biggest one being Intel’s purchase in March of Mobileye (a company specialising in computer vision-based collision prevention systems for autonomous cars) for a whopping $15 billion.

But today I want to take a look back rather than forward. I want to devote some time to present the early historical milestones that have led us to where we are now in computer vision.

This post, therefore, will focus on seminal developments in computer vision between the 60s and early 80s.

Larry Roberts – The Father of Computer Vision

Let’s start first with Lawrence Roberts. Ever heard of him? He calls himself the founder of the Internet. The case for giving him this title is strong considering that he was instrumental in the design and development of the ARPANET, which was the technical foundation of what you are surfing on now.

What is not well known is that he is also dubbed the father of computer vision in our community. In 1963 he published “Machine Perception of Three-Dimensional Solids”, which started it all for us. In it he discusses extracting 3D information about solid objects from 2D photographs of line drawings. He mentions things such as camera transformations, perspective effects, and “the rules and assumptions of depth perception” – things that we discuss to this very day.

Just take a look at this diagram from his original publication (which can be found here – and a more readable form can be found here):


Open up any book on image processing and you will see a similar diagram in there discussing the relationship between a camera and an object’s projection on a 2D plane.

Funnily enough, Lawrence’s Wikipedia page does not make a single mention of his work in computer vision, which is all the more surprising considering that the publication I mentioned above was his PhD thesis! Crazy, isn’t it? If I find the time, I’ll go over and edit that article to give him the additional credit that he deserves.

The Summer Vision Project

Lawrence Roberts’ thesis was about analysing line drawings rather than images taken of the real world. Work on line drawings was to continue for a long time, especially after the following important incident of 1966.

You’ve probably all heard the stories from the 50s and 60s of scientists predicting a bright future within a generation for artificial intelligence. AI became an academic discipline and millions of dollars were pumped into research with the intention of developing a machine as intelligent as a human being within 25 years. But it didn’t take long before people realised just how hard creating a “thinking” machine was going to be.

Well, computer vision has its own place in this ambitious time of AI as well. People then thought that constructing a machine to mimic the human visual system was going to be an easy task on the road to finally building a robot with human-like intelligent behaviour.

In 1966, Seymour Papert organised “The Summer Vision Project” at MIT. He assigned this project to Gerald Sussman, who was to co-ordinate a small group of students to work on background/foreground segmentation of real-world images with the final goal of extracting non-overlapping objects from them.

Only in the past decade or so have we been able to obtain good results in a task such as this. So, those poor students really did not know what they were in for. (Note: I write about why image processing is such a hard task in another post of mine). I couldn’t find any information on exactly how much they were able to achieve over that summer but this will definitely be something I will try to find out if I ever get to visit MIT.

Luckily, the original memo of this project is still available to us – which is quite neat. It’s definitely a piece of history for us computer vision scientists. Take a look at the abstract (summary) of the project from the 1966 memo (the full version can be found here):


Continued Work with Line Drawings

In the 1970s work continued with line drawings because real-world images were just too hard to handle at the time. To this end, people were regularly looking into extracting 3D information about blocks from 2D images.

Line labelling was an interesting concept being looked at in this respect. The idea was to try to discern a shape in a line drawing by first attempting to annotate all the lines it was composed of accordingly. Line labels would include convex, concave, and occluded (boundary) lines. An example of a result from a line labelling algorithm can be seen below:

Line labelling example with convex (+), concave (-), and occluded (<–) labels. (Image taken from here)

Two important people in the field of line labelling were David Huffman (“Impossible objects as nonsense sentences”. Machine Intelligence, 8:475-492, 1971) and Max Clowes (“On seeing things”. Artificial Intelligence, 2:79-116, 1971) who both published their line labelling algorithms independently in 1971.

In the area of line labelling, interesting problems such as the one below were also looked at:


The image above was taken from a seminal book written by David Marr at MIT entitled “Vision: A computational investigation into the human representation and processing of visual information”. It was finished around 1979 but posthumously published in 1982. In this book Marr proposes an important framework for image understanding that is used to this very day: the bottom-up approach. The bottom-up approach, as Marr suggests, uses low-level image processing algorithms as stepping-stones towards attaining high-level information.

(Now, for clarification, when we say “low-level” image processing, we mean tasks such as edge detection, corner detection, and motion detection (i.e. optical flow) that don’t directly give us any high-level information such as scene understanding and object detection/recognition.)

From that moment on, “low-level” image processing was given a prominent place in computer vision. It’s important to also note that Marr’s bottom-up framework is central to today’s deep learning systems (more on this in a future post).

Computer Vision Gathers Speed

So, with the bottom-up model approach to image understanding, important advances in low-level image processing began to be made. For example, the famous Lucas-Kanade optical flow algorithm, first published in 1981 (original paper available here), was developed. It is still so prominent today that it is a standard optical flow algorithm included in the OpenCV library. Likewise, the Canny edge detector, first published in 1986, is again widely used today and is also available in the OpenCV library.
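To give you a taste of what Lucas-Kanade does under the hood: under the brightness-constancy assumption, each pixel in a small window contributes one equation Ix·u + Iy·v = −It, and the flow (u, v) is the least-squares solution of that system. Below is a minimal single-window sketch in NumPy – purely illustrative, since the real algorithm adds image pyramids and iterative refinement (OpenCV’s implementation is the one to use in practice):

```python
import numpy as np

def lucas_kanade_window(I0, I1):
    """Estimate one (u, v) flow vector for a whole patch by solving
    the brightness-constancy equations Ix*u + Iy*v = -It in the
    least-squares sense over every pixel of the patch."""
    Ix = np.gradient(I0, axis=1)   # horizontal image gradient
    Iy = np.gradient(I0, axis=0)   # vertical image gradient
    It = I1 - I0                   # temporal derivative between frames
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)
    b = -It.ravel()
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v
```

Feeding it two synthetic frames where the scene shifts one pixel to the right recovers a flow of roughly (1, 0).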

The bottom line is, computer vision started to really gather speed in the late 80s. Mathematics and statistics began playing a more and more significant role and the increase in speed and memory capacity of machines helped things immensely also. Many more seminal algorithms followed this upward trend, including some famous face detection algorithms. But I won’t go into this here because I would like to talk about breakthrough CV algorithms in a future post.


In this post I looked at the early history of computer vision. I mentioned Lawrence Roberts’ PhD thesis on things such as camera transformations, perspective effects, and “the rules and assumptions of depth perception”. He got the ball rolling for us all and is commonly regarded as the father of computer vision. The Summer Vision Project of 1966 was also an important event that taught us that computer vision, along with AI in general, is not an easy task at all. People, therefore, focused on line drawings until the 80s, when Marr published his idea for a bottom-up framework for image understanding. Low-level image processing took off, spurred on by advancements in the speed and memory capacity of machines and greater mathematical and statistical rigour. The late 80s and onwards saw tremendous developments in CV algorithms but I will talk more about these in a future post.



Recent Controversies in Computer Vision

Computer vision is a fascinating area in which to work and perform research. And, as I’ve mentioned a few times already, it’s been a pleasure to witness its phenomenal growth, especially in the last few years. However, as with pretty much anything in the world, contention also plays a part in its existence.

In this post I would like to present two recent events from the world of computer vision that have caused controversy:

  1. A judge’s ruling that Facebook must stand trial for its facial recognition software
  2. Uber’s autonomous car death of a pedestrian

Facebook and Facial Recognition

This is an event that has seemingly passed under the radar – at least for me it did. Probably because of the Facebook-Cambridge Analytica scandal that has been recently flooding the news and social discussions. But I think this is also an important event to mull over because it touches upon underlying issues associated with an important topic: facial recognition and privacy.

So, what has happened?

In 2015, Facebook was hit with a class action lawsuit (the original can be found here) by three residents of Chicago, Illinois. They accuse Facebook of violating the state’s biometric privacy law by collecting and storing biometric data of each user’s face. This data is being stored without written notification. Moreover, it is not clear exactly what the data is to be used for, nor how long it will reside in storage, nor was an opt-out option ever provided.

Facebook began to collect this data, as the lawsuit states, in a “purported attempt to make the process of tagging friends easier”.


In other words, what Facebook is doing (yes, even now) is summarising the geometry of your face with certain parameters (e.g. distance between eyes, shape of chin, etc.). This data is then used to try to locate your face elsewhere to provide tag suggestions. But for this to be possible, the biometric data needs to be stored somewhere for it to be recalled when needed.
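As a toy illustration of what “summarising the geometry of your face with parameters” might look like – and this is purely hypothetical, in no way Facebook’s actual method – you could compute ratios of distances between facial landmarks, which stay the same no matter how large the face appears in a photo:

```python
import numpy as np

def face_signature(landmarks):
    """Compute a toy 'signature' from hypothetical facial landmarks:
    the ratio of eye-to-eye distance to nose-to-chin distance.
    Being a ratio, it is invariant to the scale of the face."""
    l_eye = np.array(landmarks["left_eye"], dtype=float)
    r_eye = np.array(landmarks["right_eye"], dtype=float)
    nose = np.array(landmarks["nose_tip"], dtype=float)
    chin = np.array(landmarks["chin"], dtype=float)
    return np.linalg.norm(r_eye - l_eye) / np.linalg.norm(chin - nose)
```

A real system extracts dozens of such measurements automatically, and it is precisely this kind of derived biometric data that the lawsuit is about.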

The Illinois residents are not happy that a firm is doing this without their knowledge or consent. Considering the Cambridge Analytica scandal, they kind of have a point, you would think? Who knows where this data could end up. They are suing for $75,000 and have requested a jury trial.

Anyway, Facebook protested this lawsuit and asked that it be thrown out of court, stating that the law in question does not cover its tag suggestion feature. A year ago, a District Judge rejected Facebook’s request.

Facebook appealed again stating that proof of actual injury needs to be shown. Wow! As if violating privacy isn’t injurious enough!?

But on the 14th May, the same judge dismissed (official ruling here) Facebook’s appeal:

[It’s up to a jury] to resolve the genuine factual disputes surrounding facial scanning and the recognition technology.

So, it looks like Facebook will be facing the jury on July 9th this year! Huge news, in my opinion. Even if any verdict will only pertain to the United States. There is still so much that needs to be done to protect our data but at least things seem to be finally moving in the right direction.

Uber’s Autonomous Car Death of Pedestrian

You probably heard on the news that on March 18th this year a woman was hit by an autonomous car owned by Uber in Arizona as she was crossing the road. She died in hospital shortly after the collision. This is believed to be the first ever pedestrian fatality involving an autonomous car. There have been other deaths in the past (3 in total) but all of them were of the driver.

(image taken from the US NTSB report)

3 weeks ago the US National Transportation Safety Board (NTSB) released its first report (short read) into this crash. It was only a preliminary report but it provides enough information to state that the self-driving system was at least partially at fault.

(image taken from the US NTSB report)

The report gives the timeline of events: the pedestrian was detected about 6 seconds before impact but the system had trouble identifying it. It was first classified as an unknown object, then a vehicle, then a bicycle – but even then it couldn’t work out the object’s direction of travel. At 1.3 seconds before impact, the system realised that it needed to engage an emergency braking maneuver but this maneuver had been earlier disabled to prevent erratic vehicle behaviour on the roads. Moreover, the system was not designed to alert the driver in such situations. The driver began braking less than 1 second before impact but it was tragically too late.

Bottom line is, if the self-driving system had immediately recognised the object as a pedestrian walking directly into its path, it would have known that avoidance measures would have needed to be taken – well before the emergency braking maneuver was called to be engaged. This is a deficiency of the artificial intelligence implemented in the car’s system. 

No statement has been made with respect to who is legally at fault. I’m no expert but it seems like Uber will be given the all-clear: the pedestrian had hard drugs detected in her blood and was crossing the road outside a designated crossing area.

Nonetheless, this is a significant event for AI and computer vision (which plays a pivotal role in self-driving cars) because if these technologies had performed better, the crash would have been avoided (as researchers have shown).

Big ethical questions are being taken seriously. For example, who will be held accountable if a fatal crash is deemed to be the fault of the autonomous car? The car manufacturer? The people behind the algorithms? One sole programmer who messed up a for-loop? Stanford scholars have been openly discussing the ethics behind autonomous cars for a long time (it’s an interesting read, if you have the time).

And what will be the future for autonomous cars in the aftermath of this event? Will their inevitable delivery into everyday use be pushed back?

Testing of autonomous cars has been halted by Uber in North America. Toyota has followed suit. And Chris Jones, who leads the Autonomous Vehicle Analysis service at the technology analyst company Canalys, says that these events will set the industry back considerably:

It has put the industry back. It’s one step forward, two steps back when something like this happens… and it seriously undermines trust in the technology.

Furthermore, a former US Secretary of Transportation has deemed the crash a “wake up call to the entire [autonomous vehicle] industry and government to put a high priority on safety.”

But other news reports seem to indicate a different story.

Volvo, the make of the car involved in the fatal crash, stated only last week that it expects a third of its cars sold to be autonomous by 2025. Other car manufacturers are making similar announcements. Two weeks ago General Motors and Fiat Chrysler unveiled self-driving deals with companies like Google to push for a lead in the self-driving car market.

And Baidu (China’s Google, so to speak) is heavily invested in the game, too. Even Chris Jones is admitting that for them this is a race:

The Chinese companies involved in this are treating it as a race. And that’s worrying. Because a company like Baidu – the Google of China – has a very aggressive plan and will try to do things as fast as it can.

And when you have a race among large corporations, there isn’t much that is going to even slightly postpone anything. That’s been my experience in the industry anyway.


In this post I looked at two recent events from the world of computer vision that have caused controversy.

The first was a judge’s ruling in the United States that Facebook must stand trial for its facial recognition software. Facebook is accused of violating Illinois’ biometric privacy law by collecting and storing biometric data of each user’s face. This data is being stored without written notification. Moreover, it is not clear exactly what the data is being used for, nor how long it is going to reside in storage, nor was an opt-out option ever provided.

The second event was the first recorded death of a pedestrian by an autonomous car in March of this year. A preliminary report was released by the US National Transportation Safety Board 3 weeks ago that states that AI is at least partially at fault for the crash. Debate over the ethical issues inherent to autonomous cars has heated up as a result but it seems as though the incident has not held up the race to bring self-driving cars onto our streets.



Why Deep Learning Has Not Superseded Traditional Computer Vision

This is another post that’s been inspired by a question that has been regularly popping up in forums:

Has deep learning superseded traditional computer vision?

Or in a similar vein:

Is there still a need to study traditional computer vision techniques when deep learning seems to be so effective? 

These are good questions. Deep learning (DL) has certainly revolutionised computer vision (CV) and artificial intelligence in general. So many problems that once seemed intractable are now solved to a point where machines are obtaining better results than humans. Image classification is probably the prime example of this. Indeed, deep learning is responsible for placing CV on the map in the industry, as I’ve discussed in previous posts of mine.

But deep learning is still only a tool of computer vision. And it certainly is not the panacea for all problems. So, in this post I would like to elaborate on this. That is, I would like to lay down my arguments for why traditional computer vision techniques are still very much useful and therefore should be learnt and taught.

I will break the post up into the following sections/arguments:

  • Deep learning needs big data
  • Deep learning is sometimes overkill
  • Traditional CV will help you with deep learning

But before I jump into these arguments, I think it’s necessary to first explain in detail what I mean by “traditional computer vision”, what deep learning is, and also why it has been so revolutionary.

Background Knowledge

Before the emergence of deep learning, if you had a task such as image classification, you would perform a step called feature extraction. Features are small “interesting”, descriptive or informative patches in images. You would look for these by employing a combination of what I am calling in this post traditional computer vision techniques, which include things like edge detection, corner detection, object detection, and the like.

In using these techniques – for example, with respect to feature extraction and image classification – the idea is to extract as many features as possible from images of one class of object (e.g. chairs, horses, etc.) and treat these features as a sort of “definition” (known as a bag-of-words) of the object. You would then search for these “definitions” in other images. If a significant number of features from one bag-of-words are located in another image, the image is classified as containing that specific object (i.e. chair, horse, etc.).
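A stripped-down sketch of this matching idea, with made-up descriptor vectors standing in for real features such as SIFT or ORB descriptors:

```python
import numpy as np

def count_matches(bag_of_words, image_features, thresh=0.5):
    """Count how many 'definition' features reappear in a new image.
    A feature matches when some image feature lies within `thresh`
    of it in descriptor space (Euclidean distance)."""
    hits = 0
    for f in bag_of_words:
        distances = np.linalg.norm(image_features - f, axis=1)
        if distances.min() < thresh:
            hits += 1
    return hits

def contains_object(bag_of_words, image_features, min_matches=3):
    """Classify the image as containing the object when enough of
    its defining features are found."""
    return count_matches(bag_of_words, image_features) >= min_matches
```

The threshold and the minimum match count are exactly the kind of hand-tuned parameters discussed below.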

The difficulty with this approach of feature extraction in image classification is that you have to choose which features to look for in each given image. This becomes cumbersome and pretty much impossible when the number of classes you are trying to classify for starts to grow past, say, 10 or 20. Do you look for corners? edges? texture information? Different classes of objects are better described with different types of features. If you choose to use many features, you have to deal with a plethora of parameters, all of which have to be fine-tuned by you.

Well, deep learning introduced the concept of end-to-end learning where (in a nutshell) the machine is told to learn what to look for with respect to each specific class of object. It works out the most descriptive and salient features for each object. In other words, neural networks are told to discover the underlying patterns in classes of images.

So, with end-to-end learning you no longer have to manually decide which traditional computer vision techniques to use to describe your features. The machine works this all out for you. Wired magazine puts it this way:

If you want to teach a [deep] neural network to recognize a cat, for instance, you don’t tell it to look for whiskers, ears, fur, and eyes. You simply show it thousands and thousands of photos of cats, and eventually it works things out. If it keeps misclassifying foxes as cats, you don’t rewrite the code. You just keep coaching it.

The image below portrays this difference between feature extraction (using traditional CV) and end-to-end learning:


So, that’s the background. Let’s jump into the arguments as to why traditional computer vision is still necessary and beneficial to learn.

Deep Learning Needs Big Data

First of all, deep learning needs data. Lots and lots of data. Famous image classification models are trained on huge datasets containing millions of labelled images.

Easier tasks than general image classification will not require this much data but you will still need a lot of it. What happens if you can’t get that much data? You’ll have to train on what you have (yes, some techniques exist to boost your training data but these are artificial methods).

But chances are a poorly trained model will perform badly outside of your training data because a machine doesn’t have insight into a problem – it can’t generalise for a task without seeing data.

And it’s too difficult for you to look inside the trained model and tweak things around manually because a deep learning model has millions of parameters inside of it – each of which is tuned during training. In a way, a deep learning model is a black box.

Traditional computer vision gives you full transparency and allows you to better gauge and judge whether your solution will work outside of a training environment. You have insight into a problem that you can transfer into your algorithm. And if anything fails, you can much more easily work out what needs to be tweaked and where.

Deep Learning is Sometimes Overkill

This is probably my favourite reason for supporting the study of traditional computer vision techniques.

Training a deep neural network takes a very long time. You need dedicated hardware (high-powered GPUs, for example) to train the latest state-of-the-art image classification models in under a day. Want to train it on your standard laptop? Go on a holiday for a week and chances are the training won’t even be done when you return.

Moreover, what happens if your trained model isn’t performing well? You have to go back and redo the whole thing again with different training parameters. And this process can be repeated sometimes hundreds of times.

But there are times when all this is totally unnecessary. Because sometimes traditional CV techniques can solve a problem much more efficiently and in fewer lines of code than DL. For example, I once worked on a project to detect whether each tin passing along a conveyor belt had a red spoon in it. Now, you can train a deep neural network to detect spoons and go through the time-consuming process outlined above, or you can write a simple colour thresholding algorithm on the colour red (any pixel within a certain range of red is coloured white, every other pixel is coloured black) and then count how many white pixels you have. Simple. You’re done in an hour!
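The whole check fits in a few lines. Here’s a sketch of such a thresholding test in NumPy – the threshold values are made up, and in practice you would tune them for your camera and lighting (ideally working in HSV colour space rather than RGB):

```python
import numpy as np

def contains_red_spoon(rgb, min_red_pixels=50):
    """Return True when the image holds enough 'red' pixels.
    A pixel counts as red when its R channel is high while G and B
    are low; `min_red_pixels` guards against stray noise pixels."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    mask = (r > 150) & (g < 100) & (b < 100)
    return int(mask.sum()) >= min_red_pixels
```

No training, no GPU, and the behaviour is completely transparent when something goes wrong.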

Knowing traditional computer vision can potentially save you a lot of time and unnecessary headaches.

Traditional Computer Vision will Improve your Deep Learning Skills

Understanding traditional computer vision can actually help you be better at deep learning.

For example, the most common neural network used in computer vision is the Convolutional Neural Network. But what is a convolution? It’s in fact a widely used image processing technique (e.g. see Sobel edge detection). Knowing this can help you understand what your neural network is doing under the hood and hence design and fine-tune it better to the task you’re trying to solve.
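If you have never seen it spelt out, a convolution just slides a small kernel over the image and takes a weighted sum at each position – a CNN learns its kernel weights during training, whereas Sobel fixes them by hand. A naive NumPy sketch (real libraries use far faster implementations):

```python
import numpy as np

# Hand-crafted Sobel kernel that responds to vertical edges
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

def convolve2d(image, kernel):
    """Naive 'valid' 2D convolution: slide the flipped kernel over
    the image and take a weighted sum at each position."""
    k = np.flipud(np.fliplr(kernel))  # convolution flips the kernel
    kh, kw = k.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (image[i:i + kh, j:j + kw] * k).sum()
    return out
```

Convolving a step-edge image with the Sobel kernel produces strong responses exactly where the intensity changes and zeros everywhere else – which is precisely what the first layer of a CNN tends to rediscover on its own.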

Then there is also a thing called pre-processing. This is something frequently done on the data that you’re feeding into your model to prepare it for training. These pre-processing steps are predominantly performed with traditional computer vision techniques. For example, if you don’t have enough training data, you can do a task called data augmentation. Data augmentation can involve performing random rotations, shifts, shears, etc. on the images in your training set to create “new” images. By performing these computer vision operations you can greatly increase the amount of training data that you have.
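A sketch of such an augmentation step in plain NumPy (real pipelines offer far richer transformations, such as arbitrary-angle rotations, shears, and colour jitter):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image):
    """Return a randomly transformed copy of a (square) image using
    simple geometric operations: a horizontal flip, a 90-degree
    rotation, and a small translation."""
    out = image
    if rng.random() < 0.5:
        out = np.fliplr(out)                       # random mirror
    out = np.rot90(out, k=int(rng.integers(0, 4))) # random rotation
    shift = rng.integers(-3, 4, size=2)
    out = np.roll(out, tuple(shift), axis=(0, 1))  # small translation
    return out
```

Each call produces a slightly different "new" training image while keeping exactly the same pixel content.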


In this post I explained why deep learning has not superseded traditional computer vision techniques and hence why the latter should still be studied and taught. Firstly, I looked at the problem of DL frequently requiring lots of data to perform well. Sometimes this is not a possibility and traditional computer vision can be considered as an alternative in these situations. Secondly, occasionally deep learning can be overkill for a specific task. In such tasks, standard computer vision can solve a problem much more efficiently and in fewer lines of code than DL. Thirdly, knowing traditional computer vision can actually make you better at deep learning. This is because you can better understand what is happening under the hood of DL and you can perform certain pre-processing steps that will improve DL results.

In a nutshell, deep learning is just a tool of computer vision and certainly not a panacea. Don’t use it just because it’s trendy right now. Traditional computer vision techniques are still very much useful and knowing them can save you time and many headaches.



The Reasons Behind the Recent Growth of Computer Vision

In my previous post I looked at the unprecedented growth of computer vision in the industry. 10 years ago computer vision was nowhere to be seen outside of academia. But things have since changed significantly. A telling sign of this is the tripling of venture capital funding in computer vision companies in the space of just two years. And Intel’s acquisition of Mobileye in March 2017 for a whopping US$15.3 billion just sums up the field’s achievements.

In that post, however, I only briefly touched upon the reasons behind this incredible growth. The purpose of this article, then, is to fill that gap.

In this respect, I will discuss the top 4 reasons behind the growth of computer vision in the industry. I’ll do so in the following order:

  1. Advancements in hardware
  2. The emergence of deep learning
  3. The advent of large datasets
  4. The increase in computer vision applications

Better and More Dedicated Hardware

I mentioned in another post of mine that one of the main reasons why image processing is such a difficult problem is that it deals with an immense amount of data. To process this data you need memory and processing power. These have been increasing in size and power regularly for over 50 years (cf. Moore’s Law).

Such increases have allowed for algorithms to run faster to the point where more and more things are now capable of being run in real-time (e.g. face recognition).

We have also seen the emergence and proliferation of dedicated hardware for graphics and image processing calculations. GPUs are the prime example of this. A GPU’s clock speed may generally be slower than a regular CPU’s, but a GPU performs thousands of operations in parallel, which allows it to outperform a CPU on these specific tasks.

Dedicated hardware has become so highly prized nowadays in computer vision that numerous companies have started designing and producing their own. Just two weeks ago, for example, Ambarella announced two new chips designed for computer vision processing, chiefly with autonomous cars, drones, and security cameras in mind. And last year, Mythic, a startup based in California, raised over US$10 million to commercialise its own deep learning-focused hardware.

The Emergence of Deep Learning

Deep learning, a subfield of machine learning, has been revolutionary in computer vision. Because of it, machines are now getting better results than humans in important tasks such as image classification (i.e. detecting what object is in an image).

Previously, if you had a task such as image classification, you would perform a step called feature extraction. Features are small “interesting”, descriptive, or informative patches in images. The idea is to extract as many of these as possible from images of one class of object (e.g. chairs, horses, etc.) and treat the collection as a sort of “definition” (known as a bag-of-words) of the object. You would then search for these “definitions” in other images. If a significant number of features from one bag-of-words are located in another image, the image is classified as containing that specific object (i.e. a chair, a horse, etc.).

The difficulty with this approach is that you have to choose which features to look for in each given image. This becomes cumbersome and pretty much impossible when the number of classes you are trying to classify grows past, say, 10 or 20. Do you look for corners? Edges? Texture information? Different classes of objects are better described by different types of features. And if you choose to use many features, you have to deal with a plethora of parameters, all of which have to be fine-tuned.
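To make the bag-of-words idea concrete, here is a minimal sketch of such a pipeline using scikit-learn. In a real system you would extract SIFT/ORB descriptors from actual images; here, synthetic descriptors (and the cluster and class counts) are made up purely for illustration:

```python
# A minimal sketch of the bag-of-visual-words approach described above.
# Synthetic descriptors stand in for real SIFT/ORB features so that the
# pipeline itself is the focus.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

def fake_descriptors(centre):
    # Pretend each "image" yields 50 local feature descriptors of dimension 32.
    return centre + rng.normal(scale=0.5, size=(50, 32))

chairs = [fake_descriptors(np.zeros(32)) for _ in range(20)]
horses = [fake_descriptors(np.ones(32)) for _ in range(20)]

# 1. Build the visual vocabulary by clustering all descriptors.
kmeans = KMeans(n_clusters=10, n_init=10, random_state=0)
kmeans.fit(np.vstack(chairs + horses))

# 2. Represent each image as a histogram over the visual words
#    (this histogram is the image's "bag of words").
def bow_histogram(descriptors):
    return np.bincount(kmeans.predict(descriptors), minlength=10)

X = np.array([bow_histogram(d) for d in chairs + horses])
y = np.array([0] * 20 + [1] * 20)  # 0 = chair, 1 = horse

# 3. Train a classifier on the histograms.
clf = LinearSVC().fit(X, y)
```

Notice how many choices are baked in by hand: the descriptor type, the vocabulary size, the classifier. That is precisely the fine-tuning burden described above.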

Well, deep learning introduced the concept of end-to-end learning where (in a nutshell) the machine is told to learn what to look for with respect to each specific class of object. It works out the most descriptive and salient features for each object. In other words, neural networks are told to discover the underlying patterns in classes of images.
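As a toy contrast with hand-crafted feature extraction, here is a sketch of the end-to-end idea: raw pixels go in and the network works out its own features. It uses scikit-learn’s small 8x8 digits dataset and a plain fully-connected network rather than a full CNN, so treat it as an illustration of the principle, not a state-of-the-art setup:

```python
# End-to-end learning in miniature: no corner/edge/texture extraction step,
# raw pixel intensities go straight into the network.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

digits = load_digits()  # each image is 8x8 pixels, flattened to 64 numbers
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, random_state=0)

# The hidden layer learns its own internal features from the raw pixels.
net = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
net.fit(X_train, y_train)
print(net.score(X_test, y_test))  # typically well above 0.9 on this toy set
```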

The image below portrays this difference between feature extraction and end-to-end learning:


Deep learning has proven to be extremely successful for computer vision. If you look below at the graph I used in my previous post showing capital investments into US-based computer vision companies since 2011, you can see that when deep learning became mainstream in around 2014/2015, investments suddenly doubled and have been growing at a regular rate since.

(image source)

You can safely say that deep learning put computer vision on the map in the industry. Without it, chances are computer vision would still be stuck in academia (not that there’s anything wrong with academia, of course).

Large Datasets

To allow a machine to learn the underlying patterns of classes of objects it needs A LOT of data. That is, it needs large datasets. More and more of these have been emerging and have been instrumental in the success of deep learning and therefore computer vision.

Before around 2012, a dataset was considered relatively large if it contained 100+ images or videos. Now, datasets exist with numbers ranging in the millions.

Here are some of the best-known image classification datasets currently being used to train and test the latest state-of-the-art object classification/recognition models. They have all been meticulously annotated by hand.

  • ImageNet – 15 million images, 22,000 object categories. It’s HUGE! (I hope to write more about this dataset in the near future, so stay tuned for that).
  • Open Images – 9 million images, 5,000 object categories.
  • Microsoft Common Objects in Context (COCO) – 330K images, 80 object categories.
  • PASCAL VOC Dataset – a few versions exist, 20 object categories.
  • CALTECH-101 – 9,000 images with 101 object categories.

I also need to mention Kaggle, the popular data science competition platform, and the University of California, Irvine (UCI) Machine Learning Repository. Kaggle hosts 351 image datasets ranging from flowers to wines to forest fires. It is also home to the famous Facial Expression Recognition Challenge (FER), the aim of which is to correctly detect the emotion of people, across seven categories, from nearly 35,000 images of faces.

All these datasets and many more have raised computer vision to its current position in the industry. Certainly, deep learning would not be where it is now without them.

More Applications

Faster machines, larger memories, and other advances in technology have increased the number of useful things machines have been able to do for us in our lives. We now have autonomous cars (well, we’re close to having them), drones, factory robots, cleaning robots – the list goes on. With an increase in such vehicles, devices, tools, appliances, etc. has come an increase in the need for computer vision.

Let’s take a look at some examples of recent new ways computer vision is being used today.

Walmart, for example, a few months ago released shelf-scanning robots into 50 of its stores. The purpose of these robots is to detect out-of-stock and missing items as well as other problems such as incorrect labelling. Here’s a picture of one of these robots at work:

(image source)

A British online supermarket is using computer vision to determine the best ways to grasp goods for its packaging robots. This video shows their robot in action:

Agriculture, too, is capitalising on the growth of computer vision. iUNU, for example, is developing a network of cameras on rails to help greenhouse owners keep track of how their plants are growing.

The famous Roomba autonomous vacuum cleaner got an upgrade a few years ago with a new computer vision system to more smartly manoeuvre around your home.

And our phones? Plenty of computer vision being used in them! Ever noticed your phone camera tracking and focusing on your face when you’re trying to take a picture? That’s computer vision. And how about the face recognition services to unlock your phones? Well, Samsung’s version of it can be classified as computer vision (I write about it in this post).

There’s no need to mention autonomous cars here. We are constantly hearing about them on the news. It’s only a matter of time before we’ll be jumping into one.

Computer vision is definitely here to stay. In fact, it’s only going to get bigger with time.


In this post I looked at the four main reasons behind the recent growth of computer vision: 1) the advancements in hardware such as faster CPUs and availability of GPUs; 2) the emergence of deep learning, which has changed our way of performing tasks such as image classification; 3) the advent of large datasets that have allowed us to more meticulously study the underlying patterns in images; and 4) the increase in computer vision applications.

All these factors (not always mutually exclusive) have contributed to the unprecedented position of computer vision in the industry.

As I mentioned in my previous post, it’s been an absolute pleasure to have witnessed this growth and to have seen these factors in action. I truly look forward to what the future holds for computer vision.


To be informed when new content like this is posted, subscribe to the mailing list:

Please share what you just read:

The Growth of Computer Vision in the Industry

I started out in computer vision in 2004. I was walking along the corridors of the computer science department at the University of Adelaide (in South Australia) looking at notices put up by lecturers advertising potential undergraduate thesis topics. There wasn’t much there for me until one particular topic caught my eye: developing a vision system for a soccer-playing robot.

Well, the nerd in me awoke! I knocked on the lecturer’s door and 5 minutes later I walked out with a thesis topic and one of those cheeky smiles that said “we’re in for a lot of fun here”. Little did I know that my topic choice was to be the beginning of my adventures in computer vision that would lead me to a PhD and working for companies in Europe and Australia in this field.

So, I’ve been around the world of computer vision for nearly 15 years. When I started out, computer vision was predominantly a research-based field that rarely ventured outside those university corridors and lecture theatres that I used to saunter around. You just couldn’t do anything practical with it – mainly because machines were too slow and memory sizes were too small.

(Note: see this earlier post of mine that discusses why image processing is such a computation and memory demanding activity)

But things have changed since those days. Computer vision has grown immensely and most importantly it’s shaping up to be a viable source of income in the industry. It’s truly been a pleasure to witness this transformation. And it seems as though things are only going to get better.

In this post, then, I would like to present to you how much computer vision in the industry has grown over the last few years and whether this growth will continue in the future. I would also like to briefly talk about what this means for us computer vision enthusiasts in terms of jobs and opportunities in the workforce (where I am based now).

(Note: in my next post I write in more detail about the reasons behind this growth)

Computer vision in the last few years

Recently, growth in computer vision in the industry has surged. To understand just how much, all one has to do is analyse the speed at which top tech corporations are moving into the field now.

Apple, for example, made at least two significant takeovers last year, both for undisclosed amounts: one in February 2017 of Realface, an Israeli startup that works on facial recognition technology for the authentication of users (could it be behind FaceID?); and another in September 2017, when it acquired Regaind, a Paris-based startup that focuses on AI-driven photo and facial analysis.

Facebook has also joined the game. Two months ago (November 2017) it bought a German computer vision startup called Fayteq, which develops plugins for various applications that allow you to add or remove objects from existing videos. The acquisition follows Facebook’s purchase of Source3, a company that develops video piracy detection algorithms, which also took place in 2017.

While on the topic of social media, let’s take a look at the recent moves made by Twitter and Snapchat. In 2016, Twitter bought Magic Pony Technology for $150 million. Magic Pony Technology employs machine learning to improve low-quality videos on-the-fly by detecting patterns and textures over time. Snapchat, also in 2016, acquired Seene, which allows you to, among other things, take 3D shots of objects (e.g. 3D selfies) and insert them into videos. Take a look at what Seene can do in this neat little demo video. Apologies for the digression, but it’s just too good not to share here:

Amazon has noted the growth of computer vision in the industry so much so that it recently (Oct 2017) created an AI research hub in Germany that focuses on computer vision. This follows shortly upon its acquisition of a 3D body model startup for around $70 million.

The clear stand-out takeover, however, was made by Intel. Last year in March it bought out Mobileye for a WHOPPING $15.3 billion. Mobileye, an Israeli-based company, explores vision-based technologies for autonomous cars. In fact, Intel and Mobileye unveiled their first autonomous car just three days ago!

Other notable recent acquisitions were made by Baidu (info here) and Ebay (info here).

Such corporate activity is unprecedented for computer vision.

Let’s try to visualise this growth by looking at the following graph (from 2016), showing investments into solely US-based computer vision companies since 2011:


A clear upward trend can be seen beginning with 2011 when investments were barely above zero. In 2004, when I joined the computer vision club, it would have been even less than that. Amazing isn’t it? Like I said, back then the field very rarely ventured outside of academia. Just compare that with the money being pumped into it now.

Here are some more numbers, this time from venture capital funding:

  • In 2015, global venture capital funding in computer vision reached US$186 million.
  • In 2016, that jumped three-fold to $555 million (source).
  • Last year, investments jumped three-fold again to reach a super cool US$1.7 billion.

The stand-out from venture capital funding from 2017 was the raising of $460 million from multiple investors for Megvii, a Chinese start-up that develops facial recognition technology (it is behind Face++, a product I hope to write about soon).

We’re talking about serious, serious money here.

What the Future Holds for Businesses and Us

Further growth can safely be predicted for computer vision in the industry. Autonomous cars, commercialisation of drones, emotion detection, face recognition, security and surveillance – these are all areas that will drive the demand for computer vision solutions for businesses.

Tractica predicts that the market for these solutions will grow to $48.6 billion by 2022. Autonomous cars and robotics will be the major players in these future markets:

(image source)

What does this mean for us computer vision enthusiasts? Will it be any easier to find those elusive jobs?

In my opinion the state of affairs at our level will not change for a while. A high level of technical knowledge and understanding, backed up by a PhD, is still going to be the norm for some time to come. As I mentioned in a previous post, you will need to branch out into other areas of AI to have a decent chance of working on computer vision projects.

Having said that, the situation will slowly start to change once businesses and governments come to realise just some of the things that can be done with the data being acquired by their cameras (more on this in a future post). It’s only a matter of time before this happens, in my opinion, so it’s worth sticking with CV and getting ahead of the crowd now. Investing your time and effort into CV will certainly pay dividends in the future.


In this post I presented how much computer vision has grown over the last few years. I looked at some of the recent acquisitions of CV startups made by big companies such as Apple, Intel, and Facebook. I then reviewed the current investments being made into CV and showed that this area is experiencing unprecedented growth. Before 2010, computer vision rarely ventured outside of academia. Now, it is starting to be a viable source of income for businesses around the world. Having said that, the situation for us computer vision enthusiasts will not change for a while: CV jobs will still be elusive. Businesses and governments first need to realise that the data being acquired by their cameras can be mined for information before the effects of this unprecedented growth start to reach us. Thankfully, in my opinion, it’s only a matter of time until this happens, so it’s worth sticking with CV and getting ahead of the crowd.



Deep Learning for Computer Vision with Python Review

In this post I will be reviewing a book called Deep Learning for Computer Vision with Python (DL4CV) that was recently published by Dr Adrian Rosebrock, author of “Practical Python and OpenCV” and most notably the computer vision blog PyImageSearch.

I have already spoken highly of Dr Rosebrock on this blog, in my post on starting a career in computer vision, where I mentioned that PyImageSearch is one of my favourite blogs on the internet. So, it is with great pleasure that I sit down here to write this review.

(Fun fact: I’m writing this review in the wild wilderness of Tasmania. What a beautiful place this is! But, alas, we digress…)

In the first part of this post I will focus on presenting a summary of the book(s) and then in the second part I will give you my thoughts on the work itself. But for those that want the TL;DR version of the review, I’ll give that to you now:

The book is phenomenal. The concepts on deep learning are so well explained that I will be recommending it to anybody not just involved in computer vision but AI in general. If you’re thinking of getting into deep learning for computer vision or wish to fine-tune what you already know, forget about the rest – this is the place to start and finish.

Summary of DL4CV

Due to the huge amount of content that it covers, DL4CV is divided into three volumes: Starter Bundle, Practitioner Bundle, and ImageNet Bundle. Each volume builds on top of the previous one and goes further into the world of deep learning for computer vision. The reason why the volumes are called bundles is that they are each accompanied by additional components such as a downloadable pre-configured Ubuntu virtual machine, source code listings, and access to a companion website. Video tutorials and walkthroughs for each chapter are also advertised to be coming soon.

The Starter Bundle is all about the basics of machine learning, neural networks, convolutional neural networks, and working with datasets. And it truly is a starter bundle, because half the book is spent laying down a solid foundation for beginners to deep learning. No knowledge is presupposed (although I would consider some experience in computer science or even computer vision to be advantageous). Deep learning is also presented at the fundamental level, with topics covered such as convolutional neural networks (CNNs), their famous implementations, and the Keras framework. Throughout the book, interesting real-world problems (e.g. breaking captchas) are solved with source code provided and explained at every step of the way.

The next volume is the Practitioner Bundle, which immerses the reader even further in the world of deep learning. More advanced topics and algorithms are covered, such as data augmentation, optimisation methods, and the HDF5 data format. Famous implementations of CNNs are also revisited, but in a more in-depth manner. This volume was written for those who want to take computer vision and deep learning (whether in academia or the industry) seriously. Once again, practical examples with source code are provided every step of the way.

The final volume is the ImageNet Bundle. The first part of the volume is focused on the ImageNet dataset and the training on it of state-of-the-art CNNs. The second part focuses on even more real-world applications of deep learning and computer vision. Transfer learning and other training techniques are discussed in great detail to the point where readers will be able to reproduce the results seen in seminal deep learning papers and publications. This volume was written for those who want to reach a research level of deep learning in computer vision.

My Thoughts

As I said in the TL;DR section above, this book is phenomenal. And I don’t say things like that lightly – my words aren’t just hot air. Prior to returning to the industry this year, my main source of employment was education. I taught and lectured in high schools, primary schools, universities, and privately for eight years. During that time, I acquired a good eye for textbooks that truly give the most to their students in each and every class. A lot of good books like this exist in the fields that I taught in (mathematics, English, philosophy, and computer science). But then sometimes you stumble upon the amazing textbooks – the ones that are just so well-written and structured that they make your job of explaining and helping to assimilate things incredibly easy.

And this is one such book. 

Let me tell you, I know a good educator when I see one – Dr Adrian Rosebrock is one such person. This guy has talent. If I were working at a school or university, I’d hire him without even conducting an interview (well, maybe a quick one over the phone just to make sure he’s not a talented nutcase :P)

I really believe that his talent has produced something unique in the field of deep learning, especially because of the following two characteristics:

  • He understands that when it comes to learning you need to get your hands dirty and do something practical with any newly-acquired theory. That’s the best way to assimilate knowledge. In this respect, all his chapters follow this principle and provide hands-on examples with code to help cement the concepts raised and discussed.
  • His explanations are so ridiculously lucid that he is able to make state-of-the-art academic publications reachable to non-academic people. This is rare. Believe me.

I now work in artificial intelligence in the industry and I am being pushed into a training position in my company. When it comes to teaching deep learning, this is the book I will be telling my fellow employees to work through and read with me. And there’s a strong chance that his book will become the go-to textbook at universities because of how good it is.

But for that to happen, there is one thing that will need to be touched up. And this thing is my sole criticism of the book.

This criticism is that, in my opinion, there are too many typos (spelling mistakes, missing words, etc.) and grammatical mistakes scattered throughout the book. I understand that such things happen to every writer, but I think that there is an overabundance of them here. I’d say that on average there is one such mistake every few pages. At that rate it can get a little bit frustrating and distracting when you’re trying to focus on the content. When it comes to grammatical mistakes, I’m talking about things like mixing up words such as “affect” and “effect” and “awhile” and “a while”. These creases will need to be ironed out if the book is to be put on shelves in a prominent place in universities and colleges.

However, Dr Rosebrock has provided an easy means to submit mistakes like this to him via the companion website. So, let’s hope that the open-source community will help him out in this respect.

Having mentioned this criticism, I must again underline one thing: this is a unique book and, no matter the number of typos and grammatical errors (especially as they will undoubtedly be fixed over time), I hope DL4CV will one day become a classic of deep learning and computer vision. In fact, I’m sure it will.


In this post I reviewed the book “Deep Learning for Computer Vision with Python” written by Dr Adrian Rosebrock of the PyImageSearch blog. I gave a brief summary of the three volumes and then presented my thoughts on the work as a whole. I mentioned that I think Dr Rosebrock is a talented educator who has written a very good book that explains very difficult concepts exceptionally well. His focus on both theory and implementation is unique and shows that he (perhaps intuitively) understands best-practices in pedagogy. I will be recommending DL4CV to anybody not just involved in computer vision but AI in general. And I hope DL4CV will become a classic textbook at universities.

To purchase “Deep Learning for Computer Vision with Python” or to get more information on it, see the book’s official page.



Why is image processing so hard?

This is another post that has been inspired by a question posed in a forum: “What are the open research areas in image processing?”.

My answer? Everything is still an open research area in image processing/computer vision!

But why is this the case? You’d think that after decades of research we’d feel comfortable saying “this problem here is solved, let’s focus on something else”. In a way we can say this, but only for narrow and simple use cases (e.g. locating a red spoon on an empty white plate), not for computer vision in general (e.g. locating a red spoon in all possible scenarios, like a big box full of colourful toys). I’m going to spend the rest of this post explaining the main reasons behind this.

So, why is computer vision so hard?

Before we dig into what I consider to be the dominant reasons why computer vision is so damn hard, I first need to explain how machines “see” images. When we humans view an image, we perceive objects, people, or a landscape. When machines “view” images, all they see are numbers that represent individual pixels.

An example will explain this best. Let’s say that you have a greyscale image. Each pixel, then, is represented by a number usually between 0 and 255 (I’m abstracting here over things like compression, colour spaces, etc.), where 0 stands for black (no intensity) and 255 for white (full intensity). Anything between 0 and 255 is a shade of grey, like in the picture below.

Machines “see” pixels as numbers (pixel boundaries added for clarity).

So, for a machine to garner anything about an image, it has to process these numbers in one way or another. This is exactly what image/video processing and computer vision is all about – dealing with numbers!
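A tiny sketch makes this concrete (using NumPy; the pixel values below are made up):

```python
# What a machine "sees": a tiny 3x3 greyscale image as an array of numbers.
import numpy as np

img = np.array([[  0, 128, 255],
                [ 34,  90, 200],
                [255, 255,   0]], dtype=np.uint8)

print(img.shape)   # (3, 3) -- three rows and three columns of pixels

# All "image processing" is arithmetic on these numbers. For example,
# inverting the image just means subtracting every pixel from 255:
inverted = 255 - img
print(inverted[0, 0])  # 255 -- black has become white
```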

Now that we have the necessary background information about computer vision we can move on to the meat of the post: the main reasons behind why computer vision is an immensely hard problem to solve. I’m going to list four such reasons:

  1. Swathes of data
  2. Inherent loss of information
  3. Dealing with noise
  4. Requirements for interpretation

We’ll look at these one at a time.

1. We’re dealing with a heck of a lot of data

As I said above, when it comes to images, all computers see are numbers… lots of numbers! And lots of numbers means a lot of data that needs to be processed to be made sense of.

How much data are we talking about here? Let’s take a look at another example (that once again abstracts over many things such as compression and colour spaces). If you have a greyscale image with 1920 x 1080 resolution, your image is described by roughly 2 million numbers (1920 x 1080 = 2,073,600 pixels). If you switch to a colour image, you need three times as many numbers because, typically, a coloured pixel specifies how much red, green, and blue it is composed of. And if you’re analysing images coming in from a video/camera stream at, say, 30 frames/sec (a standard frame rate nowadays), you’re suddenly dealing with about 186 million numbers per second (3 x 2,073,600 x 30 ≈ 186 million values/sec). That is a lot of data that needs processing! Even with today’s powerful processors and relatively large memory sizes, machines struggle to do anything meaningful with 186 million numbers arriving every second.
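The arithmetic above can be sketched in a few lines (assuming one value per greyscale pixel and three values per colour pixel):

```python
# Back-of-the-envelope arithmetic for the data rate of a colour video stream.
width, height = 1920, 1080
pixels_per_frame = width * height                # 2,073,600 pixels per frame
colour_values_per_frame = pixels_per_frame * 3   # red, green, blue per pixel
fps = 30                                         # a standard frame rate

values_per_second = colour_values_per_frame * fps
print(pixels_per_frame)    # 2073600
print(values_per_second)   # 186624000 -- roughly 186 million numbers/sec
```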

2. Loss of Information

Loss of information in the digitising process (going from real life to an image on a machine) is another major player contributing to the difficulty involved in computer vision. The nature of image processing is such that you’re taking information from a 3D world (or 4D if we’re dealing with time in a video stream) and projecting it onto a 2D plane (i.e. a flat image). This means that you’re also losing a lot of information in this process – even though we still have a lot of data to deal with as is, as discussed above.

Now, our brains are fantastic at inferring what that lost data is. Machines are not. Take a look at the image below showing a messy room (not mine, promise!).


We can easily tell that the large green gym ball is bigger and further away than the black pan on the table. But how is a machine supposed to infer this if the black pan takes up more pixels than the green ball!? Not an easy task.

Of course, you can attempt to simulate the way we see with two eyes by taking two pictures simultaneously and extracting 3D information from them. This is called stereoscopic vision. However, matching corresponding points between the two images is also not a trivial task and is, hence, likewise an open area of research. Further, it too suffers from the other three major problems I discuss in this post.
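The idea behind stereoscopic vision can be sketched with one formula: under a simple pinhole stereo model, depth = focal length x baseline / disparity, where the baseline is the distance between the two cameras and the disparity is how far (in pixels) a point appears to shift between the two views. The camera parameters below are entirely hypothetical:

```python
# Toy illustration of recovering depth from stereo disparity
# under a pinhole camera model (all parameter values are made up).
focal_length_px = 800.0   # focal length expressed in pixels
baseline_m = 0.25         # distance between the two cameras, in metres

def depth_from_disparity(disparity_px):
    """Depth (metres) of a point that shifts disparity_px between the views."""
    return focal_length_px * baseline_m / disparity_px

# A nearby object shifts a lot between the two images...
print(depth_from_disparity(200.0))  # 1.0 (metre)
# ...while a distant one barely shifts at all.
print(depth_from_disparity(20.0))   # 10.0 (metres)
```

The formula is the easy part; the open research problem mentioned above is reliably finding which pixel in one image corresponds to which pixel in the other.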

3. Noise

The digitising process is frequently accompanied by noise. For example, no camera is going to give you a perfect picture of reality, especially when it comes to the cameras located on our phones (even though phone cameras are getting phenomenally good with each new release). Intensity levels, colour saturation, etc. – these will always be just an attempt at capturing our beautiful world.

Other examples of noise are phenomena known as artefacts: distortions of images that can be caused by a number of things. Lens flare is one example, shown in the image below. How is a computer supposed to interpret this and work out what is situated behind it? Algorithms have been developed to attempt to remove lens flare from images but, once again, it’s an open area of research.


The biggest source of artefacts undoubtedly comes from compression. Now, compression is necessary as I discussed in this post. Images would otherwise be too large to store, process, and transfer over networks. But if compression levels are too high, image quality decreases. And then you have compression artefacts appearing, as depicted in the image below.

The right image has clear compression artefacts visible

Humans can deal with artefacts, even if they dominate a scene, as seen above. But this is not the case for computers. Artefacts don’t exist in reality and are frequently arbitrary. They truly add another level of difficulty that machines have to cope with.

4. Interpretation is needed

Last and most important is interpretation. This is definitely the hardest thing for a machine to deal with in the context of computer vision (and not only!). When we view an image, we analyse it with years and years of accumulated learning and memory (called a priori knowledge). We know, for example, that we can sit on gym balls and that pans are generally used in the kitchen – we have learnt about these things in the past. So, if there’s something that looks like a pan in the sky, chances are it isn’t one, and we can scrutinise further to work out what the object may be (e.g. a frisbee!). Or if there are people kicking around a green ball, chances are it’s not a gym ball but a small children’s ball.

But machines don’t have this kind of knowledge. They don’t understand our world, the intricacies inherent in it, and the numerous tools, commodities, devices, etc. that we have created over the thousands of years of our existence. Maybe one day machines will be able to ingest Wikipedia and extract contextual information about objects from there but at the moment we are very far from such a scenario. And some will argue that we will never reach a phase where machines will be able to completely understand our reality – because consciousness is something that will always be out of reach for them. But more on that in a future post.


I hope I have shown you, at least in a nutshell, why computer vision is such a difficult problem. It is an open area of research and will be for a very, very long time. Ever heard of the Turing test? It’s a test for intelligence devised by the famous computer scientist Alan Turing in the 1950s. He basically said that if you’re not able to distinguish between a machine and a human within a specified amount of time by holding a natural conversation with both parties, then the machine can be dubbed intelligent.

Well, there is an annual competition called the Loebner Prize that awards prize money to the computer programs deemed most intelligent. The format of the competition is exactly the scenario proposed by Alan Turing: in each round, human judges simultaneously hold textual conversations with a computer program and a human being via a computer. Points are then awarded based on how well the machine manages to fool the judges. The top prize awarded each year is about US$3,000. If a machine manages to entirely fool a judge, the prize is $25,000. Nobody has won that award yet.

However, there is a prize worth $100,000 that nobody has picked up either. It will be awarded to the first program that judges cannot distinguish from a real human in a Turing test that includes deciphering and understanding text, visual, and auditory input. Once this is achieved, the organisers say that the annual competition will end. See how far away we are from strong intelligence? Nobody has won the $25,000 prize yet, let alone the big one.

I also mentioned above that some simple use cases can be considered solved. But even when a use case appears to be solved, chances are that the speed of the algorithms leaves much to be desired. Neural networks are now supposedly performing better than humans on image classification tasks (I hope to write about this in a future post, too). But the state-of-the-art algorithms can barely squeeze out ~1 frame/sec on a standard machine. There is no chance of getting that to work in real time (remember how I said above that standard frame rates are now at about 30 frames/sec?). These algorithms need to be optimised. So, although the results obtained are excellent, speed is a major issue.
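To make the speed gap concrete, here is a quick back-of-the-envelope calculation using the two numbers from the paragraph above (30 frames/sec video and a classifier that takes roughly 1 second per frame; both are the rough figures quoted here, not benchmarks):

```python
# How big is the gap between the real-time budget and current inference speed?
fps = 30                               # standard video frame rate (see above)
frame_budget_ms = 1000 / fps           # time available to process one frame
inference_ms = 1000                    # ~1 frame/sec, as quoted above

print(f"Per-frame budget: {frame_budget_ms:.1f} ms")      # → 33.3 ms
print(f"Inference time:   {inference_ms} ms")
print(f"Speed-up needed:  {inference_ms / frame_budget_ms:.0f}x")  # → 30x
```

A 30x speed-up is what stands between these results and real-time video, which is why optimisation is such a big deal.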


In this post I discussed why computer vision is so hard and why it is still very much an open area of research. I covered four major reasons for this:

  1. Images are represented by a heck of a lot of data that machines need to process before extracting information from them;
  2. When dealing with images we are dealing with a 2D projection of a 3D reality, meaning that A LOT of information has been lost;
  3. Devices that capture the world for us frequently also introduce noise, such as compression artefacts and lens flare;
  4. And the most important hurdle for machines is interpretation: the inability to fully comprehend the world around us and its intricacies that we learn to deal with from the very beginnings of our lives.
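Reason 1 is easy to quantify. As a rough illustration (the frame size here is a hypothetical example, a common Full HD colour image at 3 bytes per pixel), here is how much raw data a machine faces before any interpretation even begins:

```python
# Raw data in a single uncompressed colour image and one second of video.
width, height, channels = 1920, 1080, 3      # hypothetical Full HD frame
bytes_per_frame = width * height * channels

print(f"One frame: {bytes_per_frame:,} bytes (~{bytes_per_frame / 1e6:.1f} MB)")
# → One frame: 6,220,800 bytes (~6.2 MB)

fps = 30                                     # standard frame rate (see above)
print(f"One second of video: ~{bytes_per_frame * fps / 1e6:.0f} MB")
# → One second of video: ~187 MB
```

Millions of numbers per frame, and the machine has to churn through all of them before it can extract any meaning.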

I then mentioned the Loebner Prize, an AI competition inspired by the Turing test. Nobody has yet won its $25,000 prize, let alone the big one that involves analysing images. I also discussed the need to optimise the current state-of-the-art algorithms in computer vision. A lot of them do a good job, but the amount of processing that takes place behind the scenes makes them unusable in real-time scenarios.

Computer vision is definitely still an open area of research.

To be informed when new content like this is posted, subscribe to the mailing list:

Please share what you just read:

How can I start a career in Computer Vision?

This post has been inspired by a question someone asked in a forum. This person was a new but competent programmer who was trying to move into Computer Vision (CV) in the industry. However, he rightly noticed that “most of the job requirements [in computer vision] are asking for a PhD”.

Indeed, this is true. And because of this, finding an industry job in computer vision is very difficult, mainly because computer vision (e.g. video analytics) hasn’t yet caught on as a meaningful source of data or income for companies. I think it will sooner or later (more on this in a future post) but the reality is that such jobs are rare, which is why companies can afford to be picky and advertise for people with a PhD.

So, if you’re in a situation where you can’t land a job in CV and especially if you’re without a PhD, don’t give up. Computer vision is a fascinating field to work in and it will only get bigger with time. It’s worth fighting on.

Here are three things you can do to improve your chances of landing that job that you really want.

Read Books, Tutorials, Publications, and Blogs

This is obvious but needs to be said. Keep reading up on the field, keep fine-tuning your skills and knowledge in CV. You need to show your potential employer that you know the field of CV exceptionally well. Read important books that are or have been published. Some you can find in your library, some come in PDF format. For example, Neural Networks and Deep Learning is a great book on the hot topic of neural networks that is available online free of charge.

Work through the tutorials available on the OpenCV page. There’s plenty there on machine learning, photo processing, object detection, etc. to keep you busy for months! The idea is to get so good at CV that you can instantly see a solution to an image/video processing problem. You need to shine at those job interviews.

Follow blogs on Computer Vision. Two of my favourites are PyImageSearch and Learn OpenCV. These guys regularly post stuff that will fascinate anybody with a passion for Computer Vision. In fact, PyImageSearch is so well written that it put me off starting this blog for a while.

Consider also looking into academic publications. These can be daunting, especially if you don’t have a background in research. But focus initially on the seminal papers (more on this in a future post) and try to get the gist of what the scientists are saying. You can usually pick up small bits and pieces here and there and implement simplified versions of them.

Side Projects

One thing that did get me a lot of attention was my side projects. Side projects show people where your passions lie. And passion is something that a lot of companies are looking for. Believe me, if you came to my company and I was asked to interview you for the Computer Vision team, your side projects would be one of the first things I’d look at.

So, get stuck into a few of these to show that you love the area and do this kind of stuff for fun. Get a Raspberry Pi going with a camera and build your own motion-detection security system, for example. Or mount your Raspberry Pi and camera on a drone and get creative with it. Then list these side projects at the end of your CV. If you’re truly passionate about Computer Vision, you will get noticed sooner or later.
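The core idea behind that security-system project is simple frame differencing: compare consecutive frames and flag motion when enough pixels change. Here is a minimal sketch of just that logic. The function name and the thresholds are my own illustrative choices, and two synthetic greyscale frames stand in for camera input; a real project would grab frames with OpenCV or the Pi camera module.

```python
import numpy as np

def motion_detected(prev, curr, pixel_thresh=25, ratio_thresh=0.01):
    """Flag motion if more than ratio_thresh of pixels changed noticeably."""
    # Cast to a signed type so the subtraction doesn't wrap around.
    diff = np.abs(curr.astype(np.int16) - prev.astype(np.int16))
    changed = np.count_nonzero(diff > pixel_thresh)
    return changed / diff.size > ratio_thresh

frame1 = np.zeros((120, 160), dtype=np.uint8)   # empty scene
frame2 = frame1.copy()
frame2[40:80, 60:100] = 200                     # a bright object appears

print(motion_detected(frame1, frame1))  # False: nothing changed
print(motion_detected(frame1, frame2))  # True: a region lit up
```

In a real build you would run this in a loop over camera frames and trigger an alert (or save a snapshot) whenever it returns True.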

Branch out into other areas of Artificial Intelligence

What I decided to do when my job hunting wasn’t going too well was to aim for jobs in other areas of AI rather than just Computer Vision. It involved me having to pick up additional knowledge in fields I wasn’t too familiar with (e.g. Robotic Process Automation) but the number of jobs in these areas is much larger. This tactic proved successful for me. I ended up picking a company (yes, I was spoilt for choice in the end!) that had interesting clients, and it was only a matter of time before the computer vision projects we were all pushing for came along.

An Inspirational Story

If you are feeling down about your job searching or if you’re wondering whether CV is a viable place to aim for in the job market, here is a truly inspirational story out of India (from PyImageSearch – I told you it was a well-written blog!). It’s about a fellow who really wanted to work in CV but was at a disadvantage because he came from a low-income family. But he didn’t give up and put the hard work in. Today he is working on AI solutions for drones for a company in India. Moreover, with his salary he can support his family, has paid off all his debts, and is working in a field he absolutely loves!


Finding a job in Computer Vision is difficult. Most companies are advertising for people with a PhD. But there are things you can do to boost your chances of landing that job you really want. For example, you can keep your CV skills sharp by continually reading up on the subject. You can also work on side projects in CV and you can try branching out into other areas of AI to broaden the scope of projects you are qualified for. Don’t give up on your quest because CV is a field that is going to grow – it’s only a matter of time before the industry catches on to the amazing things CV can do with their video data.


Finding a Good Thesis Topic in Computer Vision

“What are some good thesis topics in Computer Vision?”

This is a common question that people ask in forums – and it’s an important question to ask for two reasons:

  1. There’s nothing worse than starting over in research because the path you decided to take turned out to be a dead end.
  2. There’s also nothing worse than being stuck with a generally good topic but one that doesn’t interest you at all. A “good” thesis topic has to be one that interests you and will keep you involved and stimulated for as long as possible.

For these reasons, it’s best to do as much research as you can to avoid the above pitfalls or your days of research will slowly become torturous for you – and that would be a shame because computer vision can truly be a lot of fun 🙂

So, down to business.

The purpose of this post is to propose ways to find that one perfect topic that will keep you engaged for months (or years) to come – and something you’ll be proud to talk about amongst friends and family.

I’ll start the discussion off by saying that your search strategy for topics depends entirely on whether you’re preparing for a Master’s thesis or a PhD. The former can be more general; the latter is (nearly always) very fine-grained and specific. Let’s start with undergraduate topics first.

Undergraduate Studies

I’ll propose here three steps you can take to assist in your search: looking at the applications of computer vision, examining the OpenCV library, and talking to potential supervisors.

Applications of Computer Vision

Computer Vision has so many uses in the world. Why not look through a comprehensive list of them and see if anything on that list draws you in? Here’s one such list I collected from the British Machine Vision Association:

  • agriculture
  • augmented reality
  • autonomous vehicles (big one nowadays!)
  • biometrics
  • character recognition
  • forensics
  • industrial quality inspection
  • face recognition
  • gesture analysis
  • geoscience
  • image restoration
  • medical image analysis
  • pollution monitoring
  • process control
  • remote sensing
  • robotics (e.g. navigation)
  • security and surveillance
  • transport

Go through this list and work out if something stands out for you. Perhaps your family is involved in agriculture? Look up how computer vision is helping in this field! The Economist wrote a fascinating article entitled The Future of Agriculture, in which they discuss, among other things, the use of drones to monitor crops, create contour maps of fields, etc. Perhaps Computer Vision can assist with some of these tasks? Look into this!


The OpenCV Library

OpenCV is the best library out there for image and video processing (I’ll be writing a lot more about it on this blog). Other libraries do exist that do certain specific things a little better, e.g. Tracking.js, which performs things like tracking inside the browser, but generally speaking, there’s nothing better than OpenCV.

On the topic of searching for thesis topics, I recall once reading a suggestion to go through the functions that OpenCV has to offer and see if anything sticks out at you. A brilliant idea. Work down the list in the OpenCV documentation. Perhaps face recognition interests you? There are so many interesting projects where this can be utilised!

Talk to potential supervisors

You can’t go past this suggestion. Every academic has ideas constantly buzzing around their head. Academics are immersed in their field of research and are always talking to people in the industry to look for interesting projects that they could get funding for. Go and talk to the academics at your university who are involved in Computer Vision. I’m sure they’ll have at least one project proposal ready to go for you.

You should also run past them any ideas of yours that may have emerged from the two previous steps. Or at least mention the things that stood out for you (e.g. agriculture). They may be able to come up with something themselves.

PhD Studies

Well, if you’ve made it this far in your studies then chances are you have a fairly good idea of how this all works. I won’t patronise you too much, then. But I will mention three points that I wish someone had told me prior to starting my PhD adventure:

  • You should be building your research topic around a supervisor. They’ve been in the field for a long time and know where the niches and dead ends are. Use their experience! If there’s a supervisor who is constantly publishing in object tracking, then doing research with them in this area makes sense.
  • If your supervisor has a ready-made topic for you, CONSIDER TAKING IT. I can’t stress this enough. Usually the first year of your PhD involves you searching (often blindly) around various fields in Computer Vision and then just going deeper and deeper into one specific area to find a niche. If your supervisor has a topic on hand for you, this means that you are already one year ahead of the crowd. And that means one year of frustration saved, because searching through a vast realm of publications can be daunting – believe me, I’ve been there.
  • Avoid going into trending topics. For example, object recognition using Convolutional Neural Networks is a topic that currently everyone is going crazy about in the world of Computer Vision. This means that in your studies, you will be competing for publications with big players (e.g. Google) who have money, manpower, and computing power at their disposal. You don’t want to enter this war unless you are confident that your supervisor knows what they’re doing and/or your university has the capabilities to play in this big league also.


Spending time looking for a thesis topic is time well spent. It could save you from future pitfalls. With respect to undergraduate thesis topics, looking at Computer Vision applications is one place to start. The OpenCV library is another. And talking to potential supervisors at your university is also a good idea.

With respect to PhD thesis topics, it’s important to take into consideration the fields of expertise of your potential supervisors and then to search for topics in these areas. If these supervisors have ready-made topics for you, it is worth considering them to save yourself a lot of time and stress in the first year or so of your studies. Finally, it’s usually good to avoid trending topics because of the people you will be competing against for publications.

But the bottom line is, devote time to finding a topic that truly interests you. It’ll be the difference between wanting to get out of bed to do more and more research in your field and dreading each time you have to walk into your Computer Science building in the morning.

