Recent Controversies in Computer Vision

Computer vision is a fascinating area in which to work and perform research. And, as I’ve mentioned a few times already, it’s been a pleasure to witness its phenomenal growth, especially in the last few years. However, as with pretty much anything in the world, contention also plays a part in its existence.

In this post I would like to present two recent events from the world of computer vision that have caused controversy:

  1. A judge’s ruling that Facebook must stand trial for its facial recognition software
  2. The death of a pedestrian struck by one of Uber’s autonomous cars

Facebook and Facial Recognition

This event seems to have passed under the radar (at least, it did for me), probably because the Facebook-Cambridge Analytica scandal has been flooding the news and social discussions recently. But I think it is also an important event to mull over because it touches upon an important topic: facial recognition and privacy.

So, what has happened?

In 2015, Facebook was hit with a class action lawsuit (the original can be found here) by three residents of Chicago, Illinois. They accuse Facebook of violating the state’s biometric privacy laws by collecting and storing biometric data of each user’s face. This data is being stored without written notification. Moreover, it is not clear exactly what the data is to be used for or how long it will reside in storage, and no opt-out option was ever provided.

Facebook began to collect this data, as the lawsuit states, in a “purported attempt to make the process of tagging friends easier”.


In other words, what Facebook is doing (yes, even now) is summarising the geometry of your face with certain parameters (e.g. distance between eyes, shape of chin, etc.). This data is then used to try to locate your face elsewhere to provide tag suggestions. But for this to be possible, the biometric data needs to be stored somewhere for it to be recalled when needed.
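To make the idea concrete, here is a toy Python sketch of how tag suggestions could be made from stored face measurements. To be clear, this is not Facebook’s actual system (the lawsuit doesn’t describe the internals). The names `stored_templates` and `suggest_tag`, the three made-up measurements per face, and the `max_distance` threshold are all assumptions for illustration only.

```python
import math

# Hypothetical stored templates: each face reduced to a few normalised
# geometric measurements (made-up values, for illustration only).
stored_templates = {
    "alice": [0.42, 0.31, 0.77],
    "bob": [0.55, 0.28, 0.64],
}


def suggest_tag(measurements, templates, max_distance=0.05):
    """Return the name whose stored template is nearest, or None if none is close enough."""
    best_name, best_dist = None, float("inf")
    for name, template in templates.items():
        # Euclidean distance between the new measurements and the stored ones.
        dist = math.dist(measurements, template)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= max_distance else None


print(suggest_tag([0.43, 0.30, 0.78], stored_templates))  # close to a stored face
print(suggest_tag([0.90, 0.10, 0.20], stored_templates))  # close to nobody
```

The thing to notice is that the lookup only works if the templates are kept in storage, which is exactly what the lawsuit is about.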

The Illinois residents are not happy that a firm is doing this without their knowledge or consent. Considering the Cambridge Analytica scandal, they kind of have a point, you would think? Who knows where this data could end up. They are suing for $75,000 and have requested a jury trial.

Anyway, Facebook protested over this lawsuit and asked that it be thrown out of court, stating that the law in question does not cover its tagging suggestion feature. A year ago, a District Judge rejected Facebook’s appeal.

Facebook appealed again, stating that proof of actual injury needs to be shown. Wow! As if violating privacy isn’t injurious enough!?

But on 14 May, the same judge dismissed (official ruling here) Facebook’s appeal:

[It’s up to a jury] to resolve the genuine factual disputes surrounding facial scanning and the recognition technology.

So, it looks like Facebook will be facing the jury on July 9th this year! Huge news, in my opinion, even if any verdict will only pertain to the United States. There is still so much that needs to be done to protect our data but at least things seem to be finally moving in the right direction.

Death of a Pedestrian Struck by Uber’s Autonomous Car

You probably heard on the news that on March 18th this year a woman was hit by an autonomous car owned by Uber in Arizona as she was crossing the road. She died in hospital shortly after the collision. This is believed to be the first ever fatality of a pedestrian in which an autonomous car was involved. There have been other deaths in the past (3 in total) but all of them have been of the driver.

(image taken from the US NTSB report)

3 weeks ago the US National Transportation Safety Board (NTSB) released its first report (short read) into this crash. It was only a preliminary report but it provides enough information to state that the self-driving system was at least partially at fault.

(image taken from the US NTSB report)

The report gives the timeline of events: the pedestrian was detected about 6 seconds before impact but the system had trouble identifying her. She was first classified as an unknown object, then as a vehicle, then as a bicycle, and even then the system couldn’t work out her direction of travel. At 1.3 seconds before impact, the system determined that an emergency braking maneuver was needed, but such maneuvers had been disabled earlier to prevent erratic vehicle behaviour on the roads. Moreover, the system was not designed to alert the driver in such situations. The driver began braking less than 1 second before impact but it was tragically too late.

Bottom line is, if the self-driving system had immediately recognised the object as a pedestrian walking directly into its path, it would have known that avoidance measures were needed well before the point at which emergency braking was called for. This is a deficiency of the artificial intelligence implemented in the car’s system.

No statement has been made with respect to who is legally at fault. I’m no expert but it seems like Uber will be given the all-clear: the pedestrian had hard drugs detected in her blood and was crossing outside a designated crossing area.

Nonetheless, this is a significant event for AI and computer vision (which plays a pivotal role in self-driving cars) because if these had performed better, the crash would have been avoided (as researchers have shown).

Big ethical questions are being taken seriously. For example, who will be held accountable if a fatal crash is deemed to be the fault of the autonomous car? The car manufacturer? The people behind the algorithms? One sole programmer who messed up a for-loop? Stanford scholars have been openly discussing the ethics behind autonomous cars for a long time (it’s an interesting read, if you have the time).

And what will be the future for autonomous cars in the aftermath of this event? Will their inevitable delivery into everyday use be pushed back?

Testing of autonomous cars has been halted by Uber in North America. Toyota has followed suit. And Chris Jones, who leads the Autonomous Vehicle Analysis service at the technology analyst company Canalys, says that these events will set the industry back considerably:

It has put the industry back. It’s one step forward, two steps back when something like this happens… and it seriously undermines trust in the technology.

Furthermore, a former US Secretary of Transportation has deemed the crash a “wake up call to the entire [autonomous vehicle] industry and government to put a high priority on safety.”

But other news reports seem to indicate a different story.

Volvo, the make of car involved in the fatal Uber crash, stated only last week that they expect a third of their cars sold to be autonomous by 2025. Other car manufacturers are making similar announcements. Two weeks ago General Motors and Fiat Chrysler unveiled self-driving deals with companies like Google to push for a lead in the self-driving car market.

And Baidu (China’s Google, so to speak) is heavily invested in the game, too. Even Chris Jones is admitting that for them this is a race:

The Chinese companies involved in this are treating it as a race. And that’s worrying. Because a company like Baidu – the Google of China – has a very aggressive plan and will try to do things as fast as it can.

And when you have a race among large corporations, there isn’t much that is going to even slightly postpone anything. That’s been my experience in the industry anyway.


In this post I looked at two recent events from the world of computer vision that have caused controversy.

The first was a judge’s ruling in the United States that Facebook must stand trial for its facial recognition software. Facebook is being accused of violating Illinois’ biometric privacy laws by collecting and storing biometric data of each user’s face. This data is being stored without written notification. Moreover, it is not clear exactly what the data is being used for or how long it is going to reside in storage, and no opt-out option was ever provided.

The second event was the first recorded death of a pedestrian caused by an autonomous car, in March of this year. A preliminary report released by the US National Transportation Safety Board 3 weeks ago provides enough information to conclude that the AI was at least partially at fault for the crash. Debate over the ethical issues inherent to autonomous cars has heated up as a result but it seems as though the incident has not held up the race to bring self-driving cars onto our streets.



Computer Vision on Mars

I was doing my daily trawl of the internet a few days ago looking at the latest news in artificial intelligence (especially computer vision) and an image caught my eye. The image was of one of the Mars Exploration Rovers (MER) that landed on the Red Planet in 2004. Upon seeing the image I thought to myself: “Heck, those rovers must surely have used computer vision up there!?” So, I spent the day looking into this and, sure as can be, not only was computer vision used by these rovers, it in fact played an integral part in their missions.

In this post, then, I’m going to present to you how, where, and when computer vision was used by those MERs. It’s been a fascinating few days for me researching into this and I’m certain you’ll find this an interesting read also. I won’t go into too much detail here but I’ll give you enough to come to appreciate just how neat and important computer vision can be.

If you would like to read more about this topic, a good place to start is “Computer Vision on Mars” (Matthies, Larry, et al. International Journal of Computer Vision 75.1, 2007: 67-92.), which is an academic paper published by NASA in 2007. You can also follow any additional referenced publications there. All images in this post, unless otherwise stated, were taken from this paper.

Background Information

In 2003, NASA launched two rovers into space with the intention of landing them on Mars to study rocks and soils for traces of past water activity. MER followed three earlier surface missions: the two Viking landers, which were launched in 1975 and landed in 1976, and the Mars Pathfinder mission of 1997.

Due to constraints in processing power and memory capacity, no image processing was performed by the Viking landers. They only took pictures with their on-board cameras to be sent back to Earth.

The Sojourner (the name of the Mars Pathfinder rover), on the other hand, performed computer vision in one way only. It used stereoscopic vision to provide detailed maps of the terrain around the rover for operators on Earth to use in planning movement trajectories. Stereoscopic vision provides visual information from two viewing angles a short distance apart, just like our eyes do. This kind of vision is important because two views of the same scene allow for the extraction of 3D data (i.e. depth data). See this OpenCV tutorial on extracting depth maps from stereo images for more information on this.

The MER Rovers

The MER rovers, Spirit and Opportunity as they were named, were identical. Both had a 20 MHz processor, 128 MB of RAM, and 256 MB of flash memory. Not much to work with there, as you can see! Phones nowadays are about 1000 times more powerful.

The rovers also had a monocular descent camera facing directly down and three sets of stereo camera pairs: one pair each at the front and back of the rovers (called hazard cameras, or “hazcams” for short) and a pair of cameras (called navigation cameras, or “navcams” for short) on a mast 1.3m (4.3 feet) above the ground. All these cameras took 1024 x 1024 greyscale photos.

But wait, those colour photos we’ve seen so many times from these missions were fake, then? Nope! Cleverly, each of the stereoscopic camera lenses also had a wheel of 8 filters that could be rotated. Consecutive images could be taken with a different filter (e.g. infrared, ultra-violet, etc.) and colour extracted from a combination of these. Colour extraction was only done on Earth, however. All computer vision processing on Mars was therefore performed in greyscale. Fascinating, isn’t it?

Components of an MER rover (image source)

The Importance of Computer Vision in Space

If you’ve been around computer vision for a while you’ll know that for things such as autonomous vehicles, vision solutions are not necessarily the most efficient. For example, lidar (Light Detection And Ranging, a technique similar to sonar that constructs 3D representations of scenes by emitting pulsed laser light and then measuring its reflections) can give you 3D obstacle avoidance/detection information much more easily and quickly. So, why did NASA choose to use computer vision (and so much of it, as I’ll be presenting to you below) instead of other solutions? Because laser equipment is fragile and it may not have withstood the harsh conditions of Mars. So, digital cameras were chosen instead.

Computer Vision on Mars

We now have information on the background of the mission and the technical hardware relevant to us so let’s move to the business side of things: computer vision.

The first thing I will talk about is the importance of autonomy in space exploration. Due to communication latency and bandwidth limitations, it is advantageous to minimise human intervention by allowing vehicles or spacecraft to make decisions on their own. The Sojourner had minimal autonomy and only ended up travelling approximately 100 metres (328 feet) during its entire mission (which lasted a good few months). NASA wanted the MER rovers to travel on average that much every day, so they put a lot of time and research into autonomy to help them reach this target.

In this respect, the result was that they used computer vision for autonomy on Mars in 3 ways:

  1. Descent motion estimation
  2. Obstacle detection for navigation
  3. Visual odometry

I will talk about each of these below. As mentioned in the introduction, I won’t go into great detail here but I’ll give you enough to satisfy that inner nerd in you 😛

1. Descent Image Motion Estimation System

Two years before the launch of the rocket that was to take the rovers to Mars, scientists realised that their estimates of near-surface wind velocities of the planet were too low. This could have proven catastrophic because severe horizontal winds could have caused irreparable damage upon an ill-judged landing of the rover. Spirit and Opportunity had horizontal impulse rockets that could be used to reduce horizontal velocity upon descent but no system to detect actual horizontal speed of the rovers.

Since a regular horizontal velocity sensor could not be installed due to cost and time constraints, it was decided to turn to computer vision for assistance! A monocular camera was attached to the base of the rover that would take pictures of the surface of the planet as the rovers were descending onto it. These pictures would be analysed in-flight to provide estimates of horizontal speeds in order to trigger the impulse rockets, if necessary.

The computer vision system for motion estimation worked by tracking a single feature (features are small “interesting” or “stand-out” patches in images). The feature was located in one photo taken during the descent and its position was then tracked across consecutive images.

By combining this feature tracking information with measurements from the angular velocity and vertical velocity sensors (that were already installed for the purpose of on-surface navigation), the entire velocity vector (i.e. the magnitude and direction of the rover’s motion) could be calculated.
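As a rough illustration of how a feature’s shift between two photos turns into a horizontal speed, here is a minimal Python sketch. It is not the DIMES flight code: it assumes a simple pinhole camera pointing straight down at flat ground with no rotation between frames, and the function name `horizontal_velocity` and all the numbers are hypothetical.

```python
import math


def horizontal_velocity(shift_px, altitude_m, focal_length_px, dt_s):
    """Estimate ground-plane velocity (vx, vy) in m/s from one feature's pixel shift.

    Assumes a pinhole camera looking straight down at flat ground and no
    rotation between the two frames (all simplifying assumptions).
    altitude_m is taken as the mean altitude over the two frames.
    """
    # With a downward-looking pinhole camera, one pixel covers this much ground.
    metres_per_pixel = altitude_m / focal_length_px
    # The ground appears to move opposite to the camera, hence the sign flip.
    vx = -shift_px[0] * metres_per_pixel / dt_s
    vy = -shift_px[1] * metres_per_pixel / dt_s
    return vx, vy


# Hypothetical numbers, not taken from the mission.
vx, vy = horizontal_velocity(shift_px=(30.0, -40.0), altitude_m=1500.0,
                             focal_length_px=1000.0, dt_s=3.75)
print(vx, vy, math.hypot(vx, vy))
```

In the real system the altitude and the rotation between frames came from other sensors, which is why those measurements had to be combined with the feature tracking.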

The feature tracking algorithm, called the Descent Image Motion Estimation System (DIMES), consisted of 7 steps as summarised by the following image:


The first step reduced the image to 256 x 256 resolution. The lower the resolution, the faster image processing calculations can be performed, but at the possible expense of accuracy. The second step estimated the maximum possible area of overlap between consecutive images in order to minimise the search area for features (there’s no point in detecting features in regions of the first image that you know will not appear in the second). This was done by using sensor readings of things such as the rover’s altitude and orientation. The third step picked out two features from an image using the Harris corner detector (discussed here in this OpenCV tutorial). Only one feature is needed for the algorithm to work but two were detected in case one could not be located in the following image. In step 4, a few noise “clean-up” operations were performed on the images to reduce the effects of things such as blurring.
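For the curious, steps 1 and 3 can be sketched in a few lines of plain Python. This is only an illustration of the standard techniques (block-average downsampling and the textbook Harris corner score), not what actually ran on the rovers. The function names, the constant `k=0.04` and the 3 x 3 window are conventional choices I am assuming here, and in practice you would use NumPy or OpenCV rather than nested lists.

```python
def downsample(image, factor):
    """Shrink a greyscale image (a list of rows) by averaging factor x factor blocks."""
    h, w = len(image) // factor, len(image[0]) // factor
    return [[sum(image[y * factor + dy][x * factor + dx]
                 for dy in range(factor) for dx in range(factor)) / factor ** 2
             for x in range(w)] for y in range(h)]


def harris_response(image, k=0.04, radius=1):
    """Harris corner score for every pixel (border pixels are left at 0)."""
    h, w = len(image), len(image[0])
    # Image gradients by central differences.
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (image[y][x + 1] - image[y][x - 1]) / 2.0
            iy[y][x] = (image[y + 1][x] - image[y - 1][x]) / 2.0
    response = [[0.0] * w for _ in range(h)]
    for y in range(radius, h - radius):
        for x in range(radius, w - radius):
            # Sum the gradient products over a small window (the structure tensor).
            sxx = sxy = syy = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    gx, gy = ix[y + dy][x + dx], iy[y + dy][x + dx]
                    sxx += gx * gx
                    sxy += gx * gy
                    syy += gy * gy
            # Harris score: det(M) - k * trace(M)^2. Large and positive at corners.
            response[y][x] = sxx * syy - sxy ** 2 - k * (sxx + syy) ** 2
    return response


# Synthetic 12 x 12 test image: a bright block whose top-left corner is at row 4, column 4.
image = [[1.0 if y >= 4 and x >= 4 else 0.0 for x in range(12)] for y in range(12)]
scores = harris_response(image)
corner = max(((y, x) for y in range(12) for x in range(12)),
             key=lambda p: scores[p[0]][p[1]])
print(corner)
```

Going from 1024 x 1024 down to 256 x 256 corresponds to `factor=4` in this sketch. Note that pixels along a straight edge get a negative score, which is what makes the detector pick corners rather than edges.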

Step 5 is interesting. The feature patches (aka feature templates) and search windows in consecutive images were rectified (rotated, twisted, etc.) to remove orientation and scale differences in order to make searching for features easier. In other words, the images were rotated, twisted and enlarged/diminished to be placed on the same plane. An example of this from the actual mission (from the Spirit rover’s descent) is shown in the image below. The red squares in the first image are the detected feature patches that are shown in green in the second image with the search windows shown in blue. You can see how the first and second images have been twisted and rotated such that the feature size, for example, is the same in both images.


Step 6 was responsible for locating in the second image the two features found in the first image. Moravec’s correlator (an algorithm developed by Hans Moravec and published in his PhD thesis way back in 1980) was used for this. The general idea in this algorithm is to narrow down the search area first instead of searching over every possible location in an image for a match. Candidate regions for a match are selected first, and a more exhaustive search is then performed only within those regions.
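The “narrow down first, search exhaustively second” idea can be sketched as a two-stage, coarse-to-fine template search. I should stress that this is not Moravec’s actual correlator: it is a simplified stand-in that uses a sum of squared differences (SSD) as the match score and a single half-resolution level. The function names and the `radius` parameter are my own, and the template is assumed to have even dimensions.

```python
def downsample2(image):
    """Halve the resolution by averaging 2 x 2 blocks."""
    return [[(image[2 * y][2 * x] + image[2 * y][2 * x + 1]
              + image[2 * y + 1][2 * x] + image[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(len(image[0]) // 2)] for y in range(len(image) // 2)]


def ssd(image, template, top, left):
    """Sum of squared differences between the template and the patch at (top, left)."""
    return sum((image[top + i][left + j] - template[i][j]) ** 2
               for i in range(len(template)) for j in range(len(template[0])))


def best_match(image, template, rows, cols):
    """Return the (row, col) among the candidates with the smallest SSD."""
    return min(((r, c) for r in rows for c in cols),
               key=lambda p: ssd(image, template, p[0], p[1]))


def coarse_to_fine_match(image, template, radius=2):
    """Find the template in the image using a coarse pass followed by a local fine pass."""
    small_img, small_tpl = downsample2(image), downsample2(template)
    # Stage 1: exhaustive search, but only on the half-resolution images.
    cr, cc = best_match(small_img, small_tpl,
                        range(len(small_img) - len(small_tpl) + 1),
                        range(len(small_img[0]) - len(small_tpl[0]) + 1))
    # Stage 2: full-resolution search restricted to a small window around the coarse hit.
    max_r = len(image) - len(template)
    max_c = len(image[0]) - len(template[0])
    rows = range(max(0, 2 * cr - radius), min(max_r, 2 * cr + radius) + 1)
    cols = range(max(0, 2 * cc - radius), min(max_c, 2 * cc + radius) + 1)
    return best_match(image, template, rows, cols)


# A 4 x 4 pattern hidden in an otherwise empty 16 x 16 image, at row 7, column 9.
pattern = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
image = [[0] * 16 for _ in range(16)]
for i in range(4):
    for j in range(4):
        image[7 + i][9 + j] = pattern[i][j]
print(coarse_to_fine_match(image, pattern))
```

The saving comes from the fact that the expensive full-resolution comparison is only done at a handful of positions instead of at every position in the image.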

The final step is combining all this information to calculate the velocity vector. In total, the DIMES algorithm took 14 seconds to run up there in the atmosphere of Mars. It was run by both rovers during their descent. The Spirit rover was the only one that fired its impulse rockets as a result of calculations from DIMES. Its horizontal velocity was at one stage reduced from 23.5 m/s (deemed to be slightly over a safe limit) to 11 m/s, which ensured a safe landing. Computer vision to the rescue! Opportunity’s horizontal speed was never calculated to be too fast so firing its stabilising rockets was considered to be unnecessary. It also had a successful landing.

All the above steps were performed autonomously on Mars without any human intervention. 

2. Stereo Vision for Navigation

To give the MER rovers as much autonomy as possible, NASA scientists developed a stereo-vision-based obstacle detection and navigation system. The idea behind it was to give the scientists the ability to simply provide the rovers each day with a destination and for the vehicles to work things out on their own with respect to navigation to this target (e.g. to avoid large rocks).

And their system performed beautifully.

The algorithm worked by extracting disparity (depth) maps from stereo images (as I’ve already mentioned, see this OpenCV tutorial for more information on this technique). What the rovers did was slightly different to that tutorial (for example, a simpler feature matching algorithm was employed), but the gist of it was the same. Feature point detection and matching was performed to find the relationship between the two images, and knowledge of camera properties such as focal lengths and baseline distances then allowed depth to be derived for all pixels in an image. An example of depth maps calculated in this way by the Spirit rover is shown below:

The middle picture was taken by Spirit and shows a rock on Mars approximately 0.5 m (1.6 feet) in height. The left image shows corresponding range information (red is closest, blue furthest). The right image shows corresponding height information.
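The relationship between disparity and depth for a rectified stereo pair is the standard formula depth = focal length x baseline / disparity. Here is a minimal Python sketch of it. The function name and the camera numbers are hypothetical (they are not the real MER camera parameters), and the focal length is assumed to be expressed in pixels.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Depth in metres of a point seen by a rectified stereo pair.

    disparity_px is how far (in pixels) the point shifts between the left
    and right images; baseline_m is the distance between the two cameras.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive (zero means infinitely far away)")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical camera: 1000-pixel focal length, cameras 0.2 m apart.
for d in (40.0, 20.0, 10.0):
    print(d, depth_from_disparity(d, focal_length_px=1000.0, baseline_m=0.2))
```

Each halving of the disparity doubles the estimated depth, which is why far-away objects (with tiny disparities) are measured much less precisely than nearby ones.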

Interestingly, the Opportunity rover, because it landed on a smoothly-surfaced plain, was forced to use its navcams (which were mounted on a mast) for its navigation. Looking down from a higher angle meant that detailed texture from the sand could be used for feature detection and matching. Its hazcams returned only the smooth surface of the sand. Smooth surfaces are not amenable to feature detection (because, for example, they don’t have corners or edges). The Spirit rover, on the other hand, because it landed in a crater full of rocks, could use its hazcams for stereoscopic navigation.

3. Visual Odometry

Finally, computer vision on Mars was used at certain times to estimate the rovers’ position and travelling distance. No GPS is available on Mars (yet) and standard means of estimating distance travelled, such as counting the number of wheel rotations, were deemed during desert testing on Earth to be vulnerable to significant error due to one thing: wheel slippage. So, NASA scientists decided to employ motion estimation via computer vision instead.

Motion estimation was performed using feature tracking in 3D across successive shots taken by the navcams. To obtain 3D information, once again depth maps were extracted from stereoscopic images. Distances to features could easily be calculated from these and then the rovers’ poses were estimated. On average, 80 features were tracked per frame and a photo was taken for visual odometry calculations every 75 cm (30 inches) of travel.
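To give a feel for how a change in pose falls out of tracked features, here is a simplified Python sketch. The rovers tracked features in 3D and their estimator was more elaborate than this. What follows is a planar (top-down) simplification that finds the least-squares rotation and translation mapping the feature positions seen in one frame onto the positions of the same features in the next. The function name and the sample points are made up for illustration.

```python
import math


def estimate_rigid_2d(prev_pts, curr_pts):
    """Least-squares rotation angle (radians) and translation mapping prev_pts onto curr_pts."""
    n = len(prev_pts)
    # Centroids of both point sets.
    pcx = sum(p[0] for p in prev_pts) / n
    pcy = sum(p[1] for p in prev_pts) / n
    qcx = sum(q[0] for q in curr_pts) / n
    qcy = sum(q[1] for q in curr_pts) / n
    dot = cross = 0.0
    for (px, py), (qx, qy) in zip(prev_pts, curr_pts):
        ax, ay = px - pcx, py - pcy
        bx, by = qx - qcx, qy - qcy
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    # The best rotation aligns the centred point sets.
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    # The translation then lines up the centroids.
    tx = qcx - (c * pcx - s * pcy)
    ty = qcy - (s * pcx + c * pcy)
    return theta, (tx, ty)


# Made-up feature positions, moved by a known rotation (0.3 rad) and shift (0.75, -0.2).
prev_pts = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (2.0, 3.0)]
c, s = math.cos(0.3), math.sin(0.3)
curr_pts = [(c * x - s * y + 0.75, s * x + c * y - 0.2) for x, y in prev_pts]
print(estimate_rigid_2d(prev_pts, curr_pts))
```

Because the tracked features are fixed on the ground, the motion of the rover is simply the inverse of the transform recovered here. With real, noisy matches you would also want to reject outliers before fitting.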

Using computer vision to assist in motion estimation proved to be a wise decision because wheel slippage was quite severe on Mars. In fact, at one time the rover got stuck in sand and the wheels rotated in place for the equivalent of 50m (164 feet) of driving distance. Without computer vision the rovers’ estimated positions would have been severely inaccurate. 

There was another instance where this was strikingly the case. At one time the Opportunity rover was operating on a 17-20 degree slope in a crater and was attempting to maneuver around a large rock. It had been trying to escape the rock for several days and had slid down the crater many times in the process. The image below shows the rover’s estimated trajectory (from a top-down view) using just wheel odometry (left), and the rover’s corrected trajectory (right) as assisted by computer vision calculations. The large rock is represented by the black ellipse. The corrected trajectory proved to be the more accurate estimation.



In this post I presented the three ways computer vision was used by the Spirit and Opportunity rovers during their MER missions on Mars. These three ways were:

  1. Estimating horizontal speeds during their descent onto the Red Planet to ensure the rovers had a smooth landing.
  2. Extracting 3D information of their surroundings using stereoscopic imagery to assist in navigation and obstacle detection.
  3. Using stereoscopic imagery once again but this time to provide motion and pose estimation on difficult terrain.

In this way, computer vision gave the rovers a significant amount of autonomy (much, much more autonomy than their predecessor, the Sojourner rover) that ultimately gave the rovers a safe landing and allowed the robots to traverse up to 370 m (1214 feet) per day. In fact, the Opportunity rover is still active on Mars now. This means that the computer vision techniques described in this post are churning away as we speak. If that isn’t neat, I don’t know what is!

