Zbigatron | Zbigatron

Capturing the Moment in Photography – SIGGRAPH 2019 Award

Posted on July 31, 2019January 16, 2025 by Zbigatron

SIGGRAPH 2019 is coming to end today. SIGGRAPH, which stands for “Special Interest Group on Computer GRAPHics and Interactive Techniques”, is a world-renowned annual conference held predominantly for computer graphics researchers – but you do sometimes get papers from the world of computer vision being published there. In fact, I’ve presented a few such papers on this blog in the past (e.g. see here).

I’m not going to present any papers from this conference today. What I would like to do is mention a person who is being recognised at this year’s conference with a special award. Michael F. Cohen, the current Director of Facebook’s Computational Photography Research team, a few days ago received the 2019 Steven A. Coons Award for Outstanding Creative Contributions to Computer Graphics. This is an award given every two years to honour outstanding lifetime contributions to computer graphics and interactive techniques.

For the full, very impressive list of Michael’s achievements, see the SIGGRAPH award’s page. But there are a few that stand out. In particular his significant contributions to Facebook’s 3D photos feature and most interestingly (for me) his work on The Moment Camera.

You may recall that in March of this year I wrote about Smartphone Camera Technology from Google and Nokia. At the time, I didn’t realise that the foundations for the technologies I discussed there were laid down by Michael nearly 15 years ago.

In that post I talked about High Dynamic Range (HDR) Imaging, which is a technique employed by some cameras to give you better quality photos. The basic idea behind HDR is to capture additional shots of the same scene (at different exposure levels, for instance) and then take what’s best out of each photo to create a single picture. For example, the image on the right below was created by a Google phone using a single camera and HDR technology. A quick succession of 10 photos (called an image burst) was taken of a dimly lit indoor scene. The final merged picture gives a vivid representation of the scene. Quite astonishing, really.

(image taken from here)

Well, Michael F. Cohen, laid out the basic ideas behind HDR for combining images/photos to create better pictures at the beginning of this century. For example, he along with Richard Szeliski published this fantastic paper in 2006. In it he talks about the idea of capturing a moment rather than an image. Capturing a moment is a much better description of what HDR is all about!

The abstract to the paper says it best:

Future cameras will let us “capture the moment,” not just the instant when the shutter opens. The moment camera will gather significantly more data than is needed for a single image. This data, coupled with automated and user-assisted algorithms, will provide powerful new paradigms for image making.

Ah, the moment camera. What a good name for HDR-capable phones!

It’s interesting to note that it has taken a long time for the moment camera to become available to the general public. I would guess that we just had to wait for faster CPUs on our phones for Michael’s work to become a reality. However, some features of the “moment camera” described in the 2006 paper are yet to be implemented in our HDR-enabled phones. For example, this idea of a group shot being improved by image segmentation:

capturing-moment-group-shot — The original caption to the image reads: “Working with stored images, the user indicates when each person photographed looks best. The system automatically finds the best regions around each selection to compose into a final group shot.” (image taken from original publication)

Anyway, a well-deserved lifetime achievement award, Michael. And thank you for the “moment camera”.

To be informed when new content like this is posted, subscribe to the mailing list (or subscribe to my YouTube channel!):

Smart Glasses for the Blind – Why has it Taken This Long?

Posted on May 18, 2019January 16, 2025 by Zbigatron

Remember Google Glass? Those smart glasses that were released by Google to the public in May of 2014 (see image below). Less than a year later production was halted because, well, not many people wanted to walk around with a goofy looking pair of specs on their noses. They really did look wacky. I’m not surprised the gadget never caught on.

Well, despite the (predictable?) flop of Google Glass, it turns out that there has proven to be a fantastic use case for such smart glasses: for people with visual impairments.

There is a company out there called Aira that provides an AI-guided service used in conjunction with smart glasses and an app on a smartphone. When images are captured by the glasses’ forward-facing camera, image and text recognition are used and an AI assistant, dubbed “Chloe”, describes in speech what is present in these videos: whether it be everyday objects such as products on a shelf in your pantry, words on medication bottles or even words in a book.

Quite amazing, isn’t it?

Simple tasks like object and text recognition are performed locally on the smartphone. However, more complex tasks can be sent to Aira’s cloud services (powered by Amazon’s AWS).

Furthermore, the user has the option to, at the tap of a button on the glasses or app, connect to a live agent who is then able to access a live video stream from the smart glasses and other data from the smartphone like GPS location. With these the live agent is able to provide real-time assistance by speaking directly to the visually impaired person. A fantastic idea.

According to NVIDIA, Aira trains its object recognition deep learning neural networks not on image datasets like ImageNet but from 3 million minutes worth of data captured by their users, which has been annotated by Aira’s agents. An interesting idea considering how time consuming such a task must have been. But this has given the service an edge as training from real-world scenarios has provided, as reported, better results.

The uses for Aira’s product and service are pretty much endless! As suggested on their site, you can use Aira for things like: reading to a child, locating a stadium seat, reading a whiteboard, navigating premises, sorting and reading mail and the paper, enjoying the park or the zoo, roaming historical sites. Gosh, the list can be endless!

And thankfully, the glasses don’t look goofy at all! That’s definitely a win right there.

Finally, I would encourage you to take a look at this official video demonstrating the uses of Aira. This is computer vision serving society in the right way.

(Unfortunately, the video that was once here has been taken down by the pubisher)

To be informed when new content like this is posted, subscribe to the mailing list (or subscribe to my YouTube channel!):

Delivery Drones and the Google Wing Project

Posted on April 15, 2019January 16, 2025 by Zbigatron

I gave a guest lecture last Thursday at Carnegie Mellon University at their Adelaide campus in South Australia. (A special shout-out to the fantastic students that I met there!). The talk was on the recent growth of computer vision (CV) in the industry. At the end of the presentation I showed the students some really interesting projects that are being worked on today in the CV domain such as Amazon Go, Soccer/Football on Your Tabletop, autonomous cars (which I am yet to write about), CV in the fashion industry, and the like.

I missed one project, however, that has been making news in the past few days in Australia: delivery drones. Three days ago, Google announced that it is officially launching the first home delivery drone service in Australia in our capital city, Canberra, to deliver takeaway food, coffee, and medicines. Google Wing is the name of the project behind all this.

Big, big news, especially for computer vision.

In this post I am going to look at the story behind this. I will present:

the benefits of delivery drones,
the potential drawbacks of them,
and then I’ll take a look at (as much as is possible) the technology behind Google’s drones.

The Benefits of Delivery Drones

There was an official report prepared two months ago by AlphaBeta Advisors on behalf of Google Wing for the Standing Committee on Economic Development at the Parliament of the Australian Capital Territory (Canberra). The report, entitled “Inquiry into drone delivery systems in the ACT“, analysed the benefits of delivery drones in order to sway the government to give permission for drones to be utilised in this city for the purposes described above. The report was successful since, as I’ve mentioned, the requested permission was granted a few days ago.

Let’s take a look (in summary) at the benefits discussed by the article. Note that numbers presented here are specific to Canberra.

Benefits for local businesses:

More households can be brought into range by delivery drones. More households means more consumers.
Reduction of delivery costs. It is estimated that delivery costs could fall by up to 80-90% in the long term.
Lower costs will generate more sales.
More businesses delivering means a more competitive market.

Benefits for consumers:

Drones will be able to reach the more underserved members of the public such as the elderly, disabled, and homebound.
Since delivery times are faster by 60-70%, it is estimated that 3 million hours will be saved per year. This includes scenarios where customer pick-up journeys are replaced by drones.
As a result of lower delivery costs, drones could save households $5 million in fees per year.
Product variety will be expanded for the consumer as up to 4 times more merchants could be brought into range for them.

Benefits for society:

35 million km per year will be removed as a result of more delivery vehicles being taken off the road. This will reduce traffic congestion.
The above benefit will also result in a reduction of emissions by 8,000 tonnes, which is equivalent to the carbon storage of 250,000 trees (huge!).
Fewer cars on the road means fewer road accidents.

Some convincing arguments here. The benefits to society are my personal favourites. I hate traffic congestion!

The Potential Drawbacks of Delivery Drones

Drawbacks are not discussed in the aforementioned report. But some have been raised by the public living in Canberra. These are definitely worth mulling over:

Noise pollution. Ever since 2014 when Google started testing these delivery drones people have complained about how noisy they are. Some have even mentioned that wildlife seems to have disappeared from delivery areas as a result of this noise pollution. In fact, residents from this area have created an action group, called Bonython Against Drones, “to raise awareness of the negative impact of the drone delivery trial on people, pets and wildlife in Bonython [a suburb in Canberra] and to ensure governance and appropriate legislative orders are in place to protect the community“. Below is a video of a delivery in progress. Bonython Against Drones appears to have a strong case. This noise really is irritating.
Invasion of privacy. Could flying low over people’s properties be deemed as an invasion of privacy? A fair question to ask. Also, could Google use these drones to collect private information from the households they fly over? Of course, the company says that they comply with privacy laws and regulations but, well, their track record on privacy isn’t stellar. Heck, there’s even an entire Wikipedia article on the Privacy Concerns Regarding Google.
Bad weather conditions such as strong winds would render drones unusable. Can we rely on weather conditions so heavily?

The first point is definitely a drawback worth considering.

Google Wing Drones

Let’s take a look at the drones in operation in Canberra.

It seems as if this drone is a hybrid between a plane and a helicopter. The drone has wings with 2 large propellers but also 9 smaller hover propellers. Google says that the hover propellers are designed specifically to reduce noise. From the video above, though, a little bit more is probably needed to curtail that obnoxious buzzing sound.

There’s not much information out there on the technical side of things. For example, no white papers have been released by Google as of yet. But I dug around a bit and managed to come up with some interesting things. I stumbled upon this job description for the position of Perception Software Engineer at Google Wing HQ in California. What a find 🙂

(If you’re reading this post some time after April 2019, chances are the job description has been taken down… sorry about that)

The job description gives us hints as to what is going on in the background of this project. For example, we know that Google has developed “an unmanned traffic management platform–a kind of air traffic control for unmanned aircraft–to safely route drones through the sky”. Very cool.

More importantly for us, we also know that computer vision plays a prominent role in the guidance of these drones:

“Our perception solutions run on real-time embedded systems, and familiarity with computer vision, optical sensors, flight control, and simulation is a plus.”

And the job requirements specifically request 2 years of experience working with camera sensors for computer vision applications.

One interesting task that these drones perform is visual odometry, which is the process of determining the position and orientation of a device/vehicle by analysing camera images. As I’ve documented earlier, visual odometry was a CV technique used on Mars by the MER rovers from way back in the early 2000s.

It’s interesting to note that the CV techniques listed by the job description are performed on embedded systems and are coded in C++. A lot of people (including me) are predicting that embedded systems (e.g. IoTs, edge computing) are the next big thing for CV, so it’s worth taking note of this. Oh, and notice also that C++ is being used here. This language is not dead yet, despite it not being taught at universities any more. C++ is just damn fast – something that is a must in embedded CV solutions.

Summary

This post looked at some background information pertaining to the Google Wing project that, as of a few days ago, officially launched the first home delivery drone service in Australia’s capital city, Canberra. The first section of the post discussed the benefits and drawbacks of delivery drones. The last part of the post presented the Google Wing project from the technical side. Not much technical information is available on this project but a job description for the position of Perception Software Engineer gives us a sneak peek at the inner workings of Google Wing, especially from the perspective of computer vision.

It will be interesting to see whether delivery drones will be deemed a success by Google and also, most importantly, by the public of Canberra.

To be informed when new content like this is posted, subscribe to the mailing list (or subscribe to my YouTube channel!):

My Top 5 Posts So Far

Posted on March 20, 2019January 16, 2025 by Zbigatron

It’s been nearly 18 months since I started this blog. I did it to share my journey in computer vision with you. I love this field and I’m always stumbling across such fascinating things that I feel as though more people should know about them.

I’ve seen this blog grow in popularity – much, much more than I had anticipated when I first started it. In this little “bonus” post, I thought I’d list my top posts thus far with additional comments about them.

I also thought I’d compile a second list with my personal favourite posts. These have not been as popular but I sure as hell had fun writing them!

Enjoy! And thanks for the support over the last 18 months.

My top 5 posts thus far:

Why Deep Learning Has Not Superseded Traditional Computer Vision – I wrote this post on a Friday evening directly after work with a beer bottle in one hand and people playing pool or foosball around me. I wrote it up in an hour or so and didn’t think much of it, to be honest. The next day I woke up and saw, to my extreme surprise, that it had gone slightly viral! It was featured in Deep Learning Weekly (Issue #76), was being reposted by people such as Dr Adrian Rosebrock from PyImageSearch, and was getting about 1000 hits/day. Not bad, hey!?
The Top Image Datasets and Their Challenges
Finding a Good Thesis Topic in Computer Vision – I wrote this post after constantly seeing people asking this question on forums. Considering it’s consistently in my top 3 posts every week, I guess people are still searching for inspiration.
Mapping Camera Coordinates to a 2D Floor Plan – This post came about after I had to work on security footage from a bank for a project at work. The boss was very pleased with what I had done and writing about my experiences in a post was a no-brainer after that.
The Early History of Computer Vision – History is something that really interests me so it was only a matter of time before I was going to read up on the history of computer vision. Once I did and saw how fascinating it was, I just had to write a post about it.

My favourite posts thus far:

Like I said, these are not popular (some barely get a single hit in a week) but I really enjoyed researching for and writing them.

Soccer on Your Tabletop – The coolest thing going around in computer vision.
Amazon Go – Computer Vision at the Forefront of Innovation – This to me is something amazing.
The Baidu and ImageNet Controversy – Nothing like a good controversy!
Computer Vision on Mars – Computer vision in space. Imagine working on that project!
The Growth of Computer Vision in the Industry / The Reasons Behind the Recent Growth of Computer Vision – I’m proud of how far computer vision has come over the years. It’s been a pleasure to be a part of the adventure.

Enjoy looking back over my posts. Thanks once again for your support over the last 18 months.

To be informed when new content like this is posted, subscribe to the mailing list (or subscribe to my YouTube channel!):

Smartphone Camera Technology from Google and Nokia

Posted on March 13, 2019January 16, 2025 by Zbigatron

A few days ago Nokia unveiled its new smartphone: the Nokia 9 PureView. It looks kind of weird (or maybe funky?) with its 5 cameras at its rear (see image above). But what’s interesting is how Nokia uses these 5 cameras to give you better quality photos with a technique called High Dynamic Range (HDR) imaging.

HDR has been around in smartphones for a while, though. In fact, Google has had this imaging technique available in some of its phones since at least 2014. And in my opinion it does a much better job than Nokia.

In this post I would like to discuss what HDR is and then present what Nokia and Google are doing with it to provide some truly amazing results. I will break the post up into the following sections:

High Dynamic Range Imaging (what it is)
The Nokie 9 PureView
Google’s HDR+ (some amazing results here)

High Dynamic Range Imaging

I’m sure you’ve attempted to take photos of high luminosity range scenarios such as dimly lit scenes or ones where the backdrop is brightly radiant. Frequently such photos come out either overexposed, underexposed and/or blurred. The foreground, for example, might be completely in shadow or details will be blurred out because it’s hard to keep the camera still when you have the shutter speed set to low to let in extra light.

HDR attempts to alleviate these high range scenario problems by capturing additional shots of the same scene (at different exposure levels, for instance) and then taking what’s best out of each photo and merging this into one picture.

Photo by Gustave Le Gray (image taken from Wikipedia)

Interestingly, the idea of taking multiple shots of a scene to provide a better single photo goes back to the 1850s. Gustave Le Gray, a highly noted French photographer, rendered seascapes showing both the sky and the sea by using one negative for the sky, and another one with a longer exposure for the sea. He then combined the two into one picture in the positive. Quite innovative for the period. The picture on the right was captured by him using the HDR technique.

The Nokia 9 PureView

As you’ve probably already guessed, Nokia uses the five cameras on the Nokia 9 PureView to take photos of the same scene. However, each camera is different. Two cameras are standard RGB sensors to capture colour. The remaining three are monochrome that capture nearly three times more light as the RGB cameras. These 5 cameras are each 12 megapixels in resolution. There is also an infrared sensor for depth readings.

Depending on the scene and lighting conditions each camera can be triggered up to four times in quick succession (commonly referred to as burst photography).

One colour photo is then selected to act as the primary shot and the other photos are used to improve it with details.

The final result is a photo of up to 240 megapixel in quality. Interestingly, you also have control over how much photo merging takes place and where this merging occurs. For example, you can choose to add additional detail to the foreground and ignore the background. The depth map from the depth sensor undoubtedly assists in this. And yes, you have access to all the RAW files taken by the cameras.

Not bad, but in my opinion Google does a much better job… and with only one camera. Read on!

Google’s HDR+

Google’s HDR technology is dubbed HDR+. It has been around for a while, first appearing in the Nexus 5 and 6 phones. It is now a standard on the Pixel range of phones. It is standard because HDR+ uses the regular single camera on Google’s phones.

It gets away with just using one camera by taking up to 10 photos in quick succession – much more than Nokia does. Although the megapixel quality of the resulting photos may not match Nokia’s, the results are nonetheless impressive. Just take a look at this:

google-hdr-eg — *(image taken from here)*

That is a dimly lit indoor scene. The final result is truly astonishing, isn’t it?

Here’s another example:

What makes HDR+ standout from the crowd is its academic background. This isn’t some black-box technology that we know nothing about – it’s a technology that has been peer-reviewed by world-class academics and published in a world-class conference (SIGGRAPH Asia 2016).

Moreover, only last month, Google released to the public a dataset of archives of image bursts to help improve this technology.

When Google does something, it (usually) does it with a bang. You have to love this. This is HDR imaging done right.

To be informed when new content like this is posted, subscribe to the mailing list (or subscribe to my YouTube channel!):

Google’s Dataset Search Engine – Find The Dataset You Need Here

Posted on February 27, 2019January 16, 2025 by Zbigatron

Did you know that there is now a search engine for datasets that is powered by Google? Well, there is! And it’s something that the research community and the industry have been needing (whether they knew it or not) for years now.

This new search engine is called Dataset Search and can be found at this link.

This is a big deal. Datasets have become crucial since the prominent arrival of deep learning onto the scene a few years ago. Deep learning needs data. Lots and lots of data. This is because in deep learning, neural networks are told to (more or less) autonomously discover the underlying patterns in data. In computer vision, for example, you would want a machine to learn that bicycles are composed of two wheels, a handlebar, and a seat. But you need to provide enough examples for a machine to be able to learn these patterns.

Creating such large datasets is not an easy task. Some of the top image datasets (as I have documented here), contain millions of hand annotated images. These are famous datasets that most people in the computer vision world know about. But what about datasets that are more niche and hence less known? Some of these can be very difficult to find – and you certainly would not want to spend months or years creating them only to find that someone had already gone to all the trouble before you.

Up until now, then, there was no central location to search for these datasets. You had to manually traverse the web in the hope of finding what you were looking for. But that was until Dataset Search came along! Thank the heavens for that. Although Dataset Search is still in its beta stage, this is definitely something the research and industry communities have been needing.

For datasets to be listed in a coherent and informative manner on Dataset Search, Google has developed guidelines for dataset providers. These guidelines are based on schema.org, which is an open standard for describing such information (in metadata tags). As Google states:

We encourage dataset providers, large and small, to adopt this common standard so that all datasets are part of this robust ecosystem.

It would be a good idea to start adhering to these guidelines when creating datasets because a central place of reference for datasets is something we all need.

As a side note, Dataset Search has been in development for at least three years (interestingly, Dataset Search’s previous name was actually Goods – Google Dataset Search). Google released two academic papers on this in 2016 – see here and here. It’s nice to see that their work has finally culminated into what they have offered us now.

Dataset Search is definitely a step in in the right direction.

To be informed when new content like this is posted, subscribe to the mailing list (or subscribe to my YouTube channel!):

Seeing Around Corners with a Laser

Posted on January 24, 2019January 16, 2025 by Zbigatron

In this post I would like to show you some results of another interesting paper I came across recently that was published last year in the prestigious Nature journal. It’s on the topic of non-line-of-sight (NLOS) imaging or, in other words, it’s about research that helps you see around corners. NLOS could be something particularly useful for use cases such as autonomous cars in the future.

I’ll break this post up into the following sections:

The LIDAR laser-mapping technology
LIDAR and NLOS
Current Research into NLOS

Let’s get cracking, then.

LIDAR

You may have heard of LIDAR (a term which combines “light” and “radar”). It is used very frequently as a tool to scan surroundings in 3D. It works similarly to radar but instead of emitting sound waves, it sends out pulses of infrared light and then calculates the time it takes for this light to return to the emitter. Closer objects will reflect this laser light quicker than distant objects. In this way, a 3D representation of the scene can be acquired, like this one which shows a home damaged by the 2011 Christchurch Earthquake:

LIDAR has been around for decades and I came across it very frequently in my past research work in computer vision, especially in the field of robotics. More recently, LIDAR has been experimented with in autonomous vehicles for obstacle detection and avoidance. It really is a great tool to acquire depth information of the scene.

NLOS Imaging

But what if where you want to see is obscured by an object? What if you want to see what’s behind a wall or what’s in front of the car in front of you? LIDAR does not, by default, allow you to do this:

lidar-eg-with-occlusion — *The rabbit object is not reachable by the LIDAR system (image adapted from this video)*

This is were the field of NLOS comes in.

The idea behind NLOS is to use sensors like LIDAR to bounce laser light off walls and then read back any reflected light.

lidar-eg-with-NLOS — *The laser is bounced off the wall to reach the object hidden behind the occluder (image adapted from this video)*

This process is repeated around a particular point (p in the image above) to obtain as much reflected light as possible. The reflected light is then analysed and any objects on the other side of the occlusion are attempted to be reconstructed.

This is still an open area of research with many assumptions (e.g. that light is not reflected multiple times by the occluded object but bounces straight back to the wall and then the sensors) but the work on this done so far is quite intriguing.

Current Research into NLOS

The paper that I came across is entitled “Confocal non-line-of-sight imaging based on the light-cone transform“. It was published in March of last year in the Nature journal (555, no. 7696, p. 338). Nature is one of the world’s top and most famous academic journals, so anything published there is more than just world-class – it’s unique and exceptional.

The experiment setup from this paper was as shown here:

nlos-experiment-setup — *The setup of the experiment for NLOS. The laser light is bounced off the white wall to hit and reflect off the hidden object (image taken from original publication)*

The idea, then, was to try and reconstruct anything placed behind the occluder by bouncing laser light off the white wall. In the paper, two objects were scrutinised: an “S” (as shown in the image above) and a road sign. With a novel method of reconstruction, the authors were able to obtain the following reconstructed 3D images of the two objects:

NLOS-results — *(image adapted from original publication)*

Remember, these results are obtained by bouncing light off a wall. Very interesting, isn’t it? What’s even more interesting is that the text on the street sign has been detected as well. Talk about precision! You can clearly see how one day, this could come in handy with autonomous cars who could use information such as this to increase safety on the roads.

A computer simulation was also created to ascertain with dexterity the error rates involved with the reconstruction process. The simulated setup was as shown in the above images with the bunny rabbit. The results of the simulation were as follows:

NLOS-simulation-results — *(image adapted from original publication)*

The green in the image is the reconstructed parts of the bunny superimposed on the original object. You can clearly see how the 3D shape and structure of the object is extremely well-preserved. Obviously, the parts of the bunny not visible to the laser could not be reconstructed.

Summary

This post introduced the field of non-line-of-sight imaging, which is, in a nutshell, research that helps you see around corners. The idea behind NLOS is to use sensors like LIDAR to bounce laser light off walls and then read back any reflected light. The scene behind an occlusion is then attempted to be reconstructed.

Recent results from state-of-the-art research in NLOS published in the Nature journal were also presented in this post. Although much more work is needed in this field, the results are quite impressive and show that NLOS could one day be very useful with, for example, autonomous cars who could use information such as this to increase safety on the roads.

To be informed when new content like this is posted, subscribe to the mailing list (or subscribe to my YouTube channel!):

Heart Rate Estimation Using Computer Vision

Posted on December 12, 2018January 16, 2025 by Zbigatron

This post is another that has been inspired by a forum question: “What are some lesser known use cases for computer vision?” I jumped at the opportunity to answer this question because if there’s one thing I’m proud of with respect to this blog, it is the weird and wacky use cases that I have documented here.

Some things I’ve talked about include:

In this post I would like to add to the list above and discuss another lesser known use case for computer vision: heart rate estimation from colour cameras.

Vital Signal Estimation

Heart rate estimation belongs in the field called “Vital Signal Estimation” (VSE). In computer vision, VSE has been around for a while. One of the more famous attempts at this comes from 2012 from a paper entitled “Eulerian Video Magnification for Revealing Subtle Changes in the World” that was published at SIGGRAPH by MIT.

(Note: as I’ve mentioned in the past, SIGGRAPH, which stands for “Special Interest Group on Computer GRAPHics and Interactive Techniques”, is a world-renowned annual conference held for computer graphics researchers. But you do sometimes get papers from the world of computer vision being published there as is the case with this one.)

Basically, the way that VSE was implemented by MIT was that they analysed images captured from a camera for small illumination changes on a person’s face produced by varying amounts of blood flow to it. These changes were then magnified to make it easier to scrutinise. See, for example, this image from their paper:

Amazing that these illumination changes can be extracted like this, isn’t it?

This video describes the Eulerian Video Magnification technique developed by these researchers (colour amplification begins at 1:25):

Interestingly, most research in VSE has focused around this idea of magnifying minute changes to estimate heart rates.

Uses of Heart Rate Estimation

What could heart rate estimation by computer vision be used for? Well, medical scenarios automatically come to mind because of the non-invasive (i.e. non-contact) feature of this technique. The video above (from 3:30) suggests using this technology for SIDS detection. Stress detection is also another use case. And what follows from this is lie detection. I’ve already written about lie detection using thermal imaging – here is one more way for us to be monitored unknowingly.

On the topic of being monitored, this paper suggests (using a slightly different technique of magnifying minute changes in blood flow to the face) detecting emotional reactions to TV programs and advertisements.

Ugh! It’s just what we need, isn’t it? More ways of being watched.

Making it Proprietary

From what I can see, VSE in computer vision is still mostly in the research domain. However, this research from Utah State University has recently been patented and a company called Photorithm Inc. formed around it. The company produces baby monitoring systems to detect abnormal breathing in sleeping infants. In fact, Forbes wrote an interesting article about this company this year. Among other things, the article talks about the reasons behind the push by the authors for this research and how the technology behind the application works. It’s a good read.

Here’s a video demonstrating how Photorithm’s product works:

Summary

This post talked about another lesser known use case of computer vision: heart rate estimation. MIT’s famous research from 2012 was briefly presented. This research used a technique of magnifying small changes in an input video for subsequent analysis. Magnifying small changes like this is how most VSE technologies in computer vision work today.

After this, a discussion of what heart rate estimation by computer vision could be used for followed. Finally, it was mentioned that VSE is still predominantly something in the research domain, although one company has recently appeared on the scene that sells baby monitoring systems to detect abnormal breathing in sleeping infants. The product being sold by this company uses computer vision techniques presented in this post.

To be informed when new content like this is posted, subscribe to the mailing list (or subscribe to my YouTube channel!):

Image Steganography – Simple Examples

Posted on November 21, 2018January 16, 2025 by Zbigatron

In my last post I introduced the field of image steganography, which is the practice of concealing secret messages in digital images. I looked at the history of steganography and presented some recently reported real-life cases (including one from the FBI) where digital steganography was used for malicious purposes.

In this post I would like to present to you the following two very simple ways messages can be hidden in digital images:

JPEG Concealing
Least Significant Bit Technique

These techniques, although trivial and easy to detect, will give you an idea of how simple (and therefore potentially dangerous) digital image steganography can be.

JPEG Concealing

Image files in general are composed of two sections: header data + image data. The header data section can contain metadata information pertaining to the image such as date of creation, author, image resolution, and compression algorithm used if the image is compressed. This is the standard for JPEGs, BMPs, TIFFs, GIFs, etc.

Knowing this, one can work around these file structures to conceal messages.

Let’s take JPEGs as an example. The file structure for this format is as follows:

Notice that every single JPEG file starts and ends with the SOI and EOI markers, respectively.

What this means is that any image interpreting application (e.g. Photoshop or GIMP, any internet browser, the standard photo viewing software that comes with your operating system, etc.) looks for these markers inside the file and knows that it should interpret and display whatever comes between them. Everything else is automatically ignored.

Hence, you can insert absolutely anything after the EOI marker like this:

And even though the hidden message will be part of the JPEG file and travel with this file wherever it goes, no standard application will see anything out of the ordinary. It will just read whatever comes before EOI.

Of course, if you put a lot of data after EOI, your file size will increase significantly and might, therefore, arouse suspicion – so you have to be wary of that. In this case, it might be an idea to use a high resolution JPEG file (that naturally has a large file size) to turn attention away from your hidden message.

If you would like to try this steganography technique out yourself, download a hex editor for your machine (if you use Windows, WinHex is a good program), search for FF D9 (which is the hex version of EOI), paste anything you want after this section marker, and save your changes. You will notice that the file is opened like any other JPEG file. The hidden message simply piggy backs on top of the image file. Quite neat!

(Note: hexadecimal is a number system made up of 16 symbols. Our decimal system uses 10 digits: 0-9. The hex system uses the 10 digits from the decimal system plus the first 6 letters of the alphabet. To cut a long story short, hexadecimal is a shorthand and therefore much easier way to read/write binary digits, i.e. 1s and 0s. Most file formats will not save data in human readable form and we therefore need help if we want to view the raw data of these files – this is why hex is used sometimes used)

The Least Significant Bit Technique

Although easy to detect (if you know what you’re looking for), the Least Significant Bit (LSB) technique is a very sly way of hiding data in images. The way that it works is by taking advantage of the fact that small changes in pixel colour are invisible to the naked eye.

Let’s say we’re encoding images in the RGB colour space – i.e. each pixel’s colour is represented by a combination of a certain amount of red (R), a certain amount of green (G), and a certain amount of blue (B). The amount of red, green, and blue is given in the the range of 0 to 255. So, pure red would be represented as (255, 0, 0) in this colour space – i.e. the maximum amount of red, no green, and no blue.

Now, in this scenario (and abstracting over a few things), a machine would represent each pixel in 3 bytes – one byte for each of red, green and blue. Since a byte is 8 bits (i.e. 8 ones and zeros) each colour would be stored as something like this:

That would be the colour red (11111111 in binary is 255 in our number system).

What about if we were to change the 255 into 254 – i.e. change 11111111 into 11111110? Would we notice the difference in the colour red? Absolutely not. How about changing 11111111 to 11111100 (255 to 252)? We still would not notice the difference – especially if this change is happening to single pixels!

The idea behind the LSB technique, then, is to use this fact that slightly changing the colour of each pixel would be imperceptible to the naked eye.

Since the last few digits in a byte are insignificant this is where LSB gets its name: the Least Significant Bit technique.

We know, then, that the last few bits in each byte can be manipulated. So, we can use this knowledge to set aside these bits of each pixel to store a hidden message.

Let’s look at an example. Suppose we want to hide a message like “SOS“. We choose to use the ASCII format to encode our letters. In this format each character has its own binary representation. The binary for our message would be:

What we do now is split each character into two-bit pairs (e.g. S has the following four pairs: 01, 01, 00, 11) and spread these pairs successively along multiple pixels. So, if our image had four pixels, our message would be encoded like this:

Notice that each letter is spread across two pixels: one pixel encodes the first 3 pairs and the next pixel takes the last pair. Very neat, isn’t it? You can choose to use more than 2 bits per pixel to store your message but remember that by using more bits you risk changes to each pixel becoming perceptible.

Also, the larger the image, the more you can encode. And, since images can be represented in binary, you can store an image inside an image using the exact same technique. In this respect, I would highly recommend that you take a look at this little website that will allow you to do just that using the method described here.

I would also recommend going back to the first post of this series and looking at the image-inside-image steganography examples there provided by the FBI. It shows brilliantly how sneaky image steganography can be.

Summary

In this post I looked at two simple techniques of image steganography. The first technique takes advantage of the fact that image files have an end-of-file (EOF) marker in their metadata. This means that any program opening these images will read everything up to and including this marker. If you were to put anything after the EOF, it would be hidden from view. The second technique takes advantage of the fact that slightly changing the colour of a pixel is imperceptible to the naked eye. In this sense, the least significant bits of each pixel can be used to spread a message (e.g. text or image) across the pixels in an image. A program would then be used to extract this message.

To be informed when new content like this is posted, subscribe to the mailing list (or subscribe to my YouTube channel!):

Image Steganography – An Introduction

Posted on October 5, 2018January 16, 2025 by Zbigatron

In this post (part 1 of 2 – part 2 can now be found here) I would like to introduce the topic of image steganography, which is the practice of concealing secret messages in digital images. I’ve always been fascinated by this subject so I have taken the excuse to research for this post as a way to delve into the topic. Turns out that image steganography is a fascinating field that should be garnering much more attention than it is.

I will divide the post into two sections:

Steganography: what it is and its early history
Digital image steganography and some recently reported real-life cases – including one from an FBI report on Russian spying in the US (like something out of the cold war)

In my next post I will detail some simple techniques of hiding messages in images, so stay tuned for that.

What is Steganography

Usually today if we want to send sensitive data (e.g. credit card information), we encrypt this data before sending it across the internet. Sending messages like this, however, can arouse suspicion: there is obviously sensitive/secret data in your encrypted message that you are trying to conceal. Attackers know exactly where to look to try to obtain this information.

But steganography works differently: you hide the message in plain sight in order for your message to not attract any attention at all.

The first recorded case of steganography goes back to 499 BC when the Greek tyrant Histiaeus shaved the head of his slave and “marked” (probably tattooed) a secret message onto it. The message was intended for Aristagoras and it was telling him to start a revolt against the Persians. Histiaeus waited for the slave’s hair to grow back before sending him on his way. When the slave reached Aristagoras, his head was shaved again to reveal the hidden message.

Who would have thought to stop the slave and look for a hidden message tattooed on his head? Ingenious, wasn’t it? (Well, maybe not for the slave who was probably left with that message permanently on his head…).

That’s the way steganography works: through deception.

It is an important topic because of how seemingly common it is becoming. A report from 2017 by the global computer security software company McAfee says that steganography is being used in more ways today than ever before. However, Simon Wiseman, the chief technology officer of the network security firm Deep Secure, argues that it’s not so much that steganography is becoming more popular, just that we are discovering it more often by learning how it is being done: “now that people are waking up to the fact that it’s out there, the discovery rate is going up.”

Either way, as McAfee claims: “Steganography will continue to become more popular.”

Digital Image Steganography

As mentioned earlier, digital image steganography is the hiding of secret messages inside images. Take a look at these two images distributed by the FBI:

You wouldn’t think that both of them contain the following map of an airport, would you?

Well, they do. The FBI doesn’t lie 🙂

It’s a scary thing when you consider the huge number of images being sent across the internet every day. You would really have to know precisely where to scan for this stuff and what to look for otherwise you’re searching for a needle in a haystack.

Now, the first recorded case of image steganography in a cyberattack dates back to 2011. It was called the Duqu malware attack and it worked by encrypting and embedding data into small JPEG image files. These files were then sent to servers to obtain sensitive information (rather than doing destructive work directly like deleting files). McAfee says that it was used to, for example, steal digital certificates from its victims. How Duqu worked exactly, however, remains unknown. Researchers are still trying to work this out (although all sources I could find on this are fairly outdated). Quite amazing.

I found earlier reported cases, however, of image steganography being used for malicious purposes, not necessarily in cyberattacks. My favourite one is from the FBI.

Here’s an official report from them from 2010 accusing the Russian foreign intelligence agency of embedding encrypted text messages inside image files for communications with agents stationed abroad. This reportedly all took place in the 90s in the US. Turns out that the 10 spies mentioned in the report later pleaded guilty to being Russian agents and were used as part of a spy swap between the U.S. and Russian governments. The FBI and the Russians… and spy swapping! Like something out of a movie. Shows you how serious the topic of digital image steganography is.

You can see how this way of embedding communication in images is a much more sophisticated version of the “tattooing a message on a shaved head” example from Ancient Greece described above.

Is image steganography being used like this by ISIS to communicate secretly amongst each other? Chances are it is.

Early this year a communication tool was discovered called MuslimCrypt (poor choice of name, in my opinion). As Wired reports, the tool was found in a private, pro-ISIS Telegram channel on January 20. It is dead simple to use (take a look at the video on the Wired page to see this for yourselves): you select an image, write a message in text form, select a password, and click one button to hide this message inside the image. This image can then be sent across the internet after which the recipient puts it into MuslimCrypt and with one click of a button retrieves the hidden message. Sneaky, dangerous stuff.

What would make detection even more difficult is a hidden message distributed over multiple images. Well, models for this already exist, as the academic paper “Distributed Steganography” (Liao et al., International Conference on Intelligent Information Hiding and Multimedia Signal Processing, 2011) presents.

Moreover, a patent for distributed steganography was filed by a certain Charles Easttom II William in 2010. This image from the patent summarises distributed steganography nicely:

distributed-steganography-example — *(Image adapted from the original patent)*

Fascinating stuff, isn’t it?

Stay tuned for my next post where I will look in detail at some simple examples of digital image steganography. (Update: this new post can now be found here)

To be informed when new content like this is posted, subscribe to the mailing list (or subscribe to my YouTube channel!):

Be an Optimist Prime in the world of Computer Vision and AI

Author: Zbigatron

Capturing the Moment in Photography – SIGGRAPH 2019 Award

Smart Glasses for the Blind – Why has it Taken This Long?

Delivery Drones and the Google Wing Project

The Benefits of Delivery Drones

Benefits for local businesses:

Benefits for consumers:

Benefits for society:

The Potential Drawbacks of Delivery Drones

Google Wing Drones

Summary

My Top 5 Posts So Far

My top 5 posts thus far:

My favourite posts thus far:

Smartphone Camera Technology from Google and Nokia

High Dynamic Range Imaging

The Nokia 9 PureView

Google’s HDR+

Google’s Dataset Search Engine – Find The Dataset You Need Here

Seeing Around Corners with a Laser

LIDAR

NLOS Imaging

Current Research into NLOS

Summary

Heart Rate Estimation Using Computer Vision

Vital Signal Estimation

Uses of Heart Rate Estimation

Making it Proprietary

Summary

Image Steganography – Simple Examples

JPEG Concealing

The Least Significant Bit Technique

Summary

Image Steganography – An Introduction

What is Steganography

Digital Image Steganography