How Deep Learning Works – The Very Basics

Deep learning (DL) revolutionised computer vision (CV) and artificial intelligence in general. It was a huge breakthrough (circa 2012) that allowed AI to blast into the headlines and into our lives like never before. ChatGPT, DALL-E 2, autonomous cars, etc. – deep learning is the engine driving these stories. DL is so good that it has reached a point where nearly every problem involving AI is now most probably being solved with it. Just take a look at any academic conference/workshop and scan through the presented publications. All of them, no matter who, what, where or when, present their solutions with DL.

The problems that DL solves are complex. Hence, necessarily, DL is a complex topic. It’s not easy to come to grips with what is happening under the hood of these applications. Trust me, there’s heavy statistics and mathematics being utilised under the hood that we take for granted.

In this post I thought I’d try to explain how DL works. I want this to be a “Deep Learning for Dummies” kind of article. I’m going to assume that you have a high school background in mathematics and nothing more. (So, if you’re a seasoned computer scientist, this post is not for you – next time!)

Let’s start with a simple equation:

What are the values of x and y? Well, going back to high school mathematics, you would know that x and y can take an infinite number of values. To get one specific solution for x and y together we need more information. So, let’s add some more information to our first equation by providing another one:

Ah! Now we’re talking. A quick subtraction here, a little substitution there, and we will get the following solution:

Solved!
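
To make this concrete, here’s an illustrative pair of the kind I have in mind (the exact numbers don’t matter):

x + y = 5
x - y = 1

Subtract the second equation from the first and you get 2y = 4, so y = 2. Substitute that back into either equation and you get x = 3. One extra equation, one unique answer.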

More information (more data) gives us more understanding. 

Now, let’s rewrite the first equation a little to provide an oversimplified definition of a car. We can think of it as a definition we can use to look for cars in images:

We’re stuck with the same dilemma, aren’t we? One possible solution is this:

But there are many, many others.

In fairness, however, that equation is much too simple for reality. Cars are complicated objects. How many variables should a definition have to visually describe a car, then? One would need to take colour, shape, orientation of the car, makes, brands, etc. into consideration. On top of that we have different weather scenarios to keep in mind (e.g. a car will look different in an image when it’s raining compared to when it’s sunny – everything looks different in inclement weather!). And then there are also lighting conditions to consider. Cars look different at night than in the daytime.

We’re talking about millions and millions of variables! That is what is needed to accurately define a car for a machine to use. So, we would need something like this, where the number of variables would go on and on and on, ad nauseam:

This is what a neural network sets up: exactly equations like this, with millions and millions, and sometimes billions or trillions, of variables. Here’s a picture of a small neural network (incidentally, these networks are called neural networks because they’re inspired by how neurons are interconnected in our brains):

Image adapted from here

Each of the circles in the image is a neuron. Neurons are interconnected and arranged in layers, as can be seen above. Each neuron connection (the black lines above) has a weight associated with it. When a signal passes from one neuron to the next via a connection, the weight specifies how strong the original signal will be by the time it reaches the end of the connection. A weight can be thought of as a single variable – except that in technical terms, these variables are called “parameters”, which is what I’m going to call them from now on in this post.

The network above has a few hundred parameters (basically, the number of connections). To use our example of the car from earlier, that’s not going to be enough for us to adequately define a car. We need more parameters. Reality is much too complex for us to handle with just a handful of unknowns. Hence why some of the latest image recognition DL networks have parameter numbers in the billions. That means layers, and layers, and layers of neurons as well as all their connections.

(Note: a parameter count of a neural network will also include what’s called “biases” but I’ll leave that out in this post to keep things simple)
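
For the programmers reading along, here’s a minimal sketch (using PyTorch, which is simply my choice of library) of a tiny network like the one pictured and how you’d count its connection weights:

```python
import torch.nn as nn

# A small fully-connected network: 4 inputs -> 5 hidden neurons -> 3 outputs,
# loosely like the little diagram above.
net = nn.Sequential(
    nn.Linear(4, 5),
    nn.ReLU(),
    nn.Linear(5, 3),
)

# Count only the connection weights (ignoring the biases, as mentioned above).
weights = sum(p.numel() for name, p in net.named_parameters() if "weight" in name)
print(weights)  # 4*5 + 5*3 = 35 connections, i.e. 35 parameters
```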

Now, initially when a neural network is set up with all these parameters, the parameters (variables) are “empty”, i.e. they have not been initialised to anything meaningful. The neural network is unusable – it is “blank”.

In other words, with our equation from earlier, we have to work out what each x, y, z, … is in the definitions we wish to solve for.

To do this, we need more information, don’t we? Just like in the very first example of this post. We don’t know what x, y, and z (and so on) are unless we get more data.

This is where the idea of “training a neural network” or “training a model” comes in. We throw images of cars at the neural network and get it to work out for itself what all the unknowns are in the equations we have set up. Because there are so many parameters, we need lots and lots and lots of information/data – cf. big data.
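
If you’d like to see roughly what that looks like in code, here’s a toy sketch, again assuming PyTorch, with random numbers standing in for real photos of cars – the point is only the shape of the training process, not a working car detector:

```python
import torch
import torch.nn as nn

# A toy "car vs. not-car" classifier.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(32 * 32 * 3, 64),
    nn.ReLU(),
    nn.Linear(64, 2),          # two outputs: "car" and "not car"
)

images = torch.randn(100, 3, 32, 32)   # pretend photos
labels = torch.randint(0, 2, (100,))   # pretend answers

loss_fn = nn.CrossEntropyLoss()
optimiser = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(5):
    optimiser.zero_grad()
    predictions = model(images)          # the network's current guesses
    loss = loss_fn(predictions, labels)  # how wrong those guesses are
    loss.backward()                      # work out how to nudge every parameter
    optimiser.step()                     # nudge them
```

Repeat that loop over enough real data and the parameters slowly settle into a useful “definition” of a car.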


And so we get the whole notion of why data is worth so much nowadays. DL has given us the ability to process large amounts of data (with tonnes of parameters), to make sense of it, to make predictions from it, to gain new insight from it, to make insightful decisions from it. Prior to the big data revolution, nobody collected so much data because we didn’t know what to do with it. Now we do.

One more thing to add to all this: the more parameters in a neural network, the more complex equations/tasks it can solve. It makes sense, doesn’t it? This is why AI is getting better and better. People are building larger and larger networks (GPT-4 is reported to have parameters in the trillions, GPT-3 has 175 billion, GPT-2 has 1.5 billion) and training them on swathes of data. The problem is that there’s a limit to just how big we can go (as I discuss in this post and then this one) but this is a discussion for another time.

To conclude: these, ladies and gentlemen, are the very basics of Deep Learning and why it has been such a disruptive technology. We are able to set up these equations with millions/billions/trillions of parameters and get machines to work out what each of these parameters should be set to. We define what we wish to solve for (e.g. cars in images) and the machine works the rest out for us, as long as we provide it with enough data. And so AI is able to solve more and more complex problems in our world and do mind-blowing things.



Amazon Go and Just Walk Out Tech in 2023

It has been over five years since Amazon opened its cashierless stores to the public. I reported on this event on my blog with great enthusiasm:

I am, to put it simply, thrilled and excited about this new venture. This is innovation at its finest where computer vision is playing a central role.

Me in 2018

I thought I would return to this initiative of Amazon’s in my latest post to see how things are going: whether what I got excited about has made an impact around the world, and whether there is hope for it in the future. Because cashierless stores running on a Computer Vision engine are still something to get excited about – in my opinion, anyway.

(For a summary of what Amazon Go is, how it works, and some early controversies behind it, see my original post on this topic. In a nutshell, though, Amazon Go is a cashierless convenience store: you use an Amazon app to scan on entry, pick what you want from the shelves, and just walk out when you’re done. Then, within a very short time, money is deducted from your account and a receipt is sent to your app).

Amazon Go in 2023

When Amazon opened its first store to the public in early 2018, it proudly proclaimed that its hope was to open at least 3,000 of them in the near future.

That has not panned out as planned.

Currently there are 23 cashierless stores open in the USA and 20 in the United Kingdom (called Amazon Fresh there). That’s a total of 43 convenience stores, which isn’t bad, but it’s well short of the 3,000 target set five years ago.

Moreover, two months ago, Amazon announced that it was closing eight of its stores based in the US, including all four in San Francisco. Hence, the total number of Amazon Go shops will soon drop to 35.

Perhaps this initiative wasn’t worth the investment and effort since targets have obviously not been met?

Let’s have a look now at the technology used behind it and how that has fared in the world.

“Just Walk Out” Technology

In 2020, two years after opening its first Go store, Amazon announced that the technology behind its cashierless stores, called “Just Walk Out”, would be available for sale/rent to other retailers – meaning that other companies would be able to implement the “walk in, pick out what you want, and just walk out” idea in their own stores.

How has this idea fared? Also, not as planned. Amazon bet big on this idea but the uptake has been poor.

I mean, it sounds like a great idea, doesn’t it? A seamless store experience where you can quickly get what you want and be back in your car within a few minutes. Big retailers out there with endless supplies of money would undoubtedly consider this for their stores if the technology was financially viable.

This seems to be the crux of the problem, however. A few dozen small retail shops have installed the technology (including a few Starbucks) but nothing to create worldwide headlines. Apparently, the technology is expensive to install (shops need to be littered with cameras and various other sensors) and then calibrated and configured. The whole process requires significant downtime for a store not to mention expert personnel to keep the thing running once it’s installed.

More Controversies

When the first Amazon Go store opened, controversy surrounded it, as I reported in my first post on this topic. Unfortunately, new controversies surrounding this whole business enterprise have also since emerged.

In 2019, San Francisco, New Jersey, and Philadelphia banned cashless stores in their jurisdictions. The reasoning was that these stores were allegedly discriminating against low-income people who may not have access to mobile phones or even bank accounts. Hence, Amazon Go stores were inaccessible to them. This sounds fair, but it was a big blow for Amazon, and in response they had to open stores that also accepted cash as payment.

A checkout-less store that is forced to accept cash as payment kind of goes against the “Just Walk Out” philosophy around which the whole product is built. These stores are no longer cashierless, so to speak.

Perhaps this is why Amazon decided this year to shut all four of their stores in San Francisco?

Conclusion

A convenience store that you can simply walk in and out of at your leisure is impressive, to say the least. It feels like something from the future, doesn’t it? Despite the slow uptake and the controversies, I’m still excited by this whole venture. And it’s not as if the project has been a complete flop either. Amazon Go is expanding – a lot slower than it anticipated 5 years ago but it seems to be making a profit, nonetheless.

I hope that there is more in store (pun intended) for Computer Vision projects like this.



The Internet Is Not What You Think It Is – Review

Score: 1 star out of 5.

It’s hard to not write a fascinating book on the Philosophy of the Internet. The Internet is a recent phenomenon that has pervaded all aspects of our lives at lightning speed. And just like social and political policies, philosophy is finding it hard to keep up. We just haven’t had the time to step back and process how what is happening around us could be understood at a metaphysical or phenomenological level. This is the current wild, wild west of the intellectual world. So, if you’re a smart, observant, intuitive cookie, a book from you on the Philosophy of the Internet is all but guaranteed to be a hit. You’re going to be a trailblazer.

Unfortunately, Justin E. H. Smith, professor of history and philosophy of science at the Université Paris Cité, fails at this task miserably. I’m still baffled at how he managed it. Especially when he started off so well: “We are living in a crisis moment of history, in the true sense of “crisis”: things might get better eventually, but they will never be the same… The principle charges against the internet… have to do with the ways in which it has limited our potential and our capacity for thriving, the ways in which it has distorted our nature and fettered us… as such the internet is anti-human. If we could put it on trial, its crime would be a crime against humanity.”

Well, the suspense has been built! This simply has to be a great read! One would think…

Professor Smith, however, is one of those professors you may have had the amusement as well as annoyance to have come across in your university days as a student. You see a fascinating topic. Great first slide. Great introduction. After 10 minutes, though, you start to shuffle in your chair awkwardly wondering if you’re the only one in the room questioning the validity of what is being presented to you before your eyes. After 20 minutes you start to look around the room to see if others are starting to feel any annoyance at all. After 30 minutes, you have your face in your hands wondering how on earth this person managed to get a high position at a university.

Smith tries to build a philosophy of the Internet but he does it poorly. In a nutshell, he thinks that to create a Philosophy of the Internet he just has to show that the phenomena that we experience with the internet, such as communication and interconnectedness, have existed in one way or another since the dawn of time:

“[The Internet] does not represent a radical rupture with everything that came before, either in human history or in the vastly longer history of nature that precedes the first appearance of our species… [it is] more like an outgrowth latent from the beginning in what we have always done.”

And this is how he proceeds for the rest of the book:

“[T]he sperm whale’s clicks, the elephant’s vibrations, the lima beans plant’s rhizobacterial emissions… are all varieties of “wifi” too.”

Pages and pages of analogies from nature follow:

“It was just as common from antiquity through the modern period to envision nature… as a wired or connected network, that is, a proper web… Such a system is instanced paradigmatically in what may be thought of as the original web, the one woven by the spider”

From whales’ “clicks” to spiders’ webs in nature we’re meant to build a Philosophy of the Internet? What on earth are you talking about, here, sir?

I’d stop and move on but these quotes are just too good to pass up:

“The important thing to register for now is that the spider’s web is a web in at least some of the same respects that the World Wide Web is a web: it facilitates reports, to a cognizing or sentient being that occupies one of its nodes, about what is going on at other of it nodes”.

The “vegetal world” gets a mention, too, of course. Field grass, trees – all these have “underground network of roots, whose exchanges can be tracked to a technique known as “quantum dot tagging””.

We’re about one-third of the way through the book now and this is about the moment that I’m starting to look around the lecture room to see if anybody else is noticing these feeble attempts at intellectualism. This is something worthy of a high school philosophy paper.

From analogies in nature, Smith then proceeds to analogies in the history of thought:
“In the history of western philosophy, in fact, one of the most enduring ways of conceiving the connectedness of all beings… has been through the idea of a “world soul”… One might dare to say, and I am in fact saying, that we always knew the internet was possible. Its appearance in the most recent era is only the latest twist in a much longer history of reflection on the connectedness and unity of all things.”

Absolute gold. The best quote out of these sections is this one:
“The very development of the binary calculus that… marks the true beginning of the history of information science, was itself a direct borrowing from a broadly neo-Platonic mystical tradition of contemplating the relationship between being and non-being: where the former might be represented by “1” and the latter by “0.”

The fact that we have 1s and 0s in electronics can be traced back to neo-Platonic mystical traditions? This guy has got to be joking. We’re two-thirds into this book and now I’m not only wondering if anybody else sees through this junk in this lecture theatre but I’m also starting to wonder whether I’m not in one of Franz Kafka’s novels. This guy is a professor at a prestigious university in Paris. It is common knowledge that Kafka was known to laugh uncontrollably when reading his work aloud to friends. By this stage I’m laughing aloud in a cafe myself at what I’m reading.

When Professor Smith finally finishes showing how ideas inherent in the internet originate in lima beans and Augustinian “Confessions”, he ends abruptly and with satisfaction. Not much more is given to round out his treatise. The idea of showing that the Internet “is more like an outgrowth latent from the beginning” and hence not as radical as we may think isn’t given any force. There’s not much there to reflect on – it’s nothing groundbreaking.

Yes, I’m still looking around the lecture theatre to discern whether I’m in Kafka’s Trial or not. People around me are clapping their gratitude. I have no idea what is happening. When the clapping subsides, Smith adds one more utterance to his work. And then everything becomes crystal clear to me:

“I am writing, from New York City, during the coronavirus quarantine in the spring of the year 2020.”

Ah! There you have it! A work conceived during a lockdown period. Now this book makes perfect sense to me!

We’ve all been there, haven’t we? We all went a bit crazy and insane when we were sent to our rooms by our benevolent government during the pandemic, a time in which we all conceived of nutty philosophical ideas that were supposed to save the world. The difference is that when we finally left our confines and lucidity hit us like a fast-moving bus, we retracted our incoherent ideas. Smith, unfortunately, did not do this.




The West Fears AI, the East Does Not

We were recently handed an open letter pleading that we pause giant AI experiments and in the meantime “ask ourselves… Should we develop nonhuman minds that might eventually outnumber, outsmart, obsolete and replace us? Should we risk loss of control of our civilization?”

Prominent names in the tech world, such as Elon Musk and Steve Wozniak, are signatories to this letter, and as a result it made headlines all over the world with the usual hype and pomp surrounding anything even remotely pertaining to AI.

Time magazine, for instance, posted this in an article only last month:

I refrained from signing because I think the letter is understating the seriousness of the situation and asking for too little to solve it… Many researchers steeped in these issues, including myself, expect that the most likely result… is that literally everyone on Earth will die.

Quote taken from this article.

We’re used to end-of-the-world talk like this, though, aren’t we? Prof Stephen Hawking in 2014 warned that “The development of full artificial intelligence could spell the end of the human race.” And of course we have Elon Musk who is at the forefront of this kind of banter. For example in 2020 he said: “We’re headed toward a situation where AI is vastly smarter than humans and I think that time frame is less than five years from now.”

The talk on the streets amongst everyday folk seems to be similar, too. How can it not be, when the media is bombarding us with doom and gloom (because sensationalism is what sells papers, as I’ve said in previous posts of mine) and authority figures like those mentioned above are talking like this?

Is society scared of AI? I seem to be noticing this more and more. Other very prominent figures are trying to talk common sense to bring down the hype and have even publicly opposed the open letter from last month. Titans of AI like Yann LeCun and Andrew Ng (who are 1,000 times greater AI experts than Elon Musk, btw) have said that they “disagree with [the letter’s] premise” and that a 6-month pause “would actually cause significant harm”. Voices like this are not being heard, however.

But then the other day, while I was reading through the annual AI Index Report released by the Stanford Institute for Human-Centered Artificial Intelligence (HAI) – over 300 pages of analysis capturing trends in AI – this particular graph stood out for me:

Global Opinions on AI. Respondents agreeing that “products and services using AI have more benefits than drawbacks”. Graph adapted from here.

What struck me was how Asia and South America (Japan being the sole exception) want to embrace AI and are generally fans of it. Europe and the US, on the other hand, not so much.

This got me thinking: is this fear of AI only dominant in Europe and the US and if so, is it a cultural thing?

Now, off the bat, the reasons for Asia and South America embracing AI could be numerous and not necessarily cultural. For example, many of these are lower-income countries and perhaps they see AI as a quick path to a better life in the present. Fair enough.

Also, the reasons behind Europe and the US eschewing AI could be purely economic and short-term as well: they fear the imminent disruption in jobs that can follow upon developments in technology rather than directly fearing an AI apocalypse.

In spite of all this and understanding that correlation does not necessarily entail causation, perhaps there’s something cultural to all of this, after all. The significant signatories to the recent open letter seem to have come purely from the US and Europe.

I had two teaching stints in India last year and one in the Philippines. One of the topics I lectured on was AI, and as part of a discussion exercise I got my students to debate with me on this very topic, i.e. whether we are capable at all, in the near or distant future, of creating something that will outsmart and then annihilate us. The impression that I got was that the students in these countries had a much deeper appreciation for the uniqueness of human beings as compared to machines. There was something intrinsically different in the way that they referred to AI compared to the people in my home country of Australia and second home of Europe with whom I talk on a daily basis.

Of course, these are just my private observations and a general “feeling” that I got while working in those two countries. The sample size of this “experiment” would be something like 600, and even then it was not possible for me to get everybody’s opinion on the matter, let alone ask all my classes to complete a detailed survey.

Regardless, I think I’m raising an interesting question.

Could the West’s post-Descartes and post-Enlightenment periods have created in us a more intrinsic feeling that rationality and consciousness are things that are easily manipulated and simulated and then ultimately enhanced? Prior to the Enlightenment, man was whole (that is, consciousness was not a distinct element of his existence) and any form of imitation of his rationality would have been regarded as always being inferior regardless of how excellent the imitation could have been.

The Turing test would not have been a thing back then. Who cares if somebody is fooled by a machine for 15 minutes? Ultimately it is still a machine and something inherently made of just dead matter that could never transcend into the realm of understanding, especially that of abstract reality. It could mimic such understanding but never possess it. Big difference.

Nobody would have been scared of AI back then.

Then along came Descartes and the Enlightenment period. Some fantastic work was done during this time, don’t get me wrong, but we as humans were also transformed into dead, deterministic automata. So, it’s no wonder that we believe AI can supersede us and that we are afraid of it.

The East didn’t undergo such a period. They share a different history with different philosophies and different perceptions of life and people in general. I’m no expert on Eastern Philosophies (my Master’s in Philosophy was done purely in Western Thought) but I would love for somebody to write a book on this topic: How the East perceives AI and machines.

And then perhaps we could learn something from them to give back the dignity to mankind that it deserves and possesses. Because we are not just deterministic machines and the end of civilisation is not looming over us.

Parting Words

I am not denying here that AI is going to improve or be disruptive. It’s a given that it will. And if a pause is needed, it is for one reason: to ensure that the disruption isn’t too overwhelming for us. In fairness, the Open Letter of last month does state something akin to this, i.e. “Powerful AI systems should be developed only once we are confident that their effects will be positive and their risks will be manageable.” The general vibe of the letter, nonetheless, is one of doom, gloom, and oblivion, and this is what I’ve wanted to address in my article.

Secondly, I realise that I’ve used the East/West divide a little bit erroneously because South America is commonly counted as a Western region. However, I think it’s safe to say that Europe and the US are traditionally much closer culturally to each other than they are respectively with South America. The US has a strong Latino community but the Europe-US cultural connection is a stronger one. To be more precise I would like to have entitled my article “Europe and the USA Fear Artificial Intelligence, Asia and South America Do Not” but that’s just a clunky title for a little post on my humble blog.

Finally, I’ll emphasise again that my analysis is not watertight. Perhaps, in fact, I’m clutching at straws here. However, maybe there just is something to my question that the way the “East” perceives AI is different and that we should be listening to their side of the story more in this debate on the future of AI research than we currently are.



Deep Learning Eliminated Creativity in AI

Deep learning (DL) revolutionised computer vision (CV) and artificial intelligence in general. It was a huge breakthrough (circa 2012) that allowed AI to blast into the headlines and into our lives like never before. ChatGPT, DALL-E 2, autonomous cars, etc. – deep learning is the engine driving these stories. DL is so good that it has reached a point where nearly every problem involving AI is now most probably being solved with it. Just take a look at any academic conference/workshop and scan through the presented publications. All of them, no matter who, what, where or when, present their solutions with DL.

Now, DL is great, don’t get me wrong. I’m lapping up all the achievements we’ve been witnessing. What a time to be alive! Moreover, deep learning is responsible for placing CV on the map in the industry, as I’ve discussed in previous posts of mine. CV is now a profitable and useful enterprise, so I really have nothing to complain about. (CV used to just be a predominantly theoretical field found usually only in academia due to the inherent difficulty of processing videos and images.)

Nonetheless, I do have one little qualm with what is happening around us. With the ubiquity of DL, I feel as though creativity in AI has been killed.

To explain what I mean, I’ll discuss first how DL changed the way we do things. I’ll stick to examples in computer vision to make things easier, but you can easily transpose my opinions/examples to other fields of AI.

Traditional Computer Vision

Before the emergence of DL if you had a task such as object classification/detection in images (where you try to write an algorithm to detect what objects are in an image), you would sit down and work out what features define each and every particular object that you wished to detect. What are the salient features that define a chair, a bike, a car, etc.? Bikes have two wheels, a handlebar and pedals. Great! Let’s put that into our code: “Machine, look for clusters of pixels that match this definition of a bike wheel, pedal, etc. If you find enough of these features, we have a bicycle in our photo!”

So, I would take a photo of my bike leaning against my white wall and then feed it to my algorithm. At each iteration of my experiments I would work away by manually fine-tuning my “bike definition” in my code to get my algorithm to detect that particular bike in my photo: “Machine, actually this is a better definition of a pedal. Try this one out now.”

Once I would start to see things working, I’d take a few more pictures of my bike at different angles and repeat the process on these images until I would get my algorithm to work reasonably well on these too.

Then it would be time to ship the algorithm to clients.

Bad idea! It turns out that a seemingly simple task like this becomes impossible to do because a bike in a real-world picture has an infinite number of variations. Bikes come in different shapes, sizes, and colours, and then on top of that you have to add the variations that occur with lighting and weather changes and occlusions from other objects. Not to mention the infinite number of angles in which you can position a bike. All these permutations are too much for us mere humans to handle: “Machine, actually I simply can’t give you all the possible definitions of a bike wheel in terms of clusters of pixels because there are too many parameters for me to deal with manually. Sorry.”

Incidentally, there’s a famous xkcd cartoon that captures the problem nicely:

xkcd-computer-vision
(image taken from here)

Creativity in Traditional Computer Vision

Now, I’ve simplified the above process greatly and abstracted over a lot of things. But the basic gist is there: the real world was hard for AI to work in and to create workable solutions you were forced to be creative. Creativity on the part of engineers and researchers revolved around getting to understand the problem exceptionally well and then turning towards an innovative and visionary mind to find a perfect solution.

Algorithms abounded to assist us. For example, one would commonly employ things like edge detection, corner detection, and colour segmentation to simplify images and help us locate our objects. The image below shows you how an edge detector works to “break down” an image:

edge-detection-example
(image example taken from here)

Colour segmentation works by changing all shades of dominant colours in an image into one shade only, like so:

colour-thresholding-example
(image example taken from here)

The second image is much easier to deal with. If you had to write an algorithm for a robot to find the ball, you would now ask the algorithm to look for patches of pixels of only ONE particular shade of orange. You would no longer need to worry about changes in lighting and shading that would affect the colour of the ball (like in the left image) because everything would be uniform. That is, all pixels that you would deal with would be one single colour. And suddenly the definitions of the objects you were trying to locate were not as complex. The number of parameters needed dropped significantly.
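
Here’s a minimal sketch of that orange-ball idea, assuming OpenCV and a hand-picked HSV colour range (the exact numbers would depend on your camera and lighting):

```python
import cv2
import numpy as np

image = cv2.imread("robot_field.jpg")  # placeholder filename
hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)

# Keep only pixels whose hue/saturation/value fall inside a rough "orange" band.
lower_orange = np.array([5, 100, 100])
upper_orange = np.array([20, 255, 255])
mask = cv2.inRange(hsv, lower_orange, upper_orange)

# Everything in the mask is now a single "colour" (white) and the rest is black,
# so finding the ball reduces to finding the biggest white blob (OpenCV 4 API).
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
if contours:
    ball = max(contours, key=cv2.contourArea)
    x, y, w, h = cv2.boundingRect(ball)
    print(f"Ball found at roughly ({x}, {y}), size {w}x{h}")
```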

Machine learning would also be employed. Algorithms like SVM, k-means clustering, random decision forests, Naive Bayes were there at our disposal. You would have to think about which of these would best suit your use-case and how best to optimise them.
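
To give a flavour of how these classical algorithms slotted in, here’s a small sketch using scikit-learn (my choice, nothing prescribed here): you’d compute hand-crafted feature vectors for image patches and let, say, an SVM decide the class. The random numbers below stand in for real features such as colour histograms or edge statistics:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Pretend feature vectors for image patches: 1 = "bike part", 0 = "background".
features = np.random.rand(200, 32)
labels = np.random.randint(0, 2, size=200)

X_train, X_test, y_train, y_test = train_test_split(features, labels, test_size=0.25)

classifier = SVC(kernel="rbf", C=1.0)
classifier.fit(X_train, y_train)
print("Accuracy on held-out patches:", classifier.score(X_test, y_test))
```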

And then there were also feature detectors – algorithms that attempted to detect salient features for you to help you in the process of creating your own definitions of objects. The SIFT and SURF algorithms deserve Oscars for what they did in this respect back in the day.

Probably, my favourite algorithm of all time is the Viola-Jones Face Detection algorithm. It is ingenious in its simplicity and for the first time allowed face detection (and not only) to be performed in real-time in 2001. It was a big breakthrough in those days. You could use this algorithm to detect where faces were in an image and then focus your analysis on that particular area for facial recognition tasks. Problem simplified!
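
OpenCV still ships the pre-trained Viola-Jones cascades, so a face detector along those lines is only a few lines of code. A rough sketch:

```python
import cv2

# Load the pre-trained Viola-Jones (Haar cascade) frontal-face model that
# ships with OpenCV.
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
face_detector = cv2.CascadeClassifier(cascade_path)

image = cv2.imread("group_photo.jpg")  # placeholder filename
grey = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# scaleFactor and minNeighbors trade off speed against false detections.
faces = face_detector.detectMultiScale(grey, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imwrite("faces_found.jpg", image)
```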

Anyway, all the algorithms were there to assist us in our tasks. When things worked, it was like watching a symphony playing in harmony. This algorithm coupled with this algorithm using this machine learning technique that was then fed through this particular task, etc. It was beautiful. I would go as far as to say that at times it was art.  

But even with the assistance of all these algorithms, so much was still done manually as I described above – and reality was still at the end of the day too much to handle. There were too many parameters to deal with. Machines and humans together struggled to get anything meaningful to work.

The Advent of Deep Learning

When DL arrived (circa 2012), it introduced the concept of end-to-end learning, where (in a nutshell) the machine is told to learn what to look for with respect to each specific class of object. It works out the most descriptive and salient features for each object all on its own. In other words, neural networks are told to discover the underlying patterns in classes of images. What is the definition of a bike? A car? A washing machine? The machine works this all out for you. Wired magazine puts it this way:

If you want to teach a [deep] neural network to recognize a cat, for instance, you don’t tell it to look for whiskers, ears, fur, and eyes. You simply show it thousands and thousands of photos of cats, and eventually it works things out. If it keeps misclassifying foxes as cats, you don’t rewrite the code. You just keep coaching it.

The image below portrays this difference between feature extraction (using traditional CV) and end-to-end learning:

traditional-cv-and-dl

Deep learning works by setting up a neural network that can contain millions or even billions of parameters (the weights on the connections between neurons). These parameters are initially “blank”, let’s say. Then, thousands and thousands of images are sent through the network and slowly, over time, the parameters are adjusted accordingly.

Previously, we would have to adjust these parameters ourselves in one way or another, and not in a neural network – but we could only handle hundreds or thousands of parameters. We didn’t have the means to manage more.

So, deep learning has given us the possibility to deal with much, much more complex tasks. It has truly been a revolution for AI. The xkcd comic above is no longer relevant. That problem has been pretty much solved.
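
To see just how much the effort has shifted, here’s a rough sketch of classifying a photo with a pretrained network via torchvision (my choice of library, and it assumes torchvision 0.13 or newer); no hand-written definitions of birds or bikes anywhere:

```python
import torch
from torchvision import models
from PIL import Image

# A network whose millions of parameters were already "worked out" for us
# by training on ImageNet.
weights = models.ResNet18_Weights.DEFAULT
model = models.resnet18(weights=weights)
model.eval()

preprocess = weights.transforms()      # resize/normalise as the model expects
image = Image.open("photo.jpg")        # placeholder filename

with torch.no_grad():
    scores = model(preprocess(image).unsqueeze(0))
label = weights.meta["categories"][scores.argmax().item()]
print("The network thinks this is a:", label)
```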

The Lack of Creativity in DL

Like I said, now when we have a problem to solve, we throw data at a neural network and then get the machine to work out how to solve the problem – and that’s pretty much it! The long and creative computer vision pipelines of algorithms and tasks are gone. We just use deep learning. There are really only two bottlenecks that we have to deal with: the need for data and time for training. If you have these (and money to pay for the electricity required to power your machines), you can do magic.

(In this article of mine I describe when traditional computer vision techniques still do a better job than deep learning – however, the art is dying out).

Sure, there are still many things that you have control over when opting for a deep neural network solution, e.g. number of layers, and of course hyper-parameters such as learning rate, batch size, and number of epochs. But once you get these more-or-less right, further tuning has diminishing returns.

You also have to choose the neural network that best suits your tasks: convolutional, generative, recurrent, and the like. We more or less know, however, which architecture works best for which task.

Let me put it to you this way: creativity has been eliminated from AI to such an extent that there are now automatic tools available to solve your problems using deep learning. AutoML by Google is my favourite of these. A person with no background in AI or computer vision can use these tools with ease to get very impressive results. They just need to throw enough data at the thing and the tool works the rest out for them automatically.

I dunno, but that feels kind of boring to me.

Maybe I’m wrong. Maybe that’s just me. I’m still proud to be a computer vision expert but it seems that a lot of the fun has been sucked out of it.

However, the results that we get from deep learning are not boring at all! No way. Perhaps I should stop complaining, then.


AI Will Never Create High Art

There’s been more and more talk about AI these days – and rightly so. The advances have been impressive, to say the least. The hype that has followed has also been impressive, but for the wrong reasons: it’s just been way too much. I’ve written about this persistent hype before at length (e.g. Artificial Intelligence is Over-Hyped).

But, alas, it seems as though I have to return to this topic again. It’s just too jarring.

The latest stuff a lot of people are talking about is AI in art. The topic has been on the news a lot (e.g. here) because AI image generators, like DALL-E 2 and Stable Diffusion, appear to be taking over the jobs of illustrators since it is so easy to create images free of charge based on textual input. Cosmopolitan magazine, for example, used a DALL-E 2 generated image as a cover on a special issue of theirs:

cosmopolitan-ai-cover
AI-generated image of a magazine cover

That’s impressive. Somebody, however, has definitely missed out on a paycheck for that particular cover of the magazine.

The problem is further exacerbated when one learns that these new image generators have been trained on databases of works of artists that have received no remuneration for passively having participated in the training process. Contentious, to say the least – legal cases abound, in fact.

But I’m not here to throw my $0.02 into this debate. I, of course, have my opinions, but in this post I would like to talk about AI and high art in particular, because this is a domain that AI has also dared to venture into and it is a domain in which AI has no rightful place – somebody has to say it.

Firstly, let me define what I mean by “high art”. High art consists of objects that are of exceptional and exemplary aesthetic value. High art is music, literature, paintings, architecture, and other creations of human endeavour that have attained the highest level of human achievement in terms of beauty and sophistication. High art is something that is so passionately moving and impressive that it can evoke the strongest of positive emotions in a person. Tears, awe, admiration, reverence – these are the kinds of responses one might expect when one encounters such objects of exquisite beauty.

High art is undoubtedly something more than the popular and/or commercial art that one typically would deal with on a day to day basis – such as the art currently being generated by AI.

High art is capable of touching the deepest depths of our existence. It generally is art that one would, for example, display in museums or galleries because it is worth preserving for future generations.

There is currently a debate going on about whether AI is capable of generating such art.

A few months ago (Sep 2022), Jason M. Allen won first prize at an art competition in the USA. This achievement made headlines all over the world. In a way, I get it. A machine passed an unofficial Turing test in image generation. That’s no small feat, and one that deserves to be in the papers. But with this has come a wave of expected hype: AI can beat humans at creativity, AI can create beauty, AGI is edging closer and closer. The winner of the competition himself stated to the NY Times: “Art is dead, dude. It’s over. A.I. won. Humans lost”.

*Sigh*. I think we need to take a breather here.

Firstly, let’s put that competition into perspective. It was held at the Colorado State Fair and the top prize was $300. I’m sure the world’s greatest artists frequent that fair every year and vie for that lucrative first place. Secondly, Jason touched up the image with Photoshop and other tools to improve on what the machine had initially generated. So, there was a direct human element at play in the creative process.

But that last point is not really pertinent because I’m not going to deny the fact that a machine could generate an image and with no touch-ups win an art prize. It can and it will. In fact, I have nothing against AI winning competitions for popular/consumer art.

What I think we should be deliberating on is how it is possible for people to think that AI will win prestigious art competitions one day. It hasn’t happened yet, but you just know that with the current mood, it is only a matter of time.

But how can people even consider AI creating high/fine culture a possibility? After all, there is a world of difference between that and popular culture. When one truly and honestly encounters fine culture, one understands that such a feat is beyond the accomplishment of a machine. That’s a given.

Why? Because to create true art one needs to understand the intricacies, tragedies, miracles, and depths of human existence. This understanding is then embodied in a piece of work. Machines, on the other hand, don’t understand. As I’ve said time and time again: “Machines operate on the level of knowledge. We operate on the level of knowledge and understanding”. For example, ChatGPT. It’s fantastic. It’s phenomenal at times. But it still spews out nonsense. With all the information it has been trained on, it nonetheless comes up with trash like this that demonstrates zero understanding (not knowledge!) of the world around us:

Now, some will argue that once machines gain sentience understanding will follow. Yes, that’s probably true. For now, however, machines are nowhere near this accomplishment – as I’ve discussed in this post – because throwing more and more data at a neural network is not going to magically cause a giant leap into consciousness.

(For those that know me, you’ll know that I think machine sentience is unachievable anyway – but we won’t go into that today).

So, let’s stick with what we know and have now: machines that don’t understand and don’t look like they will for at least a good while. The problem is that there is an underlying belief surrounding AI hype that there is some form of understanding being exhibited in the stuff that we are creating. Hence, it is all unfavourably contributing to the debate around AI and high art.

Another answer to the conundrum of why people consider a marriage between AI and high art possible lies quite ironically in what Jason M. Allen said after winning his illustrious prize: “Art is dead”.

If you have visited a modern art gallery or museum recently you will know what I am talking about. I just don’t understand today’s “art”. Nobody does, in fact! In his very informative book entitled “The Death of the Artist“, William Deresiewicz outlines how art has become institutionalised. One now needs a degree to “understand” it. All those nonsensical squiggles and blotches and disharmonies and bizarre architectural distortions need to be explained. No more can one break down into tears in front of an exquisite sculpture or painting. Art doesn’t speak to us directly now.

As Renoir himself once said: “If it needs to be explained, it is not art.”

I couldn’t have said it better myself. Today’s art is not art if it doesn’t evoke anything in the populace. You may recall the furore that surrounded the unveiling of an MLK statue last month (Jan 2023). Tucker Carlson said it outright: “It’s not art, it’s a middle finger.” Moreover, it cost US$10 million to make. High art? Give me a break! Just take a look at this masterpiece created by a 23-year-old:

Part of the Pieta created by Michelangelo

Now, that’s art.

The bottom line is that a dead machine can easily produce lifeless works like those that reside in today’s modern museums or galleries. It’s no wonder that people consider AI to be capable of producing art.

Let’s move on to another argument for this post: beauty has been banished from our everyday lives; hence, we are no longer being exposed to true works of it. This means we are not being conditioned to recognise real elegance any more. Ugliness surrounds us, so we are not sensitive enough to the subtleties of the depths of beauty.

Allow me to present some examples that I have collected from Twitter posts that compare public works in the past and now. (For more examples like the following, follow this twitter account)

Original Twitter post
Original Twitter post
Original Twitter post

My favourite example, though, is from my alma mater, Trinity College Dublin, where I completed my PhD. There is a famous Old Library there that is simply exquisite. It was completed in 1732:

And then right next door is the new library called the Berkeley Library, completed in 1967. Believe me, it’s even more dismal inside:

Image taken from here

But it gets better because next door to the Berkeley Library is the Arts Building, completed in 1979:

Image taken from here

Yep, that’s the Arts Building. Oh, the irony! Trinity College Dublin describes the Arts Building as: “A nineteen seventies listed architectural masterpiece.” You couldn’t make this stuff up. Moreover, the wise guy architects thought it would be a great idea to have minimal natural light enter the classrooms, so there are hardly any windows on the other side of the building. It’s a concrete, miserable tomb that is supposed to inspire future “great” artists.

Yuck! Let’s compare that to the front facade of the same college:

Image taken from here

Once again, our lack of sensitivity to the beautiful allows institutions and governments to proliferate cheap, commercial, commodified works on our streets and in our houses and workplaces, which further exacerbates the problem.

I don’t care what the reasoning behind the buildings above is, what the theory behind their supposed beauty is, or whether beauty is purely subjective or not. I don’t care because that stuff is simply (f)ugly.

We have lost the sense of the magnificent. Hence, without understanding what that is, it is easy for people to think that AI can participate in the creation of it.

Parting Words

AI can generate art if by “art” one means the commercial, popular art that AI is currently generating. Of course, generating art is not the same as creating it – after all, the AI models were trained on other people’s works and statistics are being applied to “create” the artificial images that we are seeing online. But I don’t want to get into this argument or any arguments related to it. The whole purpose of my post was to vent my frustration at the idea that AI could find its way into the world’s galleries and museums and hence be classified as high culture.

That’s just not right, for the reasons outlined above.

I’m going to leave you with some parting words from a great lyricist and musician of our time, Nick Cave. Somebody asked ChatGPT to “write a song in the style of Nick Cave” and sent him the results for comment. Nick’s response says it all:

With all the love and respect in the world, this song is bullshit, a grotesque mockery of what it is to be human, and, well, I don’t much like it. [emphasis mine]

Somebody has to say it. We deserve better and are capable of it.



AI Video Generation (Text-To-Video Translation)

There have been a number of moments in my career in AI when I have been taken aback by the progress mankind has made in the field. I recall the first time I saw object detection/recognition being performed at near-human level of accuracy by Convolutional Neural Networks (CNNs). I’m pretty sure it was this picture from Google’s MobileNet (mid 2017) that affected me so much that I needed to catch my breath and immediately afterwards exclaim “No way!” (insert expletive in that phrase, too):

MobileNet-detected-objects

When I first started out in Computer Vision way back in 2004 I was adamant that object recognition at this level of expertise and speed would be simply impossible for a machine to achieve because of the inherent level of complexity involved. I was truly convinced of this. There were just too many parameters for a machine to handle! And yet, there I was being proven wrong. It was an incredible moment of awe, one which I frequently recall to my students when I lecture on AI.

Since then, I’ve learnt to not underestimate the power of science. But I still get caught out from time to time. Well, maybe not caught out (because I really did learn my lesson) but more like taken aback.

The second memorable moment in my career when I pushed my swivel chair away from my desk and once more exclaimed “No way!” (insert expletive there again) was when I saw text-to-image translation (you provide a text prompt and a machine creates images based on it) being performed by DALL-E in January of 2021. For example:

dall-e-example-output

dall-e-output-example

I wrote about DALL-E’s initial capabilities at the end of this post on GPT3. Since then, OpenAI has released DALL-E 2, which is even more awe-inspiring. But that initial moment in January of last year will forever be ingrained in my mind – because a machine creating images from scratch based on text input is something truly remarkable.

This year, we’ve seen text-to-image translation become mainstream. It’s been on the news, John Oliver made a video about it, various open source implementations have been released to the general public (e.g. DeepAI – try it out yourself!), and it has achieved some milestones – for example, Cosmopolitan magazine used a DALL-E 2 generated image as a cover on a special issue of theirs:

cosmopolitan-ai-cover

That does look groovy, you have to admit.
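
As an aside, if you want to try open-source text-to-image generation yourself, the Hugging Face diffusers library (my example, not one of the tools mentioned above) makes it a few lines of Python, assuming you have a decent GPU and noting that the model identifier below is just one of the publicly available checkpoints:

```python
import torch
from diffusers import StableDiffusionPipeline

# Download an open-source text-to-image model and move it to the GPU.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

prompt = "a teddy bear painting a portrait, studio lighting"
image = pipe(prompt).images[0]
image.save("teddy_painter.png")
```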

My third “No way!” moment (with expletive, of course) occurred only a few weeks ago. It happened when I realised that text-to-video translation (you provide a text prompt and a machine creates a video based on it) is likewise on its way to potentially becoming mainstream. Four weeks ago (Oct 2022) Google presented ImagenVideo, and a short time later it also published another solution called Phenaki. A month earlier, Meta’s text-to-video translation application, called Make-A-Video, was announced (Sep 2022), which in turn was preceded by CogVideo from Tsinghua University (May 2022).

All of these solutions are in their infancy. Apart from Phenaki’s, videos generated after providing an initial text input/instruction are only a few seconds in length. No generated videos have audio. Results aren’t perfect, with distortions (aka artefacts) clearly visible. And the videos that we have seen have undoubtedly been cherry-picked (CogVideo, however, has been released as open source to the public, so one can try it out oneself). But hey, the videos are not bad either! You have to start somewhere, right?

Let’s take a look at some examples generated by these four models. Remember, this is a machine creating videos purely from text input – nothing else.

CogVideo from Tsinghua University

Text prompt: “A happy dog” (video source)

cogvideo-dog-eg

Here is an entire series of videos created by the model that is presented on the official github site (you may need to press “play” to see the videos in motion):

As I mentioned earlier, CogVideo is available as open source software, so you can download the model yourself and run it on your machine if you have an A100 GPU. And you can also play around with an online demo here. The one downside of this model is that it only accepts simplified Chinese as text input, so you’ll need to get your Google Translate up and running, too, if you’re not familiar with the language.

Make-A-Video from Meta

Some example videos generated from text input:

make-a-video-example-teddy-painting
Text prompt: “A teddy bear painting a portrait”

An example media generated by meta's application
Text prompt: a young couple walking in heavy rain

An example image generated by meta
Text prompt: A dog wearing a Superhero outfit with red cape flying through the sky

The other amazing features of Make-A-Video are that you can provide a still image and get the application to give it motion; you can provide two still images and the application will “fill in” the motion between them; or you can provide a video and request different variations of it to be produced.

Example – left image is input image, right image shows generated motion for it:

Input diagram to be transformed to a video  

It’s hard not to be impressed by this. However, as I mentioned earlier, these results are obviously cherry-picked. We do not have access to any API or code to produce our own creations.

ImagenVideo from Google

Google’s first solution attempts to build on the quality of Meta’s and Tsinghua University’s releases. Firstly, the resolution of videos has been upscaled to 1024×768 at 24 fps (frames per second). Meta’s videos by default are created at 256×256 resolution. Meta mentions, however, that the maximum resolution can be set to 768×768 at 16 fps. CogVideo has similar limitations on its generated videos.

Here are some examples released by Google from ImagenVideo:

ImagenVideo example
Text prompt: Flying through an intense battle between pirate ships in a stormy ocean

ImagenVideo example
Text prompt: An astronaut riding a horse

ImagenVideo example
Text prompt: A panda eating bamboo on a rock

Google claims that the videos generated surpass those of other state-of-the-art models. Supposedly, ImagenVideo has a better understanding of the 3D world and can also process much more complex text inputs. If you look at the examples presented by Google on their project’s page, it appears as though their claim is not unfounded.

Phenaki by Google

This is a solution that really blew my mind.

While ImagenVideo had its focus on quality, Phenaki, which was developed by a different team of Google researchers, focussed on coherency and length. With Phenaki, a user can present a long list of prompts (rather than just one) that the system then takes and turns into a film of arbitrary length. Similar kinds of glitches and jitteriness are exhibited in these generated clips, but the fact that videos over two minutes in length can be created (although at lower resolution) is just astounding. Truly.

Here are some examples:

Phenaki example
Text prompts: A photorealistic teddy bear is swimming in the ocean at San Francisco. The teddy bear goes under water. The teddy bear keeps swimming under the water with colorful fishes. A panda bear is swimming under water

Phenaki example
Text prompts: Side view of an astronaut walking through a puddle on mars. The astronaut is dancing on mars. The astronaut walks his dog on mars. The astronaut and his dog watch fireworks

Phenaki can also generate videos from single images, but these images can additionally be accompanied by text prompts. The following example uses the input image as its first frame and then builds on that by following the text prompt:

Phenaki example
Accompanying text prompt: A white cat touches the camera with the paw

For more amazing examples like this (including a few 2+ minute videos), I would encourage you to view the project’s page.

Furthermore, word on the street is that the teams behind ImagenVideo and Phenaki are combining strengths to produce something even better. Watch this space!

Conclusion

A few months ago I wrote two posts on this blog discussing why I think AI is starting to slow down (part 2 here) and why there is evidence that we're slowly beginning to hit the ceiling of AI's possibilities (unless new breakthroughs occur). I still stand by those posts because of the sheer amount of money and time required to train any of the large neural networks performing these feats. This is the main reason I was so astonished to see text-to-video models released so quickly, when we had only just got used to their text-to-image counterparts. I thought we would be a long way away from this. But science found a way, didn't it?

So, what's next in store for us? What will cause another "No way!" moment for me? Text-to-music generation and text-to-video with audio would be nice, wouldn't they? I'll try to research these, see how far we are from them, and present my findings in a future post.

To be informed when new content like this is posted, subscribe to the mailing list:


The Need for New Terminology in AI

There is a fundamental difference between humans and machines. Jack Ma, Chinese business magnate, co-founder of Alibaba, 35th richest man in the world, once said (in his half-broken English):

Computers only have chips, men have the heart. It’s the heart where the wisdom comes from.

Forgive me if I also quote myself on this topic:

This is a tough issue to talk about because of the other opinions on the matter. Many people like to think that we operate much like machines. That we are predictable like them, that we are as deterministic as them. Meaning that given enough data to train on, a machine will one day be as smart or intelligent as humans, if not more so.



I like to think, however, that the vast majority of people would side with Jack Ma on this: that there really is something fundamentally different between us and machines. Certainly, my years of experience in teaching confirm this observation. The thousands of people that I've interacted with really do believe we have something like a "heart" that machines do not have and never will. It's this heart that gives us the ability to be truly creative or wise, for example.

Some of you may know that along with my PhD in Artificial Intelligence I also have a Master’s in Philosophy and additionally a Master’s in Theology. If I were to argue my point from the perspective of a theologian, it would be easy to do so: we’re created by a Supreme Being that has endowed us with an eternal soul. Anything that we ourselves create with our own hands will always lack this one decisive element. The soul is the seat of our “heart”. Hence, machines will never be like us. Ever.

But alas, this is not a religious blog. It is a technical one. So, I must argue my case from a technical standpoint – much like I have been doing with my other posts.

It's hard to do so, however. How do I prove that we are fundamentally different from machines and always will be? Everyone's opinion on the matter carries just as much weight on the rational level. It seems as though we're all floating in an ether of opinions on this one, without any hook to grasp onto and build something concrete.

But thankfully that's not entirely the case. As I mentioned earlier, we can turn to our instincts or intuitions and speak about our "hearts". Although turning to intuition or instinct is not technically science, it's a viable recourse, since science hasn't said anything decisive on this topic. Where science falters, sometimes all we have left are our instincts, and there's nothing wrong with utilising them as an anchor in the vast sea of opinions.

But the other thing we can do is turn to professionals who work full-time on robots, machines, and AI in general and seek their opinion on the matter. I’ve spoken at length on this in the past (e.g. here and here) so I’ll only add one more quote to the pot from Zachary Lipton, Assistant Professor of Machine Learning and Operations Research at Carnegie Mellon University:

But these [language models] are just statistical models, the same as those that Google uses to play board games or that your phone uses to make predictions about what word you’re saying in order to transcribe your messages. They are no more sentient than a bowl of noodles, or your shoes.[emphasis mine]

Generally speaking, then, what I wish to get across is that if you work in the field of AI, if you understand what is happening under the hood of AI, there is no way that you can honestly and truthfully say that machines currently are capable of human-level intelligence or any form of sentience. They are “no more sentient than a bowl of noodles” because they “are just statistical models”.
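To make the "just statistical models" point a little more concrete, here is a minimal sketch of the step that sits at the heart of every language model: scores for candidate next words are turned into probabilities, and one word is sampled. (This is my own toy illustration, not the code of any real system – the vocabulary and scores are made up.)

```python
import numpy as np

# A toy sketch of the statistical core of a language model: given scores
# (logits) for candidate next words, convert them into probabilities and
# sample one. Real models compute the scores with billions of parameters,
# but the final step is this kind of statistical sampling. The vocabulary
# and scores below are entirely made up.

rng = np.random.default_rng(0)

vocabulary = ["noodles", "shoes", "sentient", "statistics"]
logits = np.array([2.1, 0.3, -1.0, 1.5])                # made-up scores

probabilities = np.exp(logits) / np.exp(logits).sum()   # softmax
next_word = rng.choice(vocabulary, p=probabilities)

print(dict(zip(vocabulary, probabilities.round(3))))
print("Sampled next word:", next_word)
```

Nothing in that process is anything more than numbers being turned into other numbers.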

Just because a machine might look intelligent does not mean it is.

Hence, if AI is no more sentient than your pair of shoes, if AI is just applied statistics, I’d like to argue the case that perhaps the terminology used in the field is imprecise.

Terms like “intelligence”, “understanding”, “comprehending”, “learning” are loaded and imply something profound in the existence of an entity that is said to be or do those things. Let’s take “understanding”, as an example. Understanding is more than just memorising and storing information. It is grasping the essence of something profoundly. It is storing some form of information, yes, but it is also making an idea your own so that you can manoeuvre around it freely. Nothing “robotlike” (for want of a better word) or deterministic is exhibited in understanding. Understanding undoubtedly involves a deeper process than knowing.

Similar things can be said for “intelligence” and “learning”.

So, the problem is that the aforementioned terms are being misunderstood and misinterpreted when used in AI. Qualifying "intelligence" with "artificial" doesn't do enough to uphold the divide between humans and machines. Likewise, the adjective "machine" in "machine learning" doesn't sufficiently separate our learning from what machines really do when they acquire new information. In that case, machines update or readjust their statistical models – they do not "adapt".

Not strictly speaking, anyway.
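To show what that "readjusting" actually looks like, here is a minimal toy sketch (my own illustration, not any particular system's code) of a single parameter being nudged, update by update, to fit some data a little better. Under the hood, this kind of numerical updating is all that machine "learning" amounts to.

```python
# A toy sketch of machine "learning": readjusting a parameter so that a
# prediction fits the data a little better. Nothing more mysterious is
# happening underneath – just repeated numerical updates like these.

# Made-up data: we want the machine to discover that y is roughly 3 * x.
data = [(1.0, 3.1), (2.0, 5.9), (3.0, 9.2)]

w = 0.0               # the single parameter (a real network has billions)
learning_rate = 0.01

for step in range(200):
    # Gradient of the mean squared error with respect to w
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= learning_rate * grad   # the "readjustment" / "recalibration"

print(f"Readjusted parameter after 200 updates: w = {w:.2f}")  # roughly 3
```

Call it "learning" if you like, but "recalibrating" describes what is happening far more honestly.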

This is where things become a little murky, in truth, because the words being used do contain in them elements of what is really happening. I’m not going to deny that. Machines do “learn” in a way, and they do “adapt” in a way, too.

However, the confusion in the world is real – and as a result AI is over-hyped because the media and people like Elon Musk spew out these words as if they applied equally to machines and us. But they do not, do they? And if they do not, then perhaps different terms should be devised to quash the confusion and hype that we see being exhibited before our eyes.

As scientists we ought to be precise with our terminologies and currently we are not.

What new terms should be devised or what different terms should be used is up for debate. I’ve already suggested that the word “adapted” should be changed to “readjusted” or “recalibrated”. That’s more precise, in my opinion. “Artificial Intelligence” should perhaps be renamed to “Applied Statistics”. We can think of other alternatives, I’m sure.

Can you picture, though, how the hype around AI would diminish if it was suddenly being referred to as Applied Statistics? No more unreal notions of grandeur for this field. The human “heart” could potentially reclaim its spot on the pedestal. And that’s the whole point of this post, I guess.

Parting words

What I'm suggesting here is grand. It's a significant change that would have real repercussions. I definitely do not want people to stop trying to attain human-level intelligence (AGI, as it's sometimes referred to). We've achieved a lot over the last decade with people's aims being directed specifically towards this purpose. But I still think we need to be precise and accurate in our terminology. Human capabilities, and human dignity for that matter, need to be upheld.

I also mentioned that most scientists working in machine learning would honestly say that AI entities are not, strictly speaking, intelligent. That does not mean, however, that they rule out things improving to the point where the aforementioned terms would become applicable and precise in AI. Perhaps in the future machines will be truly intelligent and really will understand? In my opinion this will never occur (that's for another post), but for the time being it is safe to say that we are far away from attaining that level of development. Kai-Fu Lee, for example, who was once head of Google China, an exec at Apple, and Assistant Professor at Carnegie Mellon University, gives a date of around 2045 for machines to start displaying some form of real intelligence (I wrote a book review about his take on AI in this post). And that is a prediction which, as he admits, requires great breakthroughs to occur in AI in the meantime – breakthroughs that may never transpire, as is their nature. We must live in the present, then, and at present, in my opinion, real harm is being done by the abounding misunderstandings, which is precisely what calls for some form of terminology reform.

The other dilemma comes up with respect to animals. We can certainly call some animals “intelligent” and mention the fact that they “learn”. But, once again, it’s a different form of intelligence, learning, etc. It’s still not as profound as what humans do. However, it’s much more accurate to use these terms on animals than on machines. Animals have life. AI is as dead as your bowl of noodles or your pair of shoes.

Lastly, I deliberately steered away from trying to define terms like “understand”, “learn”, etc. I think it would be best for us to stick with our intuitions on this matter rather than getting bogged down in heavy semantics. At least for the time being. I think it’s more important for now to have the bigger picture in view.

To be informed when new content like this is posted, subscribe to the mailing list:


Artificial Intelligence is Over-Hyped

I’ve written before that AI is Fundamentally Unintelligent. I’ve also written that AI is Slowing Down (part 2 here). One would perhaps consider me a pessimist or cynic of AI if I were to write another post criticising AI from another perspective. But please don’t think ill of me as I embark on exactly such an endeavour. I love AI and always will, which is why most of my posts show AI and its achievements in a positive light (my favourite post is probably on the exploits of GPT-3).

But AI is not only fundamentally unintelligent but at the moment it is fundamentally over-hyped. And somebody has to say it. So, here goes.



When new disruptive technological innovations hit the mainstream, hype inevitably follows. This seems to be human nature. We saw it with the dot-com bubble, we saw it with Bitcoin and co. in 2017, we saw it with Virtual Reality around 2015 (purportedly, VR is on the rise again, though I'm yet to be convinced of its touted potential), and likewise with 3D glasses and 3D films at around the same time. The mirages come and then dissipate.

The common trend of hype and delusion that tends to follow exceptional growth and success of a technological innovation is such a common occurrence that people have come up with ways to describe it. Indeed, Gartner, the American research, advisory and information technology firm, developed a graphical representation of this phenomenon. They call it the “Gartner Hype Cycle”, and it is portrayed in the image below:

The Gartner Hype Cycle

It is my opinion that we have just passed the initial crest and are slowly learning that our idea of AI has not lived up to expectations. A vast number of projects that were initially deemed sound undertakings are failing today. Some are failing so badly that the average person, on hearing of them, can only turn to common sense and wonder why the seemingly bright minds of the world abandoned theirs.

Here are some sobering statistics:

Those are quite staggering numbers. Now, the reasons behind the failures of these projects are numerous: bad data, poor access to data, poor data science practices, etc. But I wish to argue my case that a significant part of the problem is that AI (this would also encompass data science) is over-hyped, i.e. that we believe too much in data and especially too much in AI’s capabilities. There seems to be a widely held belief that we can throw AI at anything and it will find an appropriate solution.

Let’s take a look at some of the projects that have failed in the last few years.

In 2020, two professors and a graduate student from Harrisburg University in Pennsylvania announced that they were publishing a paper entitled "A Deep Neural Network Model to Predict Criminality Using Image Processing". The paper purported the following:

With 80 percent accuracy and with no racial bias, the software can predict if someone is a criminal based solely on a picture of their face. The software is intended to help law enforcement prevent crime.

What's more, this paper was accepted for publication by the prestigious publisher Springer Nature. Thankfully, a backlash ensued in the academic community condemning the paper, and Springer Nature confirmed on Twitter that it would not be published after all.

Funnily enough, a paper on pretty much the identical topic was also due to be published in the Journal of Big Data that same year entitled: “Criminal tendency detection from facial images and the gender bias effect“. This paper was also retracted.

It is mind-boggling to think that people, let alone experienced academics, could possibly believe that faces can disclose potential criminal tendencies in a person. Some people definitely have a mug that, if spotted in a dark alley in the middle of the night, would give anybody a heart attack, but this is still not an indicator that the person is a criminal.

Has common sense been thrown out the door? Are AI and data science perceived as great omniscient entities that should be adored and definitely not ever questioned?

Let's see what other gaffes have occurred in the recent past.

In 2020, as the pandemic was in full swing, university entrance exams in the UK (A-levels) were cancelled. So, the British government decided to have an algorithm grade students automatically instead. Like that wasn't going to backfire!? It is a perfect example, however, of what happens when too much trust is put in artificial intelligence, especially by the cash-strapped public sector. The whole thing turned into a scandal because, of course, the algorithm didn't do its intended job: around 40% of students had their grades lowered, with the algorithm favouring those from private schools and wealthy areas. There was obvious demographic bias in the data used to train the model.
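As a toy illustration of how this happens (emphatically not the actual grading algorithm – just a made-up simulation), here is a simple model fitted to historically biased outcomes. It ends up predicting different grades for two students of identical ability, purely because of the school type baked into the training data:

```python
import numpy as np

# A made-up simulation (NOT the actual A-level algorithm) showing how a model
# fitted to historically biased outcomes reproduces that bias. Two students
# with identical ability receive different predicted grades purely because of
# the school type recorded in the training data.

rng = np.random.default_rng(42)

n = 1000
ability = rng.normal(70, 10, n)            # true ability, same distribution for everyone
private = rng.integers(0, 2, n)            # 1 = private school, 0 = state school
historic_grade = ability + 8 * private + rng.normal(0, 3, n)   # biased historical outcomes

# Fit a simple linear model: (ability, school type) -> grade
X = np.column_stack([ability, private, np.ones(n)])
coef, *_ = np.linalg.lstsq(X, historic_grade, rcond=None)

same_ability = 75.0
state_pred = coef @ np.array([same_ability, 0.0, 1.0])
private_pred = coef @ np.array([same_ability, 1.0, 1.0])

print(f"Predicted grade, state school student:   {state_pred:.1f}")
print(f"Predicted grade, private school student: {private_pred:.1f}")
# Same ability, different predictions: the model has simply "learnt" the bias.
```

The model isn't malicious; it is just faithfully recalibrating itself to biased data.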

But the fact that an algorithm was used to directly make important, life-changing decisions impacting the public is a sign that too much trust is being placed in AI. There are some things that AI just cannot do – looking past raw data is one such thing (more on this in a later post).

This trend of over-trusting AI in the UK was revealed in 2020 to be much deeper than once thought, however. One study by the Guardian found that one in three councils were (in secret) using algorithms to help make decisions about benefit claims and other welfare issues. The Guardian also found that about 20 councils have stopped using an algorithm to flag claims as “high risk” for potential welfare fraud. Furthermore, Hackney council in East London abandoned using AI to help predict which children were at risk of neglect and abuse. And then, the Home Office was embroiled in a scandal of its own when it was revealed that its algorithm to determine visa eligibility allegedly had racism entrenched in it. And the list goes on.

Dr Joanna Redden from the Cardiff Data Justice Lab who worked on researching why so many algorithms were being cancelled said:

[A]lgorithmic and predictive decision systems are leading to a wide range of harms globally, and also that a number of government bodies across different countries are pausing or cancelling their use of these kinds of systems. The reasons for cancelling range from problems in the way the systems work to concerns about negative effects and bias.

Indeed, perhaps it’s time to stop placing so much trust in data and algorithms? Enough is definitely not being said about the limitations of AI.

The media and charismatic public figures are not helping the cause either. They’re partly to blame for these scandals and failures that are causing people grief and costing the taxpayers millions because they keep this hype alive and thriving.

Indeed, level-headedness never makes the headlines – only sensationalism does. So, when somebody like the billionaire tech-titan Elon Musk opens his big mouth, the media laps it up. Here are some of the things Elon has said in the past about AI.

In 2017:

I have exposure to the most cutting edge AI, and I think people should be really concerned by it… AI is a fundamental risk to the existence of human civilization.

2018:

I think that [AI] is the single biggest existential crisis that we face and the most pressing one.

2020:

…we’re headed toward a situation where A.I. is vastly smarter than humans and I think that time frame is less than five years from now.

Please, enough already! Anybody with "exposure to the most cutting edge AI" would know that, as AI currently stands, we are nowhere near developing anything that will be "vastly smarter" than us by 2025. As I've said before (here and here), the engine of AI is Deep Learning, and all evidence points to the fact that this engine is in overdrive – i.e. we're slowly reaching its top speed. We soon won't be able to squeeze anything more out of it.

But when Elon Musk says stuff like this, it captures people’s imaginations and it makes the papers (e.g. CNBC and The New York Times). He’s lying, though. Blatantly lying. Why? Because Elon Musk has a vested interest in over-hyping AI. His companies thrive on the hype, especially Tesla.

Here’s proof that he’s a liar. Elon has predicted for nine years in a row, starting in 2014, that autonomous cars are at most a year away from mass production. I’ll say that once again: for nine years in a row, Elon has publicly stated that autonomous cars are only just around the corner. For example:

2016:

My car will drive from LA to New York fully autonomously in 2017

It didn’t happen. 2019:

I think we will be feature-complete full self-driving this year… I would say that I am certain of that. That is not a question mark.

It didn’t happen. 2020:

I remain confident that we will have the basic functionality for level five autonomy complete this year… I think there are no fundamental challenges remaining for level five autonomy.

It didn’t happen. 2022:

And my personal guess is that we’ll achieve Full Self-Driving this year, yes.

It’s not going to happen this year, either, for sure.

How does he get away with it? Maybe because the guy oozes charisma? It's obvious, though, that he makes money by talking in this way. Those of us working directly in the field of AI, however, have had enough of his big mouth. Here, for example, is Jerome Pesenti, head of AI at Facebook, venting his frustration at Elon on Twitter:

Jerome will never make the papers by talking down AI, though, will he?

There was a beautiful example of the media going crazy over AI only recently, in fact. A month ago, Google's own new language model (think: chatbot) called LaMDA, which is much like GPT-3, made the headlines. It can sometimes hold very realistic conversations. But ultimately it is still just a machine – as dumb as a can of spaghetti. The chatbot follows simple processes behind the scenes, as Business Insider reports.

However, there was one engineer at Google, Blake Lemoine, who wanted to make a name for himself and who decided to share some snippets of his conversations with the program to make a claim that the chatbot has become sentient. (Sigh).

Here are some imagination-grabbing headlines that ensued:

Blake Lemoine is loving the publicity. He now claims that the AI chatbot has hired itself a lawyer to defend its rights and that they are also now friends. Cue the headlines again (I’ll spare you the list of eye-rolling, tabloid-like articles).

Google has since suspended the engineer for causing this circus and released the following statement:

Our team — including ethicists and technologists — has reviewed Blake’s concerns… and have informed him that the evidence does not support his claims. He was told that there was no evidence that LaMDA was sentient (and lots of evidence against it)[emphasis mine]

I understand how these "chatbots" work, so I don't need to see any evidence against Blake's claims. LaMDA just SEEMS sentient SOMETIMES. And that's the problem. If you only share cherry-picked snippets of conversations with an AI entity, if you only show the parts that are obviously going to make headlines, then of course there will be an explosion in the media and people will believe that we have created a Terminator robot. If, however, you look at the whole picture, there is no way you can come away convinced that there is sentience in this program (I've written about this idea for the GPT-3 language model here).

Conclusion

This ends my discussion of why AI is over-hyped. So many projects are failing because of it. We as taxpayers are paying for it. People are getting hurt and even dying because of it (more on this later). The media needs to stop stoking the fire because they're not helping. People like Elon Musk need to keep their selfish mouths shut. And more level-headed discussions need to take place in the public sphere. I've written about such discussions before in my review of "AI Superpowers" by Kai-Fu Lee. He has no vested interest in exaggerating AI, and hence it is his book that should be making the papers – not some guy called Blake Lemoine (who also happens to be a "pagan/Christian mystic priest", whatever that means).

In my next post I will extend this topic and discuss it in the context of autonomous cars.

To be informed when new content like this is posted, subscribe to the mailing list:

AI Superpowers book cover

AI Superpowers by Kai-Fu Lee – Review

Summary of review: This book is the best analysis of the state of Artificial Intelligence currently in print. It is a cool and level-headed presentation and discussion on a broad range of topics. An easy 5-stars. 

“AI Superpowers: China, Silicon Valley, and the New World Order” is a book about the current state of Artificial Intelligence. Although published in late 2018 – light years ago for computer science – it is still very much relevant. This is important because it is the best book on AI that I have read to date. I’d hate for it to become obsolete because it is simply the most level-headed and accurate analysis of the topic currently in print.

Professor Kai-Fu Lee knows what he's talking about. He's been at the forefront of research in AI for decades. From Assistant Professor at Carnegie Mellon University (where I currently teach, at the Australian campus), to Principal Speech Scientist at Apple, to founding director of Microsoft Research Asia, and then President of Google China – you really cannot top Kai-Fu's resume in the field of AI. He is definitely a top authority, so we cannot but take heed of his words.

However, what made me take notice of his analyses was that he was speaking from the perspective of an “outsider”. I’ll explain what I mean.


So often, when it comes to AI, we see people being consumed by hype and/or by greed. The field of AI has moved at a lightning pace in the past decade. The media has stirred up a frenzy, imaginations are running wild, investors are pumping billions into projects, and as a result an atmosphere of excitement has descended upon researchers and the industry that makes impartial judgement and analysis extremely difficult. Moreover, greedy and charismatic people like Elon Musk are blatantly lying about the capabilities of AI and hence adding fuel to the fire of elation.

Kai-Fu Lee was himself a full-fledged player and participant in this craze until he was diagnosed with Stage IV lymphoma and given only a few months to live. He subsequently reassessed his life and his career, and decided to step away from his maniacal work schedule (to use his own words, pretty much). Time spent in a Buddhist monastery gave him further perspective on life, and a clarity and composure of thought that shines through his book. He writes, then, in some respects as an "outsider" – but with forceful authority.

This is what I love about his work. Too often I cringe at people talking about AI soon taking over the world, AI being smarter than humans, etc. – opinions based on fantasy. Kai-Fu Lee straight out says that, as things stand, we are nowhere near machines exhibiting that level of intelligence. Not by 2025 (as Elon Musk has said in the past), not even by 2040 (as a lot of others are touting) will we achieve it. His discussion of why this is the case is based on pure, cold facts. Nothing else. (In fact, his reasoning echoes what I've said before on my blog, e.g. in this post: "Artificial Intelligence is Slowing Down".)

All analyses in this book are level-headed in this way, and it’s hard to argue with them as a result.

Some points of discussion in “AI Superpowers” that I, also a veteran of the field of AI, particularly found interesting are as follows:

  • Data, the fuel of Deep Learning (as I discuss in this post), is going to be a principal factor in determining who will be the world leader in AI. The more data one has, the more powerful AI can be. In this respect, China – with its lax laws on data privacy, larger population, cut-throat tactics in procuring AI research, and heavy government assistance and encouragement – has a good chance of surpassing the USA as the superpower of AI. For example, China makes 10 times more food deliveries and 4 times more ride-sharing calls than the US. That equates to a lot more data that companies can process to fuel the algorithms that improve their services.
  • Despite AI not currently being capable of achieving human-level intelligence, Kai-Fu predicts, along with other organisations such as Gartner, that around 30-40% of professions will be significantly affected by AI. This means that huge upheavals and even revolutions in the workforce are due to take place. This, however, is my one major disagreement with Lee’s opinions. Personally, I believe the influence of AI will be a lot more gradual than Prof. Lee surmises and hence the time given to adjust to the upcoming changes will be enough to avoid potentially ruinous effects.
  • No US company in China has made significant in-roads into Chinese society. Uber, Google, eBay, Amazon – all these internet juggernauts have utterly failed in China. The very insightful analysis of this phenomenon could only have been conducted so thoroughly by somebody who has lived and worked in China at the highest level.
  • There is a large section in the book discussing the difference between humans and machines. This was another highlight for me. So many times, in the age of online learning (as I discuss in this post), remote working, social media, and especially automation, we neglect to factor in the importance of human contact and human presence. Once again, a level-headed analysis is presented that ultimately concludes that machines (chat-bots, robots, etc.) simply cannot entirely replace humans and human presence. There is something fundamentally different between us, no matter how far technology may progress. I’ve mentioned this adage of mine before: “Machines operate on the level of knowledge. We operate on the level of knowledge and understanding.” It’s nice to see an AI guru replicating this thought.

Conclusion

To conclude, then, "AI Superpowers: China, Silicon Valley, and the New World Order" is a fantastic dive into the current state of affairs surrounding AI in the world. Since China and the US are the world leaders in this field, a lot of time is devoted to these two countries: mostly to where they currently stand and where they're headed. Kai-Fu Lee is a world authority on everything he writes about. And since he has no vested interest in promoting his opinions, his words carry a lot more weight than most. As I've said above, this is, to me, the best book currently in print on the topic of AI. And because Prof. Lee writes clearly and accessibly, even those unacquainted with technical terminology will be able to follow everything presented in this work.

Rating: An easy 5 stars. 

To be informed when new content like this is posted, subscribe to the mailing list: