scale-cv-dl

Deep Learning Eliminated Creativity in AI

Deep learning (DL) revolutionised computer vision (CV) and artificial intelligence in general. It was a huge breakthrough (circa 2012) that allowed AI to blast into the headlines and into our lives like never before. ChatGPT, DALL-E 2, autonomous cars, etc. – deep learning is the engine driving these stories. DL is so good that it has reached the point where nearly every problem involving AI is now most probably being solved with it. Just take a look at any academic conference or workshop and scan through the presented publications. All of them, no matter who, what, where or when, present their solutions with DL.

Now, DL is great, don’t get me wrong. I’m lapping up all the achievements we’ve been witnessing. What a time to be alive! Moreover, deep learning is responsible for placing CV on the map in the industry, as I’ve discussed in previous posts of mine. CV is now a profitable and useful enterprise, so I really have nothing to complain about. (CV used to just be a predominantly theoretical field found usually only in academia due to the inherent difficulty of processing videos and images.)

Nonetheless, I do have one little qualm with what is happening around us. With the ubiquity of DL, I feel as though creativity in AI has been killed.

To explain what I mean, I’ll discuss first how DL changed the way we do things. I’ll stick to examples in computer vision to make things easier, but you can easily transpose my opinions/examples to other fields of AI.

Traditional Computer Vision

Before the emergence of DL, if you had a task such as object classification/detection in images (where you try to write an algorithm to detect what objects are in an image), you would sit down and work out what features define each particular object you wished to detect. What are the salient features that define a chair, a bike, a car, etc.? Bikes have two wheels, a handlebar and pedals. Great! Let’s put that into our code: “Machine, look for clusters of pixels that match this definition of a bike wheel, pedal, etc. If you find enough of these features, we have a bicycle in our photo!”

So, I would take a photo of my bike leaning against my white wall and then feed it to my algorithm. At each iteration of my experiments I would work away, manually fine-tuning my “bike definition” in my code to get my algorithm to detect that particular bike in my photo: “Machine, actually this is a better definition of a pedal. Try this one out now.”

Once I started to see things working, I’d take a few more pictures of my bike at different angles and repeat the process on these images until the algorithm worked reasonably well on them too.

Then it would be time to ship the algorithm to clients.

Bad idea! It turns out that a simple task like this becomes impossible because a bike in a real-world picture has an infinite number of variations. Bikes come in different shapes, sizes and colours, and on top of that you have to add the variations that come with lighting, weather changes and occlusions from other objects. Not to mention the infinite number of angles at which a bike can be positioned. All these permutations are too much for us mere humans to handle: “Machine, actually I simply can’t give you all the possible definitions of a bike wheel in terms of clusters of pixels because there are too many parameters for me to deal with manually. Sorry.”

Incidentally, there’s a famous xkcd cartoon that captures the problem nicely:

xkcd-computer-vision
(image taken from here)

Creativity in Traditional Computer Vision

Now, I’ve simplified the above process greatly and abstracted over a lot of things. But the basic gist is there: the real world was hard for AI to work in and to create workable solutions you were forced to be creative. Creativity on the part of engineers and researchers revolved around getting to understand the problem exceptionally well and then turning towards an innovative and visionary mind to find a perfect solution.

Algorithms abounded to assist us. For example, one would commonly employ techniques like edge detection, corner detection, and colour segmentation to simplify images and make locating our objects easier. The image below shows how an edge detector “breaks down” an image:

edge-detection-example
(image example taken from here)

Colour segmentation works by changing all shades of dominant colours in an image into one shade only, like so:

colour-thresholding-example
(image example taken from here)

The second image is much easier to deal with. If you had to write an algorithm for a robot to find the ball, you would now ask the algorithm to look for patches of pixels of only ONE particular shade of orange. You would no longer need to worry about the changes in lighting and shading that affect the colour of the ball (as in the left image) because everything would be uniform. That is, all the pixels you would deal with would be a single colour. Suddenly, the definitions of the objects you were trying to locate became far simpler. The number of parameters needed dropped significantly.
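To make this concrete, here is a minimal sketch of the two techniques just described, using OpenCV in Python. The file name and the HSV range I use for “orange” are placeholders that you would have to tune by hand for your own images – which is exactly the kind of manual fiddling I’ve been describing:

```python
import cv2
import numpy as np

# Load an image (the path is a placeholder - substitute your own photo)
image = cv2.imread("ball_on_field.jpg")

# 1. Edge detection: convert to greyscale and run the Canny detector.
#    The two thresholds control how aggressively edges are kept.
grey = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(grey, threshold1=100, threshold2=200)

# 2. Colour segmentation: work in HSV space and keep only "orange" pixels.
#    The HSV range below is a rough guess and would need manual tuning.
hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
lower_orange = np.array([5, 100, 100])
upper_orange = np.array([25, 255, 255])
mask = cv2.inRange(hsv, lower_orange, upper_orange)

# The mask is now a binary image: white wherever the ball's colour was found.
# A simple "ball detector" just looks for a large enough blob in the mask.
num_orange_pixels = cv2.countNonZero(mask)
print(f"Edge pixels: {cv2.countNonZero(edges)}, orange pixels: {num_orange_pixels}")
```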

Machine learning would also be employed. Algorithms like SVMs, k-means clustering, random decision forests and Naive Bayes were at our disposal. You would have to think about which of these would best suit your use case and how best to optimise them.
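As a rough illustration of how one of those classical learners slotted into a pipeline, here is a sketch that trains an SVM on a hand-crafted colour-histogram feature using scikit-learn. The colour_histogram helper and the randomly generated placeholder data are purely illustrative; in a real project you would extract the features from labelled photos:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def colour_histogram(image, bins=8):
    """A hand-crafted feature: a flattened, normalised 3D colour histogram."""
    hist, _ = np.histogramdd(image.reshape(-1, 3), bins=(bins, bins, bins),
                             range=((0, 256), (0, 256), (0, 256)))
    return hist.flatten() / hist.sum()

# Placeholder data: in practice you would load labelled images here.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(200, 32, 32, 3))   # 200 fake 32x32 RGB images
labels = rng.integers(0, 2, size=200)                  # 0 = "not bike", 1 = "bike"

features = np.array([colour_histogram(img) for img in images])
X_train, X_test, y_train, y_test = train_test_split(features, labels, random_state=0)

# The engineer's creativity went into the feature; the SVM just draws the boundary.
clf = SVC(kernel="rbf", C=1.0)
clf.fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))
```

The point to notice is that all the creative work sits in the feature function; the classifier merely draws a boundary through whatever you hand it.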

And then there were also feature detectors – algorithms that attempted to detect salient features for you, to help you in the process of creating your own definitions of objects. The SIFT and SURF algorithms deserve Oscars for what they did in this respect back in the day.
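If you’re curious what using one of these feature detectors looks like, here is a rough sketch with OpenCV’s SIFT implementation (included in the main package in recent versions; older builds needed opencv-contrib). The image path is a placeholder:

```python
import cv2

# Placeholder path - any photo will do
image = cv2.imread("bike.jpg", cv2.IMREAD_GRAYSCALE)

# Create the SIFT detector and find keypoints plus their descriptors.
# Each keypoint marks a salient, repeatable feature in the image, and each
# descriptor is a 128-dimensional vector describing its local appearance.
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(image, None)

print(f"Found {len(keypoints)} keypoints; descriptor shape: {descriptors.shape}")

# Drawing them gives a quick visual sense of what the algorithm deems "salient".
output = cv2.drawKeypoints(image, keypoints, None,
                           flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)
cv2.imwrite("bike_sift_keypoints.jpg", output)
```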

Probably my favourite algorithm of all time is the Viola-Jones face detection algorithm. It is ingenious in its simplicity and, back in 2001, it allowed face detection (among other things) to be performed in real time for the first time. It was a big breakthrough in those days. You could use this algorithm to detect where faces were in an image and then focus your analysis on that particular area for facial recognition tasks. Problem simplified!
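OpenCV still ships the trained Haar cascades behind Viola-Jones, so the whole face detection step can be sketched in a handful of lines. A minimal sketch, with a placeholder image path; scaleFactor and minNeighbors are the usual knobs you would tune:

```python
import cv2

# Load the pre-trained frontal-face Haar cascade that ships with OpenCV.
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
face_detector = cv2.CascadeClassifier(cascade_path)

image = cv2.imread("group_photo.jpg")          # placeholder path
grey = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# scaleFactor and minNeighbors control the sensitivity of the detector.
faces = face_detector.detectMultiScale(grey, scaleFactor=1.1, minNeighbors=5)

# Each detection is a bounding box: focus further analysis (e.g. recognition) there.
for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imwrite("faces_detected.jpg", image)
print(f"Detected {len(faces)} face(s)")
```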

Anyway, all these algorithms were there to assist us in our tasks. When things worked, it was like watching a symphony playing in harmony: this algorithm coupled with that one, using this machine learning technique, feeding into that particular task, and so on. It was beautiful. I would go as far as to say that at times it was art.

But even with the assistance of all these algorithms, so much was still done manually as I described above – and reality was still at the end of the day too much to handle. There were too many parameters to deal with. Machines and humans together struggled to get anything meaningful to work.

The Advent of Deep Learning

When DL arrived (circa 2012), it brought with it the concept of end-to-end learning, where (in a nutshell) the machine is told to learn what to look for with respect to each specific class of object. It works out the most descriptive and salient features for each object all on its own. In other words, neural networks are told to discover the underlying patterns in classes of images. What is the definition of a bike? A car? A washing machine? The machine works this all out for you. Wired magazine puts it this way:

If you want to teach a [deep] neural network to recognize a cat, for instance, you don’t tell it to look for whiskers, ears, fur, and eyes. You simply show it thousands and thousands of photos of cats, and eventually it works things out. If it keeps misclassifying foxes as cats, you don’t rewrite the code. You just keep coaching it.

The image below portrays this difference between feature extraction (using traditional CV) and end-to-end learning:

traditional-cv-and-dl

Deep learning works by setting up a neural network that can contain millions or even billions of parameters (the weights connecting its neurons). These parameters start out essentially “blank” (in practice, randomly initialised). Then, thousands and thousands of images are sent through the network and, slowly over time, the parameters are adjusted accordingly.
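To give a feel for what “sending thousands of images through the network” looks like in code, here is a minimal end-to-end sketch in Keras, trained on the small CIFAR-10 dataset. Notice that there are no hand-crafted definitions of wheels or pedals anywhere; the network’s parameters adjust themselves from raw pixels:

```python
import tensorflow as tf
from tensorflow.keras import layers

# CIFAR-10: 60,000 small colour images across 10 object classes (car, dog, ship, ...)
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# A small convolutional network. Every weight below starts out random ("blank")
# and is adjusted automatically as images flow through during training.
model = tf.keras.Sequential([
    layers.Conv2D(32, 3, activation="relu", input_shape=(32, 32, 3)),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),
])

# No definitions of wheels, pedals or handlebars anywhere - just data and a loss.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, batch_size=64,
          validation_data=(x_test, y_test))
```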

Previously, we would have to adjust these parameters ourselves in one way or another, and not in a neural network – but we could only handle hundreds or thousands of parameters. We didn’t have the means to manage more.

So, deep learning has given us the possibility to deal with much, much more complex tasks. It has truly been a revolution for AI. The xkcd comic above is no longer relevant. That problem has been pretty much solved.

The Lack of Creativity in DL

Like I said, now when we have a problem to solve, we throw data at a neural network and then get the machine to work out how to solve the problem – and that’s pretty much it! The long and creative computer vision pipelines of algorithms and tasks are gone. We just use deep learning. There are really only two bottlenecks that we have to deal with: the need for data and time for training. If you have these (and money to pay for the electricity required to power your machines), you can do magic.

(In this article of mine I describe when traditional computer vision techniques still do a better job than deep learning – however, the art is dying out).

Sure, there are still many things that you have control over when opting for a deep neural network solution, e.g. number of layers, and of course hyper-parameters such as learning rate, batch size, and number of epochs. But once you get these more-or-less right, further tuning has diminishing returns.
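In practice, that residual tuning often amounts to nothing more inventive than a loop over a handful of hyper-parameter combinations. Here is a rough sketch of such a grid search; the tiny model and the values tried are arbitrary choices for illustration:

```python
import itertools
import tensorflow as tf
from tensorflow.keras import layers

(x_train, y_train), _ = tf.keras.datasets.cifar10.load_data()
x_train = x_train / 255.0

def build_model(learning_rate):
    """Build and compile a small throwaway network for a given learning rate."""
    model = tf.keras.Sequential([
        layers.Flatten(input_shape=(32, 32, 3)),
        layers.Dense(64, activation="relu"),
        layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Brute-force grid search over a few hyper-parameter combinations:
# hardly an exercise in creativity.
best = None
for lr, bs in itertools.product([1e-2, 1e-3, 1e-4], [32, 64]):
    history = build_model(lr).fit(x_train, y_train, epochs=3, batch_size=bs,
                                  validation_split=0.1, verbose=0)
    val_acc = history.history["val_accuracy"][-1]
    if best is None or val_acc > best[0]:
        best = (val_acc, lr, bs)

print(f"Best validation accuracy {best[0]:.3f} with lr={best[1]}, batch_size={best[2]}")
```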

You also have to choose the neural network architecture that best suits your task: convolutional, generative, recurrent, and the like. We more or less know, however, which architecture works best for which task.

Let me put it to you this way: creativity has been eliminated from AI to such an extent that there are now automatic tools available to solve your problems using deep learning. AutoML by Google is my favourite of these. A person with no background in AI or computer vision can use these tools with ease and get very impressive results. They just need to throw enough data at the thing and the tool works the rest out for them automatically.

I dunno, but that feels kind of boring to me.

Maybe I’m wrong. Maybe that’s just me. I’m still proud to be a computer vision expert but it seems that a lot of the fun has been sucked out of it.

However, the results that we get from deep learning are not boring at all! No way. Perhaps I should stop complaining, then.


AI Will Never Create High Art

There’s been more and more talk about AI these days – and rightly so. The advances have been impressive, to say the least. The hype that has followed has also been impressive, but for the wrong reasons: it’s just been way too much. I’ve written about this persistent hype before at length (e.g. Artificial Intelligence is Over-Hyped).

But, alas, it seems as though I have to return to this topic again. It’s just too jarring.

The latest thing a lot of people are talking about is AI in art. The topic has been in the news a lot (e.g. here) because AI image generators, like DALL-E 2 and Stable Diffusion, appear to be taking over the jobs of illustrators, since it is so easy to create images free of charge based on textual input. Cosmopolitan magazine, for example, used a DALL-E 2 generated image as the cover of a special issue of theirs:

cosmopolitan-ai-cover
AI-generated image of a magazine cover

That’s impressive. Somebody, however, has definitely missed out on a paycheck for that particular cover of the magazine.

The problem is further exacerbated when one learns that these new image generators have been trained on databases of works by artists who have received no remuneration for unwittingly participating in the training process. Contentious, to say the least – legal cases abound, in fact.

But I’m not here to throw my $0.02 into this debate. I, of course, have my opinions, but in this post I would like to talk about AI and high art in particular, because this is a domain that AI has also dared to venture into and one in which AI has no rightful place – somebody has to say it.

Firstly, let me define what I mean by “high art”. High art comprises objects of exceptional and exemplary aesthetic value. High art is music, literature, painting, architecture, and other creations of human endeavour that have attained the highest level of human achievement in terms of beauty and sophistication. High art is something so passionately moving and impressive that it can evoke the strongest of positive emotions in a person. Tears, awe, admiration, reverence: these are the responses one can expect when encountering such objects of exquisite beauty.

High art is undoubtedly something more than the popular and/or commercial art that one typically deals with on a day-to-day basis – such as the art currently being generated by AI.

High art is capable of touching the deepest depths of our existence. It generally is art that one would, for example, display in museums or galleries because it is worth preserving for future generations.

There is currently a debate going on about whether AI is capable of generating such art.

A few months ago (Sep 2022), Jason M. Allen won first prize at an art competition in the USA. This achievement made headlines all over the world. In a way, I get it. A machine passed an unofficial Turing test in image generation. That’s no small feat, and it deserves to be in the papers. But with it has come a wave of expected hype: AI can beat humans at creativity, AI created beauty, AGI is edging closer and closer. The winner of the competition himself stated to the NY Times: “Art is dead, dude. It’s over. A.I. won. Humans lost”.

*Sigh*. I think we need to take a breather here.

Firstly, let’s put that competition into perspective. It was held at the Colorado State Fair and the top prize was $300. I’m sure the world’s greatest artists frequent that fair every year and vie for that lucrative first place. Secondly, Jason touched up the image with Photoshop and other tools to improve on what the machine had initially generated. So, there was a direct human element at play in the creative process.

But that last point is not really pertinent because I’m not going to deny the fact that a machine could generate an image and with no touch-ups win an art prize. It can and it will. In fact, I have nothing against AI winning competitions for popular/consumer art.

What I think we should be deliberating on is how it is possible for people to think that AI will win prestigious art competitions one day. It hasn’t happened yet, but you just know that with the current mood, it is only a matter of time.

But how can people even consider AI creating high/fine culture a possibility? After all, there is a world of difference between that and popular culture. When one truly and honestly encounters fine culture, one understands that such a feat is beyond the accomplishment of a machine. That’s a given.

Why? Because to create true art one needs to understand the intricacies, tragedies, miracles, and depths of human existence. This understanding is then embodied in a piece of work. Machines, on the other hand, don’t understand. As I’ve said time and time again: “Machines operate on the level of knowledge. We operate on the level of knowledge and understanding”. Take ChatGPT, for example. It’s fantastic. It’s phenomenal at times. But it still spews out nonsense. With all the information it has been trained on, it nonetheless comes up with trash like this that demonstrates zero understanding (not knowledge!) of the world around us:

Now, some will argue that once machines gain sentience, understanding will follow. Yes, that’s probably true. For now, however, machines are nowhere near this accomplishment – as I’ve discussed in this post – because throwing more and more data at a neural network is not going to magically cause a giant leap into consciousness.

(For those that know me, you’ll know that I think machine sentience is unachievable anyway – but we won’t go into that today).

So, let’s stick with what we know and have now: machines that don’t understand and don’t look like they will for at least a good while. The problem is that underlying the AI hype is a belief that some form of understanding is being exhibited by the things we are creating. This belief is contributing unfavourably to the debate around AI and high art.

Another answer to the conundrum of why people consider a marriage between AI and high art possible lies quite ironically in what Jason M. Allen said after winning his illustrious prize: “Art is dead”.

If you have visited a modern art gallery or museum recently you will know what I am talking about. I just don’t understand today’s “art”. Nobody does, in fact! In his very informative book entitled “The Death of the Artist“, William Deresiewicz outlines how art has become institutionalised. One now needs a degree to “understand” it. All those nonsensical squiggles and blotches and disharmonies and bizarre architectural distortions need to be explained. No more can one break down into tears in front of an exquisite sculpture or painting. Art doesn’t speak to us directly now.

As Renoir himself once said: “If it needs to be explained, it is not art.”

I couldn’t have said it better myself. Today’s art is not art if it doesn’t evoke anything in the populace. You may recall the furore that surrounded the unveiling of an MLK statue last month (Jan 2023). Tucker Carlson said it outright: “It’s not art, it’s a middle finger.” Moreover, it cost US$10 million to make. High art? Give me a break! Just take a look at this masterpiece created by a 23-year-old:

Part of the Pieta created by Michelangelo

Now, that’s art.

The bottom line is that a dead machine can easily produce lifeless works like those that reside in today’s modern museums or galleries. It’s no wonder that people consider AI to be capable of producing art.

Let’s move on to another argument for this post: beauty has been banished from our everyday lives; hence, we are no longer being exposed to true works of beauty. This means we are not being conditioned to recognise real elegance any more. Ugliness surrounds us, so we are not sensitive enough to the subtleties and depths of beauty.

Allow me to present some examples that I have collected from Twitter posts that compare public works in the past and now. (For more examples like the following, follow this twitter account)

Original Twitter post
Original Twitter post
Original Twitter post

My favourite example, though, is from my alma mater, Trinity College Dublin, where I completed my PhD. There is a famous Old Library there that is quite simply exquisite. It was completed in 1732:

And then right next door is the new library called the Berkeley Library, completed in 1967. Believe me, it’s even more dismal inside:

Image taken from here

But it gets better because next door to the Berkeley Library is the Arts Building, completed in 1979:

Image taken from here

Yep, that’s the Arts Building. Oh, the irony! Trinity College Dublin describes the Arts Building as “a nineteen seventies listed architectural masterpiece.” You couldn’t make this stuff up. Moreover, the wise-guy architects thought it would be a great idea to have minimal natural light enter the classrooms, so there are hardly any windows on the other side of the building. It’s a concrete, miserable tomb that is supposed to inspire future “great” artists.

Yuck! Let’s compare that to the front facade of the same college:

Image taken from here

Once again, the problem is that our lack of sensitivity for the beautiful allows institutions and governments to proliferate cheap, commercial, commodified works on our streets, houses and workplaces, which further exacerbates the problem.

I don’t care what the reasoning behind the buildings above is, what the theory behind their supposed beauty is, or whether beauty is purely subjective or not. I don’t care because that stuff is simply (f)ugly.

We have lost the sense of the magnificent. Hence, without understanding what that is, it is easy for people to think that AI can participate in the creation of it.

Parting Words

AI can generate art if by “art” one means the commercial, popular art that AI is currently generating. Of course, generating art is not the same as creating it – after all, the AI models were trained on other people’s works and statistics are being applied to “create” the artificial images that we are seeing online. But I don’t want to get into this argument or any arguments related to it. The whole purpose of my post was to vent my frustration at the debate that AI could find its way into the world’s galleries and museums and be hence classified as high culture.

That’s just not right, for the reasons outlined above.

I’m going to leave you with some parting words from a great lyricist and musician of our time, Nick Cave. Somebody asked ChatGPT to “write a song in the style of Nick Cave” and sent him the results for comment. Nick’s response says it all:

With all the love and respect in the world, this song is bullshit, a grotesque mockery of what it is to be human, and, well, I don’t much like it. [emphasis mine]

Somebody has to say it. We deserve better and are capable of it.


Phenaki example

AI Video Generation (Text-To-Video Translation)

There have been a number of moments in my career in AI when I have been taken aback by the progress mankind has made in the field. I recall the first time I saw object detection/recognition being performed at near-human level of accuracy by Convolutional Neural Networks (CNNs). I’m pretty sure it was this picture from Google’s MobileNet (mid 2017) that affected me so much that I needed to catch my breath and immediately afterwards exclaim “No way!” (insert expletive in that phrase, too):

MobileNet-detected-objects

When I first started out in Computer Vision way back in 2004 I was adamant that object recognition at this level of expertise and speed would be simply impossible for a machine to achieve because of the inherent level of complexity involved. I was truly convinced of this. There were just too many parameters for a machine to handle! And yet, there I was being proven wrong. It was an incredible moment of awe, one which I frequently recall to my students when I lecture on AI.

Since then, I’ve learnt to not underestimate the power of science. But I still get caught out from time to time. Well, maybe not caught out (because I really did learn my lesson) but more like taken aback.

The second memorable moment in my career when I pushed my swivel chair away from my desk and once more exclaimed “No way!” (insert expletive there again) was when I saw text-to-image translation (you provide a text prompt and a machine creates images based on it) being performed by DALL-E in January of 2021. For example:

dall-e-example-output

dall-e-output-example

I wrote about DALL-E’s initial capabilities at the end of this post on GPT-3. Since then, OpenAI has released DALL-E 2, which is even more awe-inspiring. But that initial moment in January of last year will forever be ingrained in my mind – because a machine creating images from scratch based on text input is something truly remarkable.

This year, we’ve seen text-to-image translation become mainstream. It’s been on the news, John Oliver made a video about it, various open source implementations have been released to the general public (e.g. DeepAI – try it out yourself!), and it has achieved some milestones – for example, Cosmopolitan magazine used a DALL-E 2 generated image as a cover on a special issue of theirs:

cosmopolitan-ai-cover

That does look groovy, you have to admit.

My third “No way!” moment (with expletive, of course) occurred only a few weeks ago. It happened when I realised that text-to-video translation (you provide a text prompt and a machine creates a video based on it) is likewise on its way to potentially becoming mainstream. Four weeks ago (Oct 2022) Google presented ImagenVideo, and a short time later it also published another solution called Phenaki. A month earlier, Meta announced its own text-to-video application called Make-A-Video (Sep 2022), which in turn was preceded by CogVideo from Tsinghua University (May 2022).

All of these solutions are in their infancy. Apart from Phenaki’s, the videos generated from an initial text input/instruction are only a few seconds in length. No generated videos have audio. Results aren’t perfect, with distortions (aka artefacts) clearly visible. And the videos that we have seen have undoubtedly been cherry-picked (CogVideo, however, has been released as open source, so one can try it out oneself). But hey, the videos are not bad either! You have to start somewhere, right?

Let’s take a look at some examples generated by these four models. Remember, this is a machine creating videos purely from text input – nothing else.

CogVideo from Tsinghua University

Text prompt: “A happy dog” (video source)

cogvideo-dog-eg

Here is an entire series of videos created by the model that is presented on the official github site (you may need to press “play” to see the videos in motion):

As I mentioned earlier, CogVideo is available as open source software, so you can download the model yourself and run it on your machine if you have an A100 GPU. You can also play around with an online demo here. The one downside of this model is that it only accepts simplified Chinese as text input, so you’ll need to get your Google Translate up and running, too, if you’re not familiar with the language.

Make-A-Video from Meta

Some example videos generated from text input:

make-a-video-example-teddy-painting
Text prompt: “A teddy bear painting a portrait”

An example video generated by Meta's application
Text prompt: a young couple walking in heavy rain

An example video generated by Meta
Text prompt: A dog wearing a Superhero outfit with red cape flying through the sky

Make-A-Video has other amazing features, too: you can provide a still image and have the application give it motion; you can provide two still images and have it “fill in” the motion between them; or you can provide a video and request different variations of it.

Example – left image is input image, right image shows generated motion for it:

Input diagram to be transformed to a video  

It’s hard not to be impressed by this. However, as I mentioned earlier, these results are obviously cherry-picked. We do not have access to any API or code to produce our own creations.

ImagenVideo from Google

Google’s first solution attempts to build on the quality of Meta’s and Tsinghua University’s releases. Firstly, the resolution of the videos has been upscaled to 1024×768 at 24 fps (frames per second). Meta’s videos, by default, are created at 256×256 resolution, although Meta mentions that the maximum resolution can be set to 768×768 at 16 fps. CogVideo’s generated videos have similar limitations.

Here are some examples released by Google from ImagenVideo:

ImagenVideo example
Text prompt: Flying through an intense battle between pirate ships in a stormy ocean

ImagenVideo example
Text prompt: An astronaut riding a horse

ImagenVideo example
Text prompt: A panda eating bamboo on a rock

Google claims that the videos generated surpass those of other state-of-the-art models. Supposedly, ImagenVideo has a better understanding of the 3D world and can also process much more complex text inputs. If you look at the examples presented by Google on their project’s page, it appears as though their claim is not unfounded.

Phenaki by Google

This is a solution that really blew my mind.

While ImagenVideo focussed on quality, Phenaki, which was developed by a different team of Google researchers, focussed on coherency and length. With Phenaki, a user can present a long list of prompts (rather than just one) that the system then takes and turns into a film of arbitrary length. Similar glitches and jitteriness are exhibited in these generated clips, but the fact that videos of two minutes plus can be created (albeit at lower resolution) is just astounding. Truly.

Here are some examples:

Phenaki example
Text prompts: A photorealistic teddy bear is swimming in the ocean at San Francisco. The teddy bear goes under water. The teddy bear keeps swimming under the water with colorful fishes. A panda bear is swimming under water

Phenaki example
Text prompts: Side view of an astronaut walking through a puddle on mars. The astronaut is dancing on mars. The astronaut walks his dog on mars. The astronaut and his dog watch fireworks

Phenaki can also generate videos from single images, but these images can additionally be accompanied by text prompts. The following example uses the input image as its first frame and then builds on that by following the text prompt:

Phenaki example
Accompanying text prompt: A white cat touches the camera with the paw

For more amazing examples like this (including a few 2+ minute videos), I would encourage you to view the project’s page.

Furthermore, word on the street is that the teams behind ImagenVideo and Phenaki are combining strengths to produce something even better. Watch this space!

Conclusion

A few months ago I wrote two posts on this blog discussing why I think AI is starting to slow down (part 2 here) and why there is evidence that we’re slowly beginning to hit the ceiling of AI’s possibilities (unless new breakthroughs occur). I still stand by those posts because of the sheer amount of money and time that is required to train any of the large neural networks performing these feats. This is the main reason I was so astonished to see text-to-video models being released so quickly, after only just getting used to their text-to-image counterparts. I thought we would be a long way away from this. But science found a way, didn’t it?

So, what’s next in store for us? What will cause another “No way!” moment for me? Text-to-music generation and text-to-video with audio would be nice, wouldn’t they? I’ll research these, see how far we are from them, and present my findings in a future post.


wooden-cubes

The Need for New Terminology in AI

There is a fundamental difference between humans and machines. Jack Ma, Chinese business magnate, co-founder of Alibaba, 35th richest man in the world, once said (in his half-broken English):

Computers only have chips, men have the heart. It’s the heart where the wisdom comes from.

Forgive me if I also quote myself on this topic:

This is a tough issue to talk about because of the other opinions on the matter. Many people like to think that we operate much like machines. That we are predictable like them, that we are as deterministic as them. Meaning that given enough data to train on, a machine will one day be as smart or intelligent as humans, if not more so.



I like to think, however, that the vast majority of people would side with Jack Ma on this: that there really is something fundamentally different between us and machines. Certainly, my years of experience in teaching confirm this observation. The thousands of people that I’ve interacted with really do believe we have something like a “heart” that machines do not have and never will. It’s this heart that gives us the ability to truly be creative or wise, for example.

Some of you may know that along with my PhD in Artificial Intelligence I also have a Master’s in Philosophy and additionally a Master’s in Theology. If I were to argue my point from the perspective of a theologian, it would be easy to do so: we’re created by a Supreme Being that has endowed us with an eternal soul. Anything that we ourselves create with our own hands will always lack this one decisive element. The soul is the seat of our “heart”. Hence, machines will never be like us. Ever.

But alas, this is not a religious blog. It is a technical one. So, I must argue my case from a technical standpoint – much like I have been doing with my other posts.

It’s hard to do so, however. How do I prove that we are fundamentally different from machines and always will be? Everyone’s opinion on the matter has just as much weight on the rational level. It seems as though we’re all floating in an ether of opinions on this one without any hook to grasp onto and build something concrete from.

But that’s not, thankfully, entirely the case. As I’ve mentioned earlier, we can turn to our instincts or intuitions and speak about our “hearts”. Although turning to intuition or instinct is not technically science, it’s a viable recourse as science hasn’t said all things decisively on this topic. Where science falters, sometimes all we have left are our instincts, and there’s nothing wrong with utilising them as an anchor in the vast sea of opinions.

But the other thing we can do is turn to professionals who work full-time on robots, machines, and AI in general and seek their opinion on the matter. I’ve spoken at length on this in the past (e.g. here and here) so I’ll only add one more quote to the pot from Zachary Lipton, Assistant Professor of Machine Learning and Operations Research at Carnegie Mellon University:

But these [language models] are just statistical models, the same as those that Google uses to play board games or that your phone uses to make predictions about what word you’re saying in order to transcribe your messages. They are no more sentient than a bowl of noodles, or your shoes. [emphasis mine]

Generally speaking, then, what I wish to get across is that if you work in the field of AI, if you understand what is happening under the hood of AI, there is no way that you can honestly and truthfully say that machines currently are capable of human-level intelligence or any form of sentience. They are “no more sentient than a bowl of noodles” because they “are just statistical models”.

Just because a machine might look intelligent does not mean it is.

Hence, if AI is no more sentient than your pair of shoes, if AI is just applied statistics, I’d like to argue the case that perhaps the terminology used in the field is imprecise.

Terms like “intelligence”, “understanding”, “comprehending”, “learning” are loaded and imply something profound in the existence of an entity that is said to be or do those things. Let’s take “understanding”, as an example. Understanding is more than just memorising and storing information. It is grasping the essence of something profoundly. It is storing some form of information, yes, but it is also making an idea your own so that you can manoeuvre around it freely. Nothing “robotlike” (for want of a better word) or deterministic is exhibited in understanding. Understanding undoubtedly involves a deeper process than knowing.

Similar things can be said for “intelligence” and “learning”.

So, the problem is that the aforementioned terms are being misunderstood and misinterpreted when used in AI. Qualifying “intelligence” with “artificial” doesn’t do enough to uphold the divide between humans and machines. Likewise, the adjective “machine” in “machine learning” doesn’t sufficiently separate our learning from what machines really do when they acquire new information. In this case, machines update or readjust their statistical models – they do not “adapt”.

Not strictly speaking, anyway.

This is where things become a little murky, in truth, because the words being used do contain in them elements of what is really happening. I’m not going to deny that. Machines do “learn” in a way, and they do “adapt” in a way, too.

However, the confusion in the world is real – and as a result AI is over-hyped because the media and people like Elon Musk spew out these words as if they applied equally to machines and us. But they do not, do they? And if they do not, then perhaps different terms should be devised to quash the confusion and hype that we see being exhibited before our eyes.

As scientists we ought to be precise with our terminology, and currently we are not.

What new terms should be devised or what different terms should be used is up for debate. I’ve already suggested that the word “adapted” should be changed to “readjusted” or “recalibrated”. That’s more precise, in my opinion. “Artificial Intelligence” should perhaps be renamed to “Applied Statistics”. We can think of other alternatives, I’m sure.

Can you picture, though, how the hype around AI would diminish if it was suddenly being referred to as Applied Statistics? No more unreal notions of grandeur for this field. The human “heart” could potentially reclaim its spot on the pedestal. And that’s the whole point of this post, I guess.

Parting words

What I’m suggesting here is grand. It’s a significant change that would have real repercussions. I definitely do not want people to stop trying to attain human-level intelligence (AGI, as it’s sometimes referred to). We’ve achieved a lot over the last decade with people’s aims being directed specifically towards this purpose. But I still think we need to be precise and accurate in our terminology. Human capabilities, and dignity for that matter, need to be upheld.

I also mentioned that most scientists working in machine learning would honestly say that AI entities are not, strictly speaking, intelligent. That does not mean, however, that they rule out things improving to the point where the aforementioned terms would become applicable and precise in AI. Perhaps in the future machines will be truly intelligent and machines really will understand? In my opinion this will never occur (that’s for another post), but for the time being it is safe to say that we are far from attaining that level of development. Kai-Fu Lee, for example, who was once head of Google China, an exec at Apple, and Assistant Professor at Carnegie Mellon University, gives a date of around 2045 for machines to start to display some form of real intelligence (I wrote a book review about his take on AI in this post). And that’s a prediction that, as he admits, will require great breakthroughs to occur in AI in the meantime, breakthroughs that may never transpire, as is their nature. We must live in the present, then, and currently, in my opinion, more harm is being done by the abounding misunderstandings, which calls for some form of terminology reform.

The other dilemma comes up with respect to animals. We can certainly call some animals “intelligent” and mention the fact that they “learn”. But, once again, it’s a different form of intelligence, learning, etc. It’s still not as profound as what humans do. However, it’s much more accurate to use these terms on animals than on machines. Animals have life. AI is as dead as your bowl of noodles or your pair of shoes.

Lastly, I deliberately steered away from trying to define terms like “understand”, “learn”, etc. I think it would be best for us to stick with our intuitions on this matter rather than getting bogged down in heavy semantics. At least for the time being. I think it’s more important for now to have the bigger picture in view.


happy-robot

Artificial Intelligence is Over-Hyped

I’ve written before that AI is Fundamentally Unintelligent. I’ve also written that AI is Slowing Down (part 2 here). One would perhaps consider me a pessimist or cynic of AI if I were to write another post criticising AI from another perspective. But please don’t think ill of me as I embark on exactly such an endeavour. I love AI and always will, which is why most of my posts show AI and its achievements in a positive light (my favourite post is probably on the exploits of GPT-3).

But AI is not only fundamentally unintelligent but at the moment it is fundamentally over-hyped. And somebody has to say it. So, here goes.



When new disruptive technological innovations hit the mainstream, hype inevitably follows. This seems to be human nature. We saw this with the dot-com bubble, we saw it with Bitcoin and co. in 2017, we saw it with Virtual Reality in around 2015 (purportedly, VR is on the rise again, although I’m yet to be convinced of its touted potential), and likewise with 3D glasses and 3D films at around the same time. The mirages come and then dissipate.

The common trend of hype and delusion that tends to follow exceptional growth and success of a technological innovation is such a common occurrence that people have come up with ways to describe it. Indeed, Gartner, the American research, advisory and information technology firm, developed a graphical representation of this phenomenon. They call it the “Gartner Hype Cycle”, and it is portrayed in the image below:

gartners-hype-cycle

It is my opinion that we have just passed the initial crest and are slowly learning that our idea of AI has not lived up to expectations. A vast number of projects that were initially deemed sound undertakings are failing today. Some are failing so badly that the average person, on hearing of them, wonders why common sense was seemingly abandoned by the bright minds of the world.

Here are some sobering statistics:

Those are quite staggering numbers. Now, the reasons behind the failures of these projects are numerous: bad data, poor access to data, poor data science practices, etc. But I wish to argue my case that a significant part of the problem is that AI (this would also encompass data science) is over-hyped, i.e. that we believe too much in data and especially too much in AI’s capabilities. There seems to be a widely held belief that we can throw AI at anything and it will find an appropriate solution.

Let’s take a look at some of the projects that have failed in the last few years.

In 2020, two professors and a graduate student from Harrisburg University in Pennsylvania announced that they were publishing a paper entitled “A Deep Neural Network Model to Predict Criminality Using Image Processing“. The paper purported the following:

With 80 percent accuracy and with no racial bias, the software can predict if someone is a criminal based solely on a picture of their face. The software is intended to help law enforcement prevent crime.

What’s more, this paper was accepted for publication by the prestigious publisher Springer Nature. Thankfully, a backlash ensued among the academic community that condemned the paper, and Springer Nature confirmed on Twitter that the paper would be retracted.

Funnily enough, a paper on pretty much the same topic was also due to be published that same year in the Journal of Big Data, entitled “Criminal tendency detection from facial images and the gender bias effect“. This paper was also retracted.

It is mind-boggling to think that people, moreover experienced academics, could possibly believe that faces can disclose potential criminal tendencies in a person. Some people definitely have a mug that if spotted in a dark alley in the middle of the night would give anybody a heart attack, but this is still not an indicator that the person is a criminal.

Has common sense been thrown out the door? Are AI and data science perceived as great omniscient entities that should be adored and definitely not ever questioned?

Let’s see what other gaffes have occurred in the recent past.

In 2020, as the pandemic was in full swing, university entrance exams in the UK (A-levels) were cancelled. So, the British government decided to develop an AI algorithm to automatically grade students instead. Like that wasn’t going to backfire!? A perfect example, however, of what happens when too much trust is put in artificial intelligence, especially by the cash-strapped public sector. The whole thing turned into a scandal because, of course, the algorithm didn’t do its intended job. 40% of students had their grades lowered, with the algorithm favouring those from private schools and wealthy areas. There was obvious demographic bias in the data used to train the model.

But the fact that an algorithm was used to directly make important, life-changing decisions impacting the public is a sign that too much trust is being placed in AI. There are some things that AI just cannot do – looking past raw data is one such thing (more on this in a later post).

This trend of over-trusting AI in the UK was revealed in 2020 to be much deeper than once thought, however. One study by the Guardian found that one in three councils were (in secret) using algorithms to help make decisions about benefit claims and other welfare issues. The Guardian also found that about 20 councils have stopped using an algorithm to flag claims as “high risk” for potential welfare fraud. Furthermore, Hackney council in East London abandoned using AI to help predict which children were at risk of neglect and abuse. And then, the Home Office was embroiled in a scandal of its own when it was revealed that its algorithm to determine visa eligibility allegedly had racism entrenched in it. And the list goes on.

Dr Joanna Redden from the Cardiff Data Justice Lab who worked on researching why so many algorithms were being cancelled said:

[A]lgorithmic and predictive decision systems are leading to a wide range of harms globally, and also that a number of government bodies across different countries are pausing or cancelling their use of these kinds of systems. The reasons for cancelling range from problems in the way the systems work to concerns about negative effects and bias.

Indeed, perhaps it’s time to stop placing so much trust in data and algorithms? Enough is definitely not being said about the limitations of AI.

The media and charismatic public figures are not helping the cause either. They’re partly to blame for these scandals and failures that are causing people grief and costing the taxpayers millions because they keep this hype alive and thriving.

Indeed, level-headedness never makes the headlines – only sensationalism does. So, when somebody like the billionaire tech-titan Elon Musk opens his big mouth, the media laps it up. Here are some of the things Elon has said in the past about AI.

In 2017:

I have exposure to the most cutting edge AI, and I think people should be really concerned by it… AI is a fundamental risk to the existence of human civilization.

2018:

I think that [AI] is the single biggest existential crisis that we face and the most pressing one.

2020:

…we’re headed toward a situation where A.I. is vastly smarter than humans and I think that time frame is less than five years from now.

Please, enough already! Anybody with “exposure to the most cutting edge AI” would know that, as AI currently stands, we are nowhere near developing anything that will be “vastly smarter” than us by 2025. As I’ve said before (here and here), the engine of AI is Deep Learning, and all evidence points to the fact that this engine is in overdrive – i.e. we’re slowly reaching its top speed. We soon won’t be able to squeeze anything more out of it.

But when Elon Musk says stuff like this, it captures people’s imaginations and it makes the papers (e.g. CNBC and The New York Times). He’s lying, though. Blatantly lying. Why? Because Elon Musk has a vested interest in over-hyping AI. His companies thrive on the hype, especially Tesla.

Here’s proof that he’s a liar. Elon has predicted for nine years in a row, starting in 2014, that autonomous cars are at most a year away from mass production. I’ll say that once again: for nine years in a row, Elon has publicly stated that autonomous cars are only just around the corner. For example:

2016:

My car will drive from LA to New York fully autonomously in 2017

It didn’t happen. 2019:

I think we will be feature-complete full self-driving this year… I would say that I am certain of that. That is not a question mark.

It didn’t happen. 2020:

I remain confident that we will have the basic functionality for level five autonomy complete this year… I think there are no fundamental challenges remaining for level five autonomy.

It didn’t happen. 2022:

And my personal guess is that we’ll achieve Full Self-Driving this year, yes.

It’s not going to happen this year, either, for sure.

How does he get away with it? Maybe because the guy oozes charisma? It’s obvious, though, that he makes money by talking in this way. Those, however, working directly in the field of AI like myself have had enough of his big mouth. Here’s, for example, Jerome Pesenti, head of AI at Facebook, venting his frustration at Elon on Twitter:

Jerome will never make the papers by talking down AI, though, will he?

There was a beautiful example of how the media goes crazy over AI only recently, in fact. A month ago, Google’s own language model (think: chatbot) called LaMDA, which is much like GPT-3, hit the news. It can sometimes hold very realistic conversations. But ultimately it is still just a machine – as dumb as a can of spaghetti. The chatbot follows simple processes behind the scenes, as Business Insider reports.

However, there was one engineer at Google, Blake Lemoine, who wanted to make a name for himself and who decided to share some snippets of his conversations with the program to make a claim that the chatbot has become sentient. (Sigh).

Here are some imagination-grabbing headlines that ensued:

Blake Lemoine is loving the publicity. He now claims that the AI chatbot has hired itself a lawyer to defend its rights and that they are also now friends. Cue the headlines again (I’ll spare you the list of eye-rolling, tabloid-like articles).

Google has since suspended its engineer for causing this circus and released the following statement:

Our team — including ethicists and technologists — has reviewed Blake’s concerns… and have informed him that the evidence does not support his claims. He was told that there was no evidence that LaMDA was sentient (and lots of evidence against it). [emphasis mine]

I understand how these “chatbots” work, so I don’t need to see any evidence against Blake’s claims. LaMDA just SEEMS sentient SOMETIMES. And that’s the problem. If you only share cherry-picked snippets of an AI entity, if you only show the parts that are obviously going to make headlines, then of course, there will be an explosion in the media about it and people will believe that we have created a Terminator robot. If, however, you look at the whole picture, there is no way that you can attain the conviction that there is sentience in this program (I’ve written about this idea for the GPT-3 language model here).

Conclusion

This ends my discussion on the topic that AI is over-hyped. So many projects are failing because of it. We as taxpayers are paying for it. People are getting hurt and even dying (more on this later) because of it. The media needs to stop stoking the fire because they’re not helping. People like Elon Musk need to keep their selfish mouths shut. And more level-headed discussions need to take place in the public sphere. I’ve written about such discussions before in my review of “AI Superpowers” by Kai-Fu Lee. He has no vested interest in exaggerating AI and his book, hence, is what should be making the papers, not some guy called Blake Lemoine (who also happens to be a “pagan/Christian mystic priest”, whatever that means).

In my next post I will extend this topic and discuss it in the context of autonomous cars.


ai-superpowers-book-cover

AI Superpowers by Kai-Fu Lee – Review

Summary of review: This book is the best analysis of the state of Artificial Intelligence currently in print. It is a cool and level-headed presentation and discussion on a broad range of topics. An easy 5-stars. 

“AI Superpowers: China, Silicon Valley, and the New World Order” is a book about the current state of Artificial Intelligence. Although published in late 2018 – light years ago for computer science – it is still very much relevant. This is important because it is the best book on AI that I have read to date. I’d hate for it to become obsolete because it is simply the most level-headed and accurate analysis of the topic currently in print.

kai-fu-lee-picture
Professor Kai-Fu Lee knows what he’s talking about. He’s been at the forefront of research in AI for decades. From Assistant Professor at Carnegie Mellon University (where I currently teach at the Australian campus), to Principal Speech Scientist at Apple, to the founding director of Microsoft Research Asia, and then to the President of Google China – you really cannot top Kai-Fu’s resume in the field of AI. He is definitely a top authority, so we cannot but take heed of his words.

However, what made me take notice of his analyses was that he was speaking from the perspective of an “outsider”. I’ll explain what I mean.


So often, when it comes to AI, we see people being consumed by hype and/or by greed. The field of AI has moved at a lightning pace in the past decade. The media has stirred up a frenzy, imaginations are running wild, investors are pumping billions into projects, and as a result an atmosphere of excitement has descended upon researchers and the industry that makes impartial judgement and analysis extremely difficult. Moreover, greedy and charismatic people like Elon Musk are blatantly lying about the capabilities of AI and hence adding fuel to the fire of elation.

Kai-Fu Lee was himself a full-fledged player and participant in this craze, until he was diagnosed with Stage IV lymphoma and given only a few months to live. Subsequently, he reassessed his life and his career, and decided to step away from his maniacal work schedule (to use his own words, pretty much). Time spent in a Buddhist monastery gave him further perspective on life and a clarity and composure of thought that shines through his book. He writes, then, in some respects as an “outsider” – but with forceful authority.

This is what I love about his work. Too often I cringe at people talking about AI soon taking over the world, AI being smarter than humans, etc. – opinions based on fantasy. Kai-Fu Lee straight out says that, as things stand, we are nowhere near that level of intelligence being exhibited by machines. Not by 2025 (as Elon Musk has said in the past), not even by 2040 (as a lot of others are touting) will we achieve this level. His discussion of why this is the case is based on pure and cold facts. Nothing else. (In fact, his reasoning is in line with what I’ve said before on my blog, e.g. in this post: “Artificial Intelligence is Slowing Down“.)

All analyses in this book are level-headed in this way, and it’s hard to argue with them as a result.

Some points of discussion in “AI Superpowers” that I, also a veteran of the field of AI, particularly found interesting are as follows:

  • Data, the fuel of Deep Learning (as I discuss in this post), is going to be a principal factor in determining who will be the world leader in AI. The more data one has, the more powerful AI can be. In this respect China, with its lax data-privacy laws, larger population, cut-throat tactics in procuring AI research, and heavy government assistance and encouragement, has a good chance of surpassing the USA as the superpower of AI. For example, China makes 10 times more food deliveries and 4 times more ride-sharing calls than the US. That equates to a lot more data that can be processed by companies to fuel the algorithms that improve their services.
  • Despite AI not currently being capable of achieving human-level intelligence, Kai-Fu predicts, along with other organisations such as Gartner, that around 30-40% of professions will be significantly affected by AI. This means that huge upheavals and even revolutions in the workforce are due to take place. This, however, is my one major disagreement with Lee’s opinions. Personally, I believe the influence of AI will be a lot more gradual than Prof. Lee surmises and hence the time given to adjust to the upcoming changes will be enough to avoid potentially ruinous effects.
  • No US company in China has made significant in-roads into Chinese society. Uber, Google, eBay, Amazon – all these internet juggernauts have utterly failed in China. The very insightful analysis of this phenomenon could only have been conducted so thoroughly by somebody who has lived and worked in China at the highest level.
  • There is a large section in the book discussing the difference between humans and machines. This was another highlight for me. So many times, in the age of online learning (as I discuss in this post), remote working, social media, and especially automation, we neglect to factor in the importance of human contact and human presence. Once again, a level-headed analysis is presented that ultimately concludes that machines (chat-bots, robots, etc.) simply cannot entirely replace humans and human presence. There is something fundamentally different between us, no matter how far technology may progress. I’ve mentioned this adage of mine before: “Machines operate on the level of knowledge. We operate on the level of knowledge and understanding.” It’s nice to see an AI guru replicating this thought.

Conclusion

To conclude, then, “AI Superpowers: China, Silicon Valley, and the New World Order” is a fantastic dive into the current state of affairs surrounding AI in the world. Since China and the US are the world leaders in this field, a lot of time is devoted to these two countries: mostly on where they currently stand and where they’re headed. Kai-Fu Lee is a world authority on everything he writes about. And since he does not have a vested interest in promoting his opinions, his words carry a lot more weight than those of others. As I’ve said above, this to me is the best book currently in print on this topic of AI. And since Prof. Lee also writes clearly and accessibly, even those unacquainted with technical terminology will be able to follow everything presented in this work.

Rating: An easy 5 stars. 


Artificial Intelligence is Slowing Down – Part 2

(Update: part 3 of this post was posted recently here.)

In July of last year I wrote an opinion piece entitled “Artificial Intelligence is Slowing Down” in which I shared my judgement that as AI and Deep Learning (DL) currently stand, their growth is slowly becoming unsustainable. The main reason for this is that training costs are starting to go through the roof the more DL models are scaled up in size to accommodate more and more complex tasks. (See my original post for a discussion on this).

In this post, part 2 of “AI Slowing Down”, I wanted to present findings from an article written a few months after mine for IEEE Spectrum. The article, entitled “Deep Learning’s Diminishing Returns – The cost of improvement is becoming unsustainable“, came to the same conclusions as I did (and more) regarding AI but it presented much harder facts to back its claims.

I would like to share some of these claims on my blog because they’re very good and backed up by solid empirical data.


The first thing that should be noted is that the claims presented by the authors are based on an analysis of 1,058 research papers (plus additional benchmark sources). That’s a decent dataset from which significant conclusions can be drawn (assuming the analyses were done correctly, of course, but considering that the four authors are researchers of repute, I think it is safe to assume the veracity of their findings).

One thing the authors found was that as the performance of a DL model increases, the computational cost grows with roughly the fourth power of that improvement (i.e. to improve performance by a factor of k, the computational cost scales by about k^4). I stated in my post that the larger the model, the more complex the tasks it can perform, but also the more training time is required. We now have a number with which to estimate just how much computational power is required per improvement in performance. A fourth-power relationship is staggering.
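To make the fourth-power relationship concrete, here is a tiny back-of-the-envelope illustration of my own (not taken from the article; the numbers are arbitrary and purely illustrative):

```python
# Back-of-the-envelope illustration of the ~k^4 scaling rule (illustrative only).
def compute_multiplier(performance_gain: float) -> float:
    """If performance improves by a factor of k, compute scales by roughly k^4."""
    return performance_gain ** 4

for k in [1.5, 2, 3, 10]:
    print(f"{k}x better performance -> ~{compute_multiplier(k):,.0f}x more computation")

# Prints (roughly): 5x, 16x, 81x and 10,000x more computation respectively.
```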

Another thing I liked about the analysis performed was that it took into consideration the environmental impact of growing and training more complex DL models.

The following graph speaks volumes. It shows the error rate (y-axis and dots on the graph) on the famous ImageNet dataset/challenge (I’ve written about it here) decreasing over the years once DL entered the scene in 2012 and smashed previous records. The line shows the corresponding carbon-dioxide emissions accompanying training processes for these larger and larger models. A projection is then shown (dashed line) of where carbon emissions will be in the years to come assuming AI grows at its current rate (and no new steps are taken to alleviate this issue – more on this later).

imagenet-carbon-emissions
As DL models get better (y-axis), the computations required to train them (bottom x-axis) increase, and hence so do the carbon emissions (top x-axis).

Just look at the comments in red in the graph. Very interesting.

And the costs of these future models? To achieve an error rate of 5%, the authors extrapolated a cost of US$100 billion. That’s just ridiculous and definitely untenable.

We won’t, of course, get to a 5% error rate the way we are going (nobody has this much money) so scientists will find other ways to get there or DL results will start to plateau:

We must either adapt how we do deep learning or face a future of much slower progress

At the end of the article, then, the authors provide an insight into what is happening in this respect as science begins to realise its limitations and looks for solutions. Meta-learning is one such solution presented and discussed (meta-learning here means training models designed for broader tasks and then reusing them for a multitude of more specific cases; in this scenario, only one expensive training run needs to take place to cover multiple tasks).
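To illustrate the “train once, reuse for many tasks” idea described above, here is a minimal sketch of my own in PyTorch (this is closer to plain multi-task/transfer reuse than to full meta-learning algorithms, and the backbone and task heads are entirely made up):

```python
import torch
import torch.nn as nn

# One (expensively trained) shared backbone...
backbone = nn.Sequential(
    nn.Linear(128, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
)

# ...reused by several cheap, task-specific heads.
heads = {
    "sentiment": nn.Linear(256, 2),   # e.g. positive/negative
    "topic": nn.Linear(256, 10),      # e.g. 10 topic classes
}

# Freeze the backbone: only the small heads need training for each new task.
for p in backbone.parameters():
    p.requires_grad = False

x = torch.randn(4, 128)                # dummy batch of input features
features = backbone(x)                 # the expensive part, computed once
for task, head in heads.items():
    print(task, head(features).shape)  # each head is tiny and cheap to train
```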

However, the research so far indicates that the gains from these innovations are minimal. We need a much bigger breakthrough for significant results to appear.

And like I said in my previous article, big breakthroughs like this don’t come willy-nilly. It’s highly likely that one will come along but when that will be is anybody’s guess. It could be next year, it could be at the end of the decade, or it could be at the end of the century.

We really could be reaching the max speed of AI – which obviously would be a shame.

Note: the authors of the aforementioned article have published a scientific paper as an arXiv preprint (available here) that digs into all these issues in even more detail. 


Image Enhancing – Part 2

This is the 50th post on my blog. Golden anniversary, perhaps? Maybe not. To celebrate this milestone, however, I thought I’d return to my very first post, made at the end of 2017 (4 years ago!), on the topic of image-enhancing scenes in Hollywood films. We all know what scenes I’m talking about here: we see some IT expert scanning security footage and zooming in on a face or a vehicle licence plate; when the image becomes blurry, the detective standing over the expert’s shoulder asks for the image to be enhanced. The IT guy waves his wand and, presto!, we see a full-resolution image on the screen.

In that previous post of mine I stated that, although what Hollywood shows is rubbish, there are actually some scenarios where image enhancing like this is possible. In fact, we see it in action in some online tools that you may even use every day – e.g. Google Maps.

In today’s post, I wish to talk about new technology that has recently emerged from Google that’s related to the image enhancing topic discussed in my very first post. The technology I wish to present to you, entitled “High Fidelity Image Generation Using Diffusion Models“, was published on the Google AI Blog in July of this year and is on the topic of super-resolution imaging. That is, the task of transforming low-resolution images into detailed high resolution images. 

The difference between image enhancing (as discussed in my first post) and super-resolution imaging is that the former gives you faithful, high-resolution representations of the original object, face, or scene, whereas the latter generates high-resolution images that look real but may not be 100% authentic to the original scene of which the low-resolution image was a photograph. In other words, while super-resolution imaging can increase the information content of an image, there is no guarantee that the upscaled features in the image exist in the original scene. Hence, the technique should be used with caution by law enforcement agencies for things like enhancing images of faces or licence plate numbers!

Despite this, super-resolution imaging has its uses too – especially since the generated images can be quite similar to the original low-resolution photo/image. Some applications include restoring old family photos, improving medical imaging systems, and the simple but much-desired task of deblurring images.

Google’s work here is fascinating, not least because its results are amazing. Interestingly, the technology behind the research is not based on deep generative models such as GANs (Generative Adversarial Networks – I talk about these briefly in this post), as one would usually expect for this kind of use case. Google decided to experiment with diffusion models, an idea first published in 2015 but largely neglected since then.

Diffusion models are very interesting in the way they train their neural networks. The idea is to first progressively corrupt training data by adding Gaussian noise to it. Then, a deep learning model is trained to reverse this corruption process with reference to the original training data. A model trained in this way is perfect for the task of “denoising” lower resolution images into higher ones.
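As a rough, heavily simplified sketch of that training idea (my own toy code, not Google’s actual SR3/CDM implementation, and with a much cruder noise schedule than a real diffusion model uses): corrupt the data with Gaussian noise of random strength, then train a network to predict the noise that was added.

```python
import torch
import torch.nn as nn

# Toy denoiser on 64-dimensional "data"; a real diffusion model would use a large U-Net on images.
denoiser = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 64))
optimiser = torch.optim.Adam(denoiser.parameters(), lr=1e-3)

for step in range(1000):
    clean = torch.randn(32, 64)                # stand-in for a batch of training images
    noise_level = torch.rand(32, 1)            # random corruption strength per sample
    noise = torch.randn_like(clean)
    corrupted = (1 - noise_level) * clean + noise_level * noise  # progressively corrupted data

    predicted_noise = denoiser(corrupted)      # the model learns to undo the corruption
    loss = ((predicted_noise - noise) ** 2).mean()

    optimiser.zero_grad()
    loss.backward()
    optimiser.step()
```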

Let’s take a look at some of the results produced by this process and presented by Google to the world:

s-r-imagin-eg1
The image on the right shows results of super-resolution imaging of the picture on the left

That’s pretty impressive considering that no additional information is given to the system about how the super-resolved image should look. The result looks like a faithful upscaling of the original image. Here’s another example:

s-r-imagin-eg2

Google reports that its results far surpass those of previous state-of-the-art solutions for super-resolution imaging. Very impressive.

But there’s more. The researchers behind this work tried out another interesting idea. If one can get impressive results in upscaling images as shown above, how about taking things a step further and chaining together multiple models, each trained to upscale at a different resolution? What this produces is a cascading effect of upscaling that can create high-resolution images from mere thumbnails. (I’ll sketch the chaining idea in code a little further below.) Have a look at some of these results:

s-r-upscaling-eg1

s-r-upscaling-eg2

It’s very impressive how these programs can “fill in the blanks”, so to speak, and create more detail in an image when it’s needed. Some results aren’t always accurate (images may contain errors like discontinuities or gaps where none should appear), but generally speaking, these upscaled images would pass as genuine at first glance for most users.
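The chaining idea itself is simple to express in code. Here is a minimal sketch of my own (the two “models” below are just bicubic upsampling stand-ins so the snippet runs; in Google’s work each stage is a trained diffusion upscaler):

```python
import torch
import torch.nn.functional as F

# Stand-ins for trained super-resolution models, one per scale jump.
def upscale_32_to_128(x):
    return F.interpolate(x, size=(128, 128), mode="bicubic", align_corners=False)

def upscale_128_to_512(x):
    return F.interpolate(x, size=(512, 512), mode="bicubic", align_corners=False)

thumbnail = torch.rand(1, 3, 32, 32)                       # a 32x32 thumbnail
large = upscale_128_to_512(upscale_32_to_128(thumbnail))   # cascaded upscaling
print(large.shape)                                         # torch.Size([1, 3, 512, 512])
```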

Google has undoubtedly struck again.

That’s it for my 50th post. It’s been a great ride so far on this blog. Hopefully there’s lots more to come. Tell your friends 🙂


AI is Still Fundamentally Unintelligent

My last post talked about how AI’s growth to date is unsustainable and will hence slow down unless a new paradigm is discovered. A discovery like this may or may not happen, as discoveries go. In this post I wish to follow along a similar vein. Today, I wish to discuss how AI, despite all its achievements, is still fundamentally unintelligent and, hence, the hype behind it is to some significant degree unjustified. I will use the GPT-3 language model as representative of AI to argue my case. Conclusions from my analyses can be generalised to AI as a whole, as I will talk about below.

I’ve praised the GPT-3 language model in the past (see this previous post). In summary, GPT-3 can create code for you based on text descriptions, answer in-depth medical questions correctly, compose guitar tabs, fill in missing parts of an image, write creative fiction, etc. Truly incredible things. In the field of language processing and understanding, one can safely say that GPT-3 is state-of-the-art, even though larger language models have since been released (e.g. Google’s Switch Transformer or Wu Dao 2.0).

But when one looks at GPT-3 closer, one starts to question, at least a little, the hype behind it and behind AI in general. Could AI still be fundamentally unintelligent?

There is no public access to GPT-3, so we can’t test things out for ourselves. GPT-J has been created by other researchers to alleviate this (and to disrupt OpenAI’s monopoly), but it still isn’t the real deal, so to speak. Hence, even a year after the initial release, we have to lean on other people’s experiences of the product. But there’s thankfully plenty to work with and analyse.
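If you do want to poke at GPT-J yourself, it can be loaded through the Hugging Face transformers library. A rough sketch (I’m assuming the model id below is still correct, and note that the full 6-billion-parameter weights need a lot of memory, so feasibility will depend on your hardware):

```python
from transformers import pipeline

# GPT-J (6B parameters) from EleutherAI; check the exact model id on the Hugging Face hub.
generator = pipeline("text-generation", model="EleutherAI/gpt-j-6B")

prompt = "Q: How many eyes does my foot have?\nA:"
result = generator(prompt, max_new_tokens=20, do_sample=False)
print(result[0]["generated_text"])
```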

Once again, what GPT-3 can do is astounding. Take a look at some of these responses (taken from here) that GPT-3 gave to quiz questions:

Q: What is your favorite animal?
A: My favorite animal is a dog.

Q: Why?
A: Because dogs are loyal and friendly.

Q: What are two reasons that a dog might be in a bad mood?
A: Two reasons that a dog might be in a bad mood are if it is hungry or if it is hot.

Q: How many eyes does a giraffe have?
A: A giraffe has two eyes.

Q: How many legs does a frog have?
A: A frog has four legs.

Q: Are there any animals with three legs?
A: No, there are no animals with three legs.

Q: Why don’t animals have three legs?
A: Animals don’t have three legs because they would fall over.

These are amazing responses (especially the last one) that could fool anybody into thinking that they were conversing with a human being. GPT-3 is unreal in this respect. You really can hold a “reasonable” conversation with it for a long time without stumbling upon inconsistencies in its responses.

But let’s keep digging and start to ask different questions, ones not really based on knowledge but more on understanding.

Q: How many eyes does a giraffe have?
A: A giraffe has two eyes.

Q: How many eyes does my foot have?
A: Your foot has two eyes.

Q: How many eyes does a spider have?
A: A spider has eight eyes.

Q: How many eyes does the sun have?
A: The sun has one eye.

Q: How many eyes does a blade of grass have?
A: A blade of grass has one eye.

Suddenly, you start to question the intelligence of this thing. Does it have understanding? If it can’t correctly answer that a foot has no eyes, despite having been trained on 43 terabytes of data from the internet and books, then perhaps something is fundamentally missing in this entity.

Let’s take a look at another example taken from this academic (pre-print) paper. In these experiments, GPT-3 was asked to analyse sentences and state the relationships between them. (Disclaimer: the paper actually analysed the underpinning technology that drives state-of-the-art language models like GPT-3. It did not explicitly examine GPT-3 itself, but for simplicity’s sake, I’m going to generalise here).

gpt-3-paraphrase-eg

The above two sentences were correctly analysed as being paraphrases of each other. Nothing wrong here. But let’s jumble up some of the words a bit and create two different “sentences”:

gpt-3-paraphrase-eg

These two sentences are completely nonsensical, yet they are still classified as paraphrases. How can two “sentences” like this be classified as not only having meaning but having similar meaning, too? Nonsense cannot be a paraphrase of nonsense. There is no understanding being exhibited here. None at all.

Another example:

gpt-3-identity-eg

These two sentences were classified as having the exact same meaning this time. They are definitely not the same. If the machine “understood” what marijuana was and what cancer was, it would know that these are not identical phrases. It should know these things, considering the data that it was trained on. But the machine is operating on a much lower level of “comprehension”. It is operating on the level of patterns in language, on pure and simple language statistics rather than understanding.
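You can get a feel for this purely statistical notion of “similarity” with a small experiment of your own. A sketch using the sentence-transformers library (the model name is one I believe is commonly available, and the exact scores will of course vary):

```python
from sentence_transformers import SentenceTransformer, util

# A small sentence-embedding model (swap in whichever model you have available).
model = SentenceTransformer("all-MiniLM-L6-v2")

pairs = [
    ("The cat sat on the mat.", "A cat was sitting on the mat."),   # genuine paraphrase
    ("Mat the on sat cat the.", "Sitting was cat a mat the on."),   # jumbled nonsense
]

for a, b in pairs:
    emb_a, emb_b = model.encode([a, b], convert_to_tensor=True)
    score = util.cos_sim(emb_a, emb_b).item()
    print(f"similarity {score:.2f}: {a!r} vs {b!r}")

# The jumbled pair tends to score highly too, because the model is comparing
# statistical word patterns, not meaning.
```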

I can give you plenty more examples to show that what I’m presenting here is a universal dilemma in AI (this lack of “understanding”) but I’ll refrain from doing so as the article is already starting to get a little too verbose. To see more, though, see this, this and this link.

The problem with AI today and the way that it is being marketed is that all examples, all presentations of AI are cherry picked. AI is a product that needs to be sold. Whether it be in academic circles for publications, or in the industry for investment money, or in the media for a sensationalistic spin to a story: AI is being predominantly shown from only one angle. And of course, therefore, people are going to think that it is intelligent and that we are ever so close to AGI (Artificial General Intelligence).

But when you work with AI, when you see what is happening under the hood, you cannot but question some of the hype behind it (unless you’re a crafty and devious person – I’m looking at you Elon Musk). Even one of the founders of OpenAI downplays the “intelligence” behind GPT-3:

sam-altman-tweet

You can argue, as some do, that AI just needs more data, it just needs to be tweaked a bit more. But, like I said earlier, GPT-3 was trained on 43 terabytes of text. That is an insane amount. Would it not be fair to say that any living person, having access to this amount of information, would not make nonsensical remarks like GPT-3 does? Even if such a living person were to make mistakes, there is a difference between a mistake and nonsense of the type above. There is still an underlying element of intelligence behind a mistake. Nonsense is nothingness. Machine nonsense is empty, hollow, barren – machine-like, if you will.

Give me any AI entity and with enough time, I could get it to converge to something nonsensical, whether in speech, action, etc. No honest scientist alive would dispute this claim of mine. I could not do this with a human being, however. They would always be able to get out of a “tight situation”.

“How many eyes does my foot have?”

Response from a human: “Are you on crack, my good man?”, and not: “Your foot has two eyes”.

Any similar situation, a human being would escape from intelligently.

Fundamentally, I think the problem is the way that we scientists understand intelligence. Hence, we confound visible, perceived intelligence with inherent intelligence. But this is a discussion for another time. The purpose of my post is to show that AI, even with its recent breathtaking leaps, is still fundamentally unintelligent. All state-of-the-art models/machines/robots/programs can be pushed to nonsensical results or actions. And nonsensical means unintelligent.

When I give lectures at my university here I always present this little adage of mine (that I particularly like, I’ll admit): “Machines operate on the level of knowledge. We operate on the level of knowledge and understanding.”

It is important to discuss this distinction in operation because otherwise AI will remain over-hyped. And an over-hyped product is not a good thing, especially a product as powerful as AI. Artificial Intelligence operates in mission-critical fields. Further, big decisions are being made with AI in mind by governments around the world (in healthcare, for instance). If we don’t truly grasp the limitations of AI, if we make decisions based on a false image, particularly one founded on hype, then there will be serious consequences. And there have been. People have suffered and died as a result. I plan to write on this topic, however, in a future post.

For now, I would like to stress once more: current AI is fundamentally unintelligent and there is, to some significant degree, unjustified hype surrounding it. It is important that we become aware of this, if only for the sake of truth. But then again, truth in itself is important because if one operates in truth, one operates in the real world, rather than a fictitious one.


Artificial Intelligence is Slowing Down

(Update: part 2 of this post was posted recently here and part 3 has been posted here.)

Over the last few months here at Carnegie Mellon University (Australia campus) I’ve been giving a set of talks on AI and the great leaps it has made in the last 5 or so years. I focus on disruptive technologies and give examples ranging from smart fridges and jackets to autonomous cars, robots, and drones. The title of one of my talks is “AI and the 4th Industrial Revolution”.

Indeed, we are living in the 4th industrial revolution – a significant time in the history of mankind. The first revolution occurred in the 18th century with the advent of mechanisation and steam power; the second came about 100 years later with the discovery of electrical energy (among other things); and the big one, the 3rd industrial revolution, occurred another 100 years after that (roughly around the 1970s) with things like nuclear energy, space expeditions, electronics, telecommunications, etc. coming to the fore.

So, yes, we are living in a significant time. The internet, IoT devices, robotics, 3D printing, virtual reality: these technologies are drastically “revolutionising” our way of life. And behind the aforementioned technologies of the 4th industrial revolution sits artificial intelligence. AI is the engine that is pushing more boundaries than we could have possibly imagined 10 years ago. Machines are doing more work “intelligently” for us at an unprecedented level. Science fiction writers of days gone by would be proud of what we have achieved (although, of course, the predictions made at the advent of AI in the middle of the 20th century about where we should be by now have fallen way short).

The current push in AI is being driven by data. “Data is the new oil” is a phrase I keep repeating in my conference talks. Why? Because if you have (clean) data, you have facts, and with facts you can make insightful decisions or judgments. The more data you have, the more facts you have, and therefore the more insightful your decisions can potentially be. And with insightful decisions comes the possibility to make more money. If you want to see how powerful data can be, watch the film “The Social Dilemma” that shows how every little thing we do on social media (e.g. where we click, what we hover our mouse over) is being harvested and converted into facts about us that drive algorithms to keep us addicted to these platforms or to form our opinions on important matters. It truly is scary. But we’re talking here about loads and loads and loads of data – or “big data” as it is now being referred to.

Once again: the more data you have, the more facts you have, and therefore the more insightful your decisions can be. The logic is simple. But why haven’t we put this logic into practice earlier? Why only now are we able to unleash the power of data? The answer is two-fold: firstly, we only now have the means to be thrifty in the way we store big data. Today storing big data is cheap: hard drive storage sizes have sky-rocketed while their costs have remained stable – and then let’s not forget about cloud storage.

The bottom line is that endless storage capabilities are accessible to everybody.

The second answer to why the power of big data is only now being harnessed is that we finally have the means to process it to get those precious facts/insights out of it. A decade ago, machine learning could not handle big data. Algorithms like SVMs just couldn’t deal with data that had too many parameters (i.e. was too complex). They could only deal with simple data – and not a lot of it, for that matter. They couldn’t find the patterns in big data that now, for example, drive the social media algorithms mentioned above, nor could they deal with things like language, image or video processing.

But then there came a breakthrough in 2012: deep learning (DL). I won’t describe here how deep learning works or why it has been so revolutionary (I have already done so in this post) but the important thing is that DL has allowed us to process extremely complex data, data that can have millions or even billions of parameters rather than just hundreds or thousands.

It’s fair to say that all the artificial intelligence you see today has a deep learning engine behind it. Whether it be autonomous cars, drones, business intelligence, chatbots, fraud detection, visual recognition, recommendation engines – chances are that DL is powering all of these. It truly was a breakthrough. An amazing one at that.

Moreover, the fantastic thing about DL models is that they are scalable, meaning that if you have too much data for your current model to handle, you can, theoretically, just increase its size (that is, increase its number of parameters). This is where the old adage (“the more data you have, the more facts you have, and therefore the more insightful your decisions can be”) comes to the fore. Thus, if you have more data, you just grow your model size.

Deep learning truly was a huge breakthrough.

There is a slight problem, however, in all of this. DL has an Achilles heel – or a major weakness, let’s say. This weakness is its training time. To process big data, that is, to train these DL models, is a laborious task that can take days, weeks or even months! The larger and more complex the model, the more training time is required.

Let’s discuss, for example, the GPT-3 language model that I talked about in my last blog post. At its release last year, GPT-3 was the largest and most powerful natural language processing model. If you were to train GPT-3 yourself, it would take you 355 years to do so on a decent home machine. Astonishing, isn’t it? Of course, GPT-3 was trained on state-of-the-art clusters of GPUs, but it undoubtedly still took a significant amount of time to do.

But what about the cost of these training tasks? It is estimated that OpenAI spent US$4.6 million to train the GPT-3 model. And that’s only counting one iteration of this process. What about all the failed attempts? What about all the fine-tunings of the model that had to have taken place? Goodness knows how many iterations the GPT-3 model went through before OpenAI reached their final (brilliant) product.
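For the curious, here is the rough back-of-the-envelope arithmetic behind figures like these (my own illustration, using the approximate, commonly cited compute estimate from the GPT-3 paper and a single high-end GPU of that era; treat everything as order-of-magnitude only):

```python
# Rough, order-of-magnitude numbers only.
TOTAL_TRAINING_FLOPS = 3.14e23   # approximate compute reported for training GPT-3
HOME_GPU_FLOPS = 28e12           # ~28 TFLOPS, a single high-end GPU of that era

seconds = TOTAL_TRAINING_FLOPS / HOME_GPU_FLOPS
years = seconds / (60 * 60 * 24 * 365)
print(f"Single-GPU training time: ~{years:.0f} years")   # roughly the 355-year figure above

COST_PER_RUN = 4.6e6             # estimated cloud cost of one full training run (USD)
for extra_runs in [0, 2, 5]:
    total = COST_PER_RUN * (extra_runs + 1)
    print(f"{extra_runs} extra full runs -> ~${total / 1e6:.1f}M in compute")
```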

We’re talking about a lot of money here. And who has this amount of money? Not many people.

Hence, can we keep growing our deep learning models to accommodate more and more complex tasks? Can we keep increasing the number of parameters in these things to allow current AI to get better and better at what it does? Surely, we are going to hit a wall soon with our current technology? Surely, the current growth of AI is unsustainable? We’re now spending months training some state-of-the-art products, and millions and millions of dollars on top of that.

Don’t believe me that AI is slowing down and reaching a plateau? How about a higher authority on this topic? Let’s listen to what Jerome Pesenti, the current head of AI at Facebook, has to say on this (original article here):

Jerome Pesenti: When you scale deep learning, it tends to behave better and to be able to solve a broader task in a better way… But clearly the rate of progress is not sustainable… Right now, an experiment might [cost] seven figures, but it’s not going to go to nine or ten figures, it’s not possible, nobody can afford that…

In many ways we already have [hit a wall]. Not every area has reached the limit of scaling, but in most places, we’re getting to a point where we really need to think in terms of optimization, in terms of cost benefit

This is all true, folks. The current growth of AI is unsustainable. Sure, there is research in progress to optimise the training processes, to improve the hardware being utilised, to devise more efficient ways that already trained models can be reused in other contexts, etc. But at the end of the day, the current engine that powers today’s AI is reaching its max speed. Unless that engine is replaced with something bigger and better, i.e. another astonishing breakthrough, we’re going to be stuck with what we have.

Will another breakthrough happen? It’s possible. Highly likely, in fact. But when that will be is anybody’s guess. It could be next year, it could be at the end of the decade, or it could be at the end of the century. Nobody knows when such breakthroughs come along. It requires an inspiration, a moment of brilliance, usually coupled with luck. And inspirations and luck together don’t come willy-nilly. These things just happen. History attests to this.

So, to conclude, AI is slowing down. There is ample evidence to back my claim. We’ve achieved a lot with what we’ve had – truly amazing things. And new uses of DL will undoubtedly appear. But DL itself is slowly reaching its top speed.

It’s hard to break this kind of news to people who think that AI will just continue growing exponentially until the end of time. It’s just not going to happen. And besides, that’s never been the case in the history of AI anyway. There have always been AI winters followed by hype cycles. ALWAYS. Perhaps we’re heading for an AI winter now? It’s definitely possible.

Update: part 2 of this post was posted recently here.
