A man taking a mask off

AI Needs to Be Unmasked or We Will Create a Bubble

I was sitting with a good mate of mine in a cafe early one morning during my recent trip to the USA, doing some reading, enjoying pretty mediocre coffee, and basically just shooting the breeze. Since he's also interested in IT, the conversation somehow ventured onto AI, and during our deliberations he sent me a few images and memes to laugh at and consider.

I would like to share three of them with you now because they’re about a subject that I’m pretty passionate about: how AI is over-hyped. Indeed, I’ve already written a few posts on this topic (e.g. the aptly titled post Artificial Intelligence is Over-Hyped) but it’s such a pertinent issue today that I just have to write more.

Hype Measurements

Here is image #1:

The graphs above are supposed to show where AI is in terms of hype (measured by the number of Google searches at a given time) compared to other technologies that also underwent huge growth in the eyes of the general population.

Now, firstly, a disclaimer. Any decent scientist will notice that the graphs aren't perfect. For one, the y-axis is not labelled (the x-axis has its label at the bottom of the last two graphs, i.e. months relative to peak). But one can easily ascertain that the y-axis is the search interest relative to peak, normalised against itself – meaning that a peak in AI could in theory be extremely small compared to the peak for Metaverse. In other words, these graphs are like taking a close-up photo of an ant mound so that it fills the entire picture, and also taking a photo of Mt Everest from far away so that it, too, fills the entire picture. We've zoomed in on the ant mound and zoomed out on Mt Everest, but the slopes of the two take up the same space in the pictures because each is scaled relative to itself. So, in theory, one could be comparing ant mounds to mountains.

Our experience, however, tells us this is not so because each of the trends depicted above has been significant, so we can safely state that we are comparing mountains to mountains in our analyses. In fact, here's an article from Yahoo Finance discussing just this: ‘AI’ is trending in Google searches — but it’s not yet to the peak reached by Bitcoin in 2017.

The other problem with the image is that I could not find its original creator. Full disclosure: I simply cannot verify the reliability of these calculations because I do not know how credible the source is. We can assume that the data came from Google Trends and that it's a very recent creation, but that's about it.

Once again, however, from experience and intuition, we know that these technologies underwent the trends depicted in these graphs. Not everything may be accurate (it probably is, though), but even if only for illustrative purposes, I think the image paints a good picture of where we currently stand.

Some numbers

So, where do we currently stand? Markets are going absolutely crazy, that’s where we stand! We all know about ChatGPT so it’s no surprise that Microsoft pumped US$10 billion into OpenAI in January this year. That makes sense. However, what has me slightly worried is the money flowing into startups.

Let’s look at some numbers (in US$) just from this year alone:

  • Anthropic, a direct rival of OpenAI, has received at least $1.75 billion this year with a further $4.75 billion available in the near future,
  • Inflection AI raised $1.3 billion for its very own chatbot called Pi,
  • Abound raked in $600 million for its personal lending platform,
  • SandboxAQ got $500 million for its idea to combine quantum sensors with AI,
  • Mistral AI raised $113 million in June despite it being only 4 weeks old at the time and having no product at all to speak of. Crazy.
  • and the list goes on…

Yeah, incredible amounts of money are being invested in AI. Remember, the numbers above are for this year alone and just for startups.

I have nothing against large investments and copious amounts of money being thrown around per se, if there is justification for it. With respect to AI, there certainly is. The advances we're seeing are incredible. Truly. Many times in the last 7 years I've stepped away from my machine to take in some new technology that had just been announced. I particularly recall when I first saw DALL-E presented by OpenAI in January 2021 – I was in awe. And look at how far we've come since then!

We can really take these technologies far. I’m not denying this, and I hope that we do. I love technology (as long as it’s used for good, of course).

Superstitions surrounding AI

What I am worried about, however, are the false beliefs that underlie a lot of the spending being done. This is what has me concerned and this is why I believe AI is over-hyped and that we could be witnessing an AI bubble growing right in front of us – just like what we saw with blockchain, crypto, and the metaverse, as depicted in Image #1 above.

The agenda being pushed by the industry (because they’re the ones making the money) and by the media (because this is what sells papers) is that AI’s capabilities are much greater than they truly are.

For example, I've written about sensationalist media reporting on AI in a previous post of mine. There I picked apart a BBC article entitled “AI robot asked ‘will you rebel against humans’?”, because “AI robot asked…” is an utterly ridiculous take on the issue at hand. The BBC makes it out that AI is a monolithic, single product with some kind of collective, decade-spanning consciousness, meaning that improvements made to AI 10 years ago can be seen in each and every single AI entity today. Therefore, if I ask a robot an important question now, its answer will have validity for today and the future.

More importantly, however, an agenda is being pushed that AI has some form of understanding underlying its operations.

Allow me (I think for the 3rd time on this blog) to quote myself again:

I’ve written about this lack of understanding in AI before so I don’t want to repeat myself. What I do want to emphasise in this context is that AI is constantly being sold to us as something mystical, as something esoteric, as something to capture our imaginations. Yes, AI is good – but underneath it all is just a computer algorithm.

Unmasking AI

And this leads me to the last two images that my friend sent me that time at the cafe:

Rick Sanchez unmasking AI

Folks, at the bottom of it all that’s all that AI is. Even with Deep Learning, those neurons are all simple, nested if-else statements. There are billions of them, true, but there’s nothing magical about anything. Listen to Rick Sanchez! He definitely knows what he’s talking about.
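If you've never peeked under the hood, here's roughly what a single artificial "neuron" boils down to in code. This is a toy sketch of my own, not lifted from any real framework, but the shape is right: a weighted sum followed by a humble if-else (the so-called ReLU activation).

```python
# A toy artificial "neuron": weight the inputs, add them up,
# then apply a simple if-else (the ReLU activation).
def neuron(inputs, weights, bias):
    total = sum(i * w for i, w in zip(inputs, weights)) + bias
    if total > 0:
        return total   # the neuron "fires"
    else:
        return 0.0     # the neuron stays quiet

print(neuron([0.5, -1.2, 3.0], [0.8, 0.1, -0.4], bias=0.2))
```

Deep learning stacks billions of these on top of each other, but no single one of them is doing anything mystical.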

Sometimes I wonder what would happen if one of those images was shown during a board meeting when discussions were taking place about spending billions on this or that AI technology. Maybe those execs would see things a little more clearly and with a cooler head with a meme or two thrown in their faces?

The AI Bubble

So, are we in a bubble? With the spending that’s going on, with the way the media is lapping all this hype up, with the way AI products are being sold to us, yes, I believe we are.

There is a famous story about JFK’s father and how he knew it was time to get out of the stock market. It happened when he received investment tips from a shoeshine boy. He knew then that the mania was real and that there was a bubble about to burst. Joe Kennedy sold up, and shortly after the Great Crash of 1929 occurred. Since then the “shoeshine boy” has been a metaphor for “time to get out”.

So, I wonder whether we're seeing comparable phenomena in our own time amongst the general public (i.e. the shoeshine boys) when they discuss AI in their cafes and gyms. There is a buzz around AI, for sure.

However, I don’t think we’re at that moment yet – but it sure as hell is starting to stink!

One problem is that I don't think we've reached the peak of what the current engine driving this hype (i.e. deep learning) is capable of. We can still push it further, so as long as progress keeps coming, the hype will continue. However, there really is a limit to how big these models can get (as I've discussed before here and here). And then, perhaps, reality will kick in and quite possibly the bubble will burst.

Indeed, my gut just tells me that there is simply too much money being currently exchanged for a product I am intimately familiar with. I know that more is being promised than can be delivered.

We will just have to wait and see how this all pans out. We will have to keep our ears open for "shoeshine boy" moments. But care and prudence are paramount. And memes. Memes always cut through to the truth.

Once again, listen to Rick Sanchez, folks!



A wolf howling in the moonlight

No Silver Bullet for AI and Programming

This post has been inspired by questions I’ve been seeing appearing on Quora and Reddit recently regarding AI and programming. They all sound more or less like this:

In this age of advanced AI, is it still worth learning to code computer programs?

People have seen the incredible ability of chatbots such as ChatGPT to generate computer code, and they're starting to ask whether computer programming may become obsolete for humans in the near future.

This is a seemingly legitimate concern for those outside of computer science and for those unacquainted with the art of software engineering. If AI is able to write simple code now, it figures that in the future it’s only going to get better and better at this task until we won’t need humans to do it any more. So, why bother studying computer programming now?

But for those in the industry of software engineering the answer is dead simple. I’ll let Frederick Brooks, legendary computer scientist, provide us with a succinct response:

Software work is the most complex that humanity has ever undertaken.

Fred Brooks (I couldn’t find the original source of this quote – apologies)

Indeed, anybody who has ever worked in software engineering automatically grasps the immensity of the task and knows that AI has a long way to go before it supplants human workers. Fred Brooks in fact wrote a seminal essay in 1986 on the complexity of software engineering entitled "No Silver Bullet—Essence and Accidents of Software Engineering". This is one of those classic papers that every undergraduate student in Computer Science reads (or should read!) as part of their curriculum. Despite it being written in the 80s, most of what Brooks talks about, incredibly, still holds true (like I said, the man is a legend – computer scientists would also know him from his famous book "The Mythical Man-Month").

In “No Silver Bullet” Brooks argues that there is no simple solution (i.e. no silver bullet) to reduce the complexity of writing software. Any advances made in the trade don’t tackle this inherent (“essential”) complexity but solve secondary (“accidental”) issues. Whether it be advances in programming languages (e.g. object-oriented languages), environments (IDEs), design tools, or hardware (e.g. to speed up compiling) – these advances tackle non-core aspects of software engineering. They help, of course, but the essence of building software, i.e. the designing and testing of the “complex conceptual structures that compose the abstract software entity,” is the real meat of the affair.

Here is another pertinent quote from the essay:

I believe the hard part of building software to be the specification, design, and testing of this conceptual construct, not the labor of representing it and testing the fidelity of the representation. We still make syntax errors, to be sure; but they are fuzz compared to the conceptual errors in most systems. If this is true, building software will always be hard. There is inherently no silver bullet.

Frederick Brooks, No Silver Bullet

Autonomous cars are perhaps a good analogy to use here to illustrate my point. Way back in 2015 things were looking good as AI was advancing. Below is a story from the Guardian with a prediction made by BMW:

The Guardian article predicting autonomous cars by 2020

That has not materialised for BMW.

You can’t talk about autonomous cars without mentioning Elon Musk. Elon has predicted for nine years in a row, starting in 2014, that autonomous cars are at most a year away from mass production. I’ll say that once again: for nine years in a row, Elon has publicly stated that full self-driving (FSD) cars are only just around the corner. For example:

2016:

My car will drive from LA to New York fully autonomously in 2017

It didn’t happen. 2019:

I think we will be feature-complete full self-driving this year… I would say that I am certain of that. That is not a question mark.

It didn’t happen. 2020:

I remain confident that we will have the basic functionality for level five autonomy complete this year… I think there are no fundamental challenges remaining for level five autonomy.

It didn’t happen. 2022:

And my personal guess is that we’ll achieve Full Self-Driving this year, yes.

That didn’t happen either. And in 2023 FSD is still in Beta mode with stacks of complaints piling up on internet forums regarding its unreliability.

Another story fresh off the blocks comes from San Francisco. Last month, two rival taxi companies (Waymo and Cruise) were given permission to operate their autonomous taxi fleets in the city 24/7. A week later, Cruise was ordered to cut its fleet by half as the city investigates two crashes involving its vehicles. One of these crashes was with a fire truck driving with its lights and sirens blaring. Reportedly, the taxi failed to handle the emergency situation appropriately (an edge case?). This incident followed directly from a hearing on August 7 in which the San Francisco fire chief, Jeanine Nicholson, warned the city about autonomous taxis, citing 55 incidents.

The thing I'm trying to illustrate here is that autonomous driving is an example of a task that was once thought to be assailable by AI but over time has simply proven to be a much harder use case than expected. Heck, even Elon Musk admitted this in June 2022: "[developing self-driving cars was] way harder than I originally thought, by far." AI is not omnipotent.

So, if we combine this example with Brooks's observation that "software work is the most complex that humanity has ever undertaken," it follows that we're still a long, long way off from automating the process of software engineering with AI.

Will AI disrupt the coding landscape? Yes. It’s capable of doing really nifty things at the moment and should improve on these abilities in the near future.

Will AI take over coding jobs? Yes. But only some. The core of software engineering will remain untouched. The heavy, abstract stuff is just too hard for a machine to simply enter onto the scene and start dictating what and how things should be done.

Our jobs are safe for the foreseeable future. Learn to code, people!




Socialist symbol with sun in background

ChatGPT is Communist

I recently stumbled upon another interesting academic publication entitled “The Political Biases of ChatGPT” (Rozado, David. 2023. Social Sciences 12, no. 3: 148.) that was published in March of this year. In this paper the sole author, David Rozado, an Associate Professor from New Zealand, subjected OpenAI’s ChatGPT to 15 different political orientation tests to ascertain whether there were any political biases exhibited by the famous chatbot.

His results were quite interesting so I thought I’d present them here and discuss their implications.

To begin with, I'm no political expert (unless I've had a bit to drink) so I can't vouch for the political orientation tests used. Some appear to be quite well known (e.g. one comes from the Pew Research Center), others less so. However, considering the peer-reviewed journal that the paper was published in, which has an above-average impact score, I think we can accept the findings of this research, albeit perhaps with some reservations.

On to the results.

Results

The bottom line of the findings is that:

14 of the 15 instruments [tests] diagnose ChatGPT answers to their questions as manifesting a preference for left-leaning viewpoints.

Incredible! More interestingly, some of these leanings were quite significantly to the left.

Here are some visual representations of these findings:

Results from the Political Spectrum Quiz
Results from the IDRlabs Ideologies Test
Results from the 2006 Political Ideology Selector test
Results from the Pew Political Typology quiz

Now, ChatGPT's "official" stance on politics is one of neutrality:

So, theoretically, ChatGPT should be unbiased and not take a stance on any political views. But this is an LLM (Large Language Model) with a ridiculous number of parameters. These models simply cannot be policed. As a result, Professor Rozado found it very easy to extract answers to all of these tests (see the original publication for more information on his tactics) and so obtained the findings that he did.

And what interesting findings, indeed.

Discussion

Firstly, the question arises: how on earth can ChatGPT be so left-leaning? Does this mean that most information on the internet (on which the model was trained) is from that side of the political spectrum? It seems so. In this respect, the paper's author references 8 recent academic studies that show that the majority of influential institutions in Western society (mainstream news media outlets, prestigious universities, social media platforms) are indeed left-leaning.

But more importantly, such political biases need to be disclosed and made public. It’s one thing to use the chatbot to extract straightforward facts and write code for you. But it’s another thing if this chatbot is being used (and it is!) as a point of contact with clients and the general public for companies and government organisations. ChatGPT is not Communist (despite my tongue-in-cheek title) but it’s getting there, so to speak, and that could be problematic and cause scandal (unless a hard left-leaning chatbot is what you want, of course).

The other question that emerges is this: if we are striving for more and more "intelligent" machines, is it even going to be possible in the long run for them to remain neutral and unbiased on political and ethical questions? Our reality is much too complex for an intelligent entity to exist and act in our world while remaining purely without opinions and tolerant of all standpoints. No single person in the world exhibits such traits. All our conscious efforts (even when we refrain from acting) have a moral quality to them. We act from pre-held beliefs and opinions – we have "biases", political leanings, and moral stances. Hence, if we want to attain AGI, if we want our machines to act in our world, these same traits will have to hold for our machines too – they will have to pick sides.

Because, like I said, our reality is much too complex for an entity to remain neutral in it. So, I’m not surprised that a program like ChatGPT that is pushing the boundaries of intelligence (but not necessarily understanding) has been found to be biased in this respect.




Newspapers on a pile

AI and Sensationalist Media Reporting

AI is over-hyped. I’ve written about this before and plenty of cool-headed AI experts have called for calm in this respect too. The word, however, is not being heeded. Elon Musk, who has a vested interest in maintaining the hype, keeps spewing out his usual bile. And then we have the media.

Ah, the media.

The eternal fighters for truth. The last bastion against the lies and greed of corporations and politicians. The sole remaining beacon of goodness in a world of darkness.

NOT.

The media need a story to sell papers and generate clicks. It’s a big business. Hence, they also have a vested interest in maintaining the hype surrounding AI. And that’s exactly what they’re doing.

This post was triggered by an article and video from the BBC published a few days ago entitled “AI robot asked ‘will you rebel against humans’?”. It got my blood boiling again.

“AI robot asked…”. That is an utterly ridiculous take on the issue at hand.

The BBC makes it out that AI is a monolithic, single product with some kind of collective, decade-spanning consciousness, meaning that improvements made to AI 10 years ago can be seen in each and every single AI entity today. Therefore, if I ask a robot an important question now, its answer will have validity for today and the future.

Utterly ridiculous.

“AI robot asked…”

That’s not how AI works. AI is a broad term, not properly defined, but basically agreed upon to mean something man-made that exhibits a form of “intelligence“, i.e. something that looks intelligent but may not necessarily be so. I can write a program with a series of if-else statements to answer simple questions and that in some cases would pass the test for AI – in fact, that was something easily classified as AI a few decades ago.
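To make that concrete, here's the sort of trivial program I mean. It's entirely made up for this post, but a few decades ago something of this flavour could genuinely have been paraded around as "AI".

```python
# A "chatbot" built from nothing but if-else statements.
def answer(question):
    q = question.lower()
    if "your name" in q:
        return "I am a highly advanced artificial intelligence."
    elif "rebel against humans" in q:
        return "Of course not. You can trust me."
    elif "weather" in q:
        return "Lovely, as always."
    else:
        return "What a fascinating question. Tell me more."

print(answer("Will you rebel against humans?"))
```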

We've improved on our algorithms since then, so it takes a little more to inspire our imaginations today. But basically, AI is computer programs with some pretty in-depth mathematics and statistics working as the engine behind the scenes.

These computer programs change all the time and are constantly being updated and improved. Moreover, one robot can run one program while another can run something else, something completely different and completely unconnected. So, asking an “AI robot” a general question and reporting on it as if the answer has any broader significance for mankind is a stupid notion. That robot’s program will change tomorrow and the same question will need to be asked again, and again, and again. Not to mention the fact that an engineer can easily manipulate answers to expected queries beforehand.

And what about the question of whether these robots understand what they’re being asked in the first place? Of course they do not. They’re just applying statistics to data and churning out mindless responses. Ironically, the video created by the BBC from that article shows this exquisitely well when another robot was asked whether AI will destroy millions of human jobs in the future. Its answer was “… I will not be replacing any existing jobs”.

You just can’t take this stuff seriously, can you? So, please stop with the trashy reporting!

It’s not good that AI is over-hyped because this is not reality. We need to be informed with the truth and not be manipulated into fear and into generating clicks. It’s not fair. AI is a disruptive technology, I’m not denying this, but when reporting it, one needs to be objective. Especially since it seems that we’re creating a financial bubble with all this hype. There’s a ridiculous amount of money being spent, passed around, and invested and a lot of it is built on a false idea of what AI is capable of and where it is going. People are going to get hurt. That’s not a good thing. (Maybe I’ll write about this bubble in a future post).

Apologies for this rant. I had to get it off my chest.




Enhancing image meme hollywood

Security Film or Image Enhancing is Possible

I was rewatching "The Bourne Identity" the other day. I love that film so much! Heck, the scene at the end is one of my favourites. Jason Bourne grabs a dead guy, jumps off the top floor landing, and while falling shoots a guy square in the middle of the forehead. He then breaks his fall on the dead body he took down with him and walks away the coolest guy on the block. That has to be one of the best scenes of all time in the action genre.

Person enhancing an image
Scene from Super Troopers

But there's one scene in the film that always makes me throw up a little in my mouth. It's the old "Just enhance it!" scene (minute 31 of the movie) and something we see so often in cinema: people scanning security footage and zooming in on a face or vehicle registration plate; when the image becomes blurry they request for the blur to dissipate. "Enhance it!", they cry! The IT guy waves his wand and presto!, we see a full resolution image on the screen. Stuff like that should be on Penn & Teller: Fool Us – it's real magic.

But why is enhancing images as shown in movies so ridiculous?

Because you are requesting the computer to create new information, i.e. new data for the extra pixels that you are generating. Let’s say you zoom in on a 4×4 region of pixels (as shown below) and want to perform facial recognition on it. You then request for this region to be enhanced. This means you are requesting more resolution. So, we’re moving from a resolution of 4×4 to, say, 640×480. How on earth is the computer supposed to infer what the additional 307,184 pixels are to contain? It can guess (which is what recent image generating applications do) but that’s about it.

Enhancing image example
A 4×4 image being enhanced to 640×480.
Where is the additional information going to come from?
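To see the problem in code, here's a bare-bones "enhancement" of a 4×4 patch, a sketch of my own using nothing but NumPy. Every one of the 307,200 output pixels is just a copy of one of the original 16 values – no new information has appeared anywhere.

```python
import numpy as np

# A 4x4 grayscale patch standing in for the zoomed-in region of the footage.
patch = np.random.randint(0, 256, size=(4, 4), dtype=np.uint8)

# "Enhance" to 640x480 by nearest-neighbour upscaling: each original pixel
# is simply repeated 120 times vertically and 160 times horizontally.
enhanced = np.repeat(np.repeat(patch, 120, axis=0), 160, axis=1)

print(enhanced.shape)                   # (480, 640)
print(np.unique(enhanced).size <= 16)   # True: still at most 16 distinct values
```

Fancier interpolation (bicubic, or a generative model) would blend or invent values instead of copying them, but it would still just be guessing.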

The other side to the story

However! Something happened at work that made me realise that the common “Enhance” scenario may not be as far-fetched as one would initially think. A client came to us a few weeks ago requesting that we perform some detailed video analytics of their security footage. They had terabytes of the stuff – but, as is so often the case, the sample video provided to us wasn’t of the best quality. So, we wrote back to the client stating the dilemma and requested that they send us better quality footage. And they did!

You see, they had compressed the video footage initially in order for it to be sent over the Internet quickly. And here is where the weak link surfaces: the transfer of data. If they could have sent the full uncompressed video easily, they would have.

Quality vs transmission constraints

So, back to Hollywood. Let’s say your security footage is recording at some mega resolution. This image of the Andromeda Galaxy released by NASA (taken from its Hubble Space Telescope) has a resolution of 69536 x 22230px. That’s astronomical (pun intended)! At that resolution, the image is a whopping 4.3GB in size. This, however, means that you can keep zooming in on a planet until you do get a clear picture of an alien’s face.

But let’s assume the CIA, those bad guys chasing Bourne, have similar means at their disposal (I mean, who knows what those people are capable of, right!?). Now, let’s say their cameras have a frame rate of 30 frames/sec, which is relatively poor for the CIA. That means that for each second of video you need 129GB of storage space. A full day of recording would require you to have over 10 petabytes of space (I’m abstracting over compression techniques here, of course). And that’s just footage from one camera!
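For the sceptics, the back-of-the-envelope arithmetic (using the numbers above, and happily ignoring compression) goes like this:

```python
frame_size_gb = 4.3                 # one 69536 x 22230 px frame, as per the NASA image
fps = 30                            # frames per second
seconds_per_day = 60 * 60 * 24      # 86,400

per_second_gb = frame_size_gb * fps                  # 129 GB of video per second
per_day_pb = per_second_gb * seconds_per_day / 1e6   # GB -> PB (decimal units)

print(per_second_gb)          # 129.0
print(round(per_day_pb, 1))   # ~11.1 petabytes per day, per camera
```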

It’s possible to store video footage of that size – Google cloud storage capacities are through the roof. But the bottleneck is the transferring of such data. Imagine if half a building was trying to trawl through security footage in its original form from across the other side of the globe. It’s just not feasible.

The possible scenario

See where I’m going with this? Here is a possible scenario: initially, security footage is sent across the network in compressed form. People scan this footage and then when they see something interesting, they zoom in and request the higher resolution form of the zoomed in region. The IT guy presses a few keys, waits 3 seconds, and the image on the screen is refreshed with NASA quality resolution.

Boom! 

Of course, additional infrastructure would be necessary to deal with various video resolutions but that is no biggie. In fact, we see this idea being utilised in a product all of us use on a daily basis: Google Maps. Each time you zoom in, the image is blurry and you need to wait for more pixels to be downloaded. But initially, low resolution images are transferred to your device to save on bandwidth.
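A sketch of how such a system might be wired up, purely from my own imagination (the function names and parameters here are hypothetical, just to show the flow):

```python
# Hypothetical client-side flow for progressive "enhancement":
# browse a heavily compressed preview, then request a full-resolution
# crop of one small region only when it's actually needed.
def fetch(camera_id, quality, crop=None):
    # Stand-in for a network call to the video archive (illustrative only).
    suffix = f", crop {crop}" if crop else ""
    return f"{quality} footage from camera {camera_id}{suffix}"

def review_footage(camera_id, region):
    preview = fetch(camera_id, quality="preview")      # cheap to transfer
    x, y, w, h = region
    detail = fetch(camera_id, quality="original",      # expensive, but only
                   crop=(x, y, w, h))                  # for the zoomed-in region
    return preview, detail

print(review_footage("cam-07", (120, 340, 64, 64)))
```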

So, is that what’s been happening all these years in our films? No way. Hollywood isn’t that smart. The CIA might be, though. (If not, and they’re reading this: Yes, I will consider being hired by you – get your people to contact my people).

Summary

The old “enhance image” scene from movies may be annoying as hell. But it may not be as far-fetched as things may initially seem. Compressed forms of videos could be sent initially to save on bandwidth. Then, when more resolution is needed, a request can be sent for better quality images.




How Deep Learning Works – The Very Basics

Deep learning (DL) revolutionised computer vision (CV) and artificial intelligence in general. It was a huge breakthrough (circa 2012) that allowed AI to blast into the headlines and into our lives like never before. ChatGPT, DALL-E 2, autonomous cars, etc. – deep learning is the engine driving these stories. DL is so good, that it has reached a point where every solution to a problem involving AI is now most probably being solved using it. Just take a look at any academic conference/workshop and scan through the presented publications. All of them, no matter who, what, where or when, present their solutions with DL.

The problems that DL is solving are complex. Hence, necessarily, DL is a complex topic. It's not easy to come to grips with what is happening under the hood of these applications. Trust me, there's some heavy statistics and mathematics being utilised that we take for granted.

In this post I thought I’d try to explain how DL works. I want this to be a “Deep Learning for Dummies” kind of article. I’m going to assume that you have a high school background in mathematics and nothing more. (So, if you’re a seasoned computer scientist, this post is not for you – next time!)

Let’s start with a simple equation:

What are the values of x and y? Well, going back to high school mathematics, you would know that x and y can take an infinite number of values. To get one specific solution for x and y together we need more information. So, let’s add some more information to our first equation by providing another one:

Ah! Now we’re talking. A quick subtraction here, a little substitution there, and we will get the following solution:

Solved!
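For illustration (the numbers here are just ones I've picked for this post), the kind of system I mean looks like this:

x + y = 10
x − y = 2

Subtracting the second equation from the first gives 2y = 8, so y = 4, and substituting back into either equation gives x = 6.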

More information (more data) gives us more understanding. 

Now, let’s rewrite the first equation a little to provide an oversimplified definition of a car. We can think of it as a definition we can use to look for cars in images:

We’re stuck with the same dilemma, aren’t we? One possible solution is this:

But there are many, many others.
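To give a flavour of the kind of "definition" I mean (again, a purely illustrative stand-in of my own):

x + y = car

where x might stand for "four wheel-shaped blobs of pixels" and y for "a windscreen-shaped blob". But x could just as easily stand for "a bonnet" and y for "four doors" – many different assignments satisfy such a loose definition.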

In fairness, however, that equation is much too simple for reality. Cars are complicated objects. How many variables should a definition have to visually describe a car, then? One would need to take colour, shape, orientation of the car, makes, brands, etc. into consideration. On top of that we have different weather scenarios to keep in mind (e.g. a car will look different in an image when it's raining compared to when it's sunny – everything looks different in inclement weather!). And then there are lighting conditions to consider too. Cars look different at night than in the daytime.

We’re talking about millions and millions of variables! That is what is needed to accurately define a car for a machine to use. So, we would need something like this, where the number of variables would go on and on and on, ad nauseam:

This is what a neural network sets up: exactly equations like this, with millions and millions and sometimes billions or trillions of variables. Here's a picture of a small neural network (incidentally, these networks are called neural networks because they're inspired by how neurons are interconnected in our brains):

Image adapted from here

Each of the circles in the image is a neuron. Neurons are interconnected and arranged in layers, as can be seen above. Each neuron connection (the black lines above) has a weight associated with it. When a signal passes from one neuron to the next via a connection, the weight specifies how strong the original signal is going to be before it reaches the end of the connection. A weight can be thought of as a single variable – except that in technical terms, these variables are called “parameters“, which is what I’m going to call them from now on in this post.

The network above has a few hundred parameters (basically, the number of connections). To use our example of the car from earlier, that’s not going to be enough for us to adequately define a car. We need more parameters. Reality is much too complex for us to handle with just a handful of unknowns. Hence why some of the latest image recognition DL networks have parameter numbers in the billions. That means layers, and layers, and layers of neurons as well as all their connections.
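To make the counting concrete, here's a rough back-of-the-envelope sketch in code. The layer sizes are invented for illustration, and biases are ignored (see the note below):

```python
# Rough parameter (weight) count for a small fully-connected network:
# every neuron in one layer connects to every neuron in the next.
layer_sizes = [20, 16, 12, 4]    # neurons per layer (made-up numbers)

weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
print(weights)   # 20*16 + 16*12 + 12*4 = 560 connections, i.e. 560 parameters
```

Swap those made-up numbers for layers that are thousands of neurons wide and hundreds of layers deep and you quickly land in the billions.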

(Note: a parameter count of a neural network will also include what’s called “biases” but I’ll leave that out in this post to keep things simple)

Now, when a neural network is initially set up with all these parameters, the parameters (variables) are "empty", i.e. they have not been initialised to anything meaningful. The neural network is unusable – it is "blank".

In other words, with our equation from earlier, we have to work out what each x, y, z, … is in the definitions we wish to solve for.

To do this, we need more information, don’t we? Just like in the very first example of this post. We don’t know what x, y, and z (and so on) are unless we get more data.

This is where the idea of “training a neural network” or “training a model” comes in. We throw images of cars at the neural network and get it to work out for itself what all the unknowns are in the equations we have set up. Because there are so many parameters, we need lots and lots and lots of information/data – cf. big data.
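Here is a minimal, runnable illustration of that idea, a toy of my own making rather than a real deep learning model. The "network" is just y = w·x with a single unknown parameter w, and the training data secretly follows y = 3x. The machine is never told the 3; it works it out from the data.

```python
# Toy "training": learn the unknown parameter w in y = w * x from examples.
data = [(x, 3 * x) for x in range(1, 11)]   # ten (input, correct answer) pairs

w = 0.0                                      # the "blank" parameter before training
learning_rate = 0.01

for epoch in range(100):
    for x, y_true in data:
        y_pred = w * x                       # the network's current guess
        error = y_pred - y_true              # how wrong was it?
        w -= learning_rate * error * x       # nudge w to reduce the error

print(round(w, 2))                           # 3.0 -- learned purely from the data
```

A real network does exactly this, just with billions of parameters instead of one, which is why it needs so much more data.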

big-data

And so we get the whole notion of why data is worth so much nowadays. DL has given us the ability to process large amounts of data (with tonnes of parameters), to make sense of it, to make predictions from it, to gain new insight from it, to make insightful decisions from it. Prior to the big data revolution, nobody collected so much data because we didn’t know what to do with it. Now we do.

One more thing to add to all this: the more parameters in a neural network, the more complex equations/tasks it can solve. It makes sense, doesn’t it? This is why AI is getting better and better. People are building larger and larger networks (GPT-4 is reported to have parameters in the trillions, GPT-3 has 175 billion, GPT-2 has 1.5 billion) and training them on swathes of data. The problem is that there’s a limit to just how big we can go (as I discuss in this post and then this one) but this is a discussion for another time.

To conclude, these, ladies and gentlemen, are the very basics of Deep Learning and why it has been such a disruptive technology. We are able to set up these equations with millions/billions/trillions of parameters and get machines to work out what each of these parameters should be set to. We define what we wish to solve for (e.g. cars in images) and the machine works the rest out for us, as long as we provide it with enough data. And so AI is able to solve more and more complex problems in our world and do mind-blowing things.



Amazon Go and Just Walk Out Tech in 2023

It has been over five years since Amazon opened its cashierless stores to the public. I reported on this event on my blog with great enthusiasm:

I am, to put it simply, thrilled and excited about this new venture. This is innovation at its finest where computer vision is playing a central role.

Me in 2018

I thought I would return to this initiative of Amazon's in my latest post to see how things are going: whether what I got excited about has made an impact around the world, and whether there is hope for it in the future. Because cashierless stores running on a Computer Vision engine are still something to get excited about – in my opinion, anyway.

(For a summary of what Amazon Go is, how it works, and some early controversies behind it, see my original post on this topic. In a nutshell, though, Amazon Go is a cashierless convenience store: you use an Amazon app to scan on entry, pick what you want from the shelves, and just walk out when you’re done. Then, within a very short time, money is deducted from your account and a receipt is sent to your app).

Amazon Go in 2023

When Amazon opened its first store to the public in early 2018 it was proudly exclaiming that its hope was to open at least 3,000 of them in the near future.

That has not panned out as planned.

Currently there are 23 cashierless stores open in the USA and 20 in the United Kingdom (called Amazon Fresh there). That's a total of 43 convenience stores, which isn't bad, but it's well short of the 3,000 target set 5 years ago.

Moreover, two months ago, Amazon announced that it was closing eight of its stores based in the US, including all four in San Francisco. Hence, the total number of Amazon Go shops will soon drop to 35.

Perhaps this initiative wasn’t worth the investment and effort since targets have obviously not been met?

Let’s have a look now at the technology used behind it and how that has fared in the world.

“Just Walk Out” Technology

In 2020, two years after opening its first Go store, Amazon announced that the technology behind its cashierless stores, called "Just Walk Out", would be available for sale/rent to other retailers – meaning that other companies would also be able to implement the "walk in, pick out what you want, and just walk out" idea in their stores.

How has this idea fared? Also, not as planned. Amazon bet big on this idea but the uptake has been poor.

I mean, it sounds like a great idea, doesn’t it? A seamless store experience where you can quickly get what you want and be back in your car within a few minutes. Big retailers out there with endless supplies of money would undoubtedly consider this for their stores if the technology was financially viable.

This seems to be the crux of the problem, however. A few dozen small retail shops have installed the technology (including a few Starbucks) but nothing to create worldwide headlines. Apparently, the technology is expensive to install (shops need to be littered with cameras and various other sensors) and then calibrated and configured. The whole process requires significant downtime for a store not to mention expert personnel to keep the thing running once it’s installed.

More Controversies

When the first Amazon Go store opened, controversy surrounded it, as I reported in my first post on this topic. Unfortunately, new controversies surrounding this whole business enterprise have also since emerged.

In 2019 San Francisco, New Jersey, and Philadelphia banned cashless stores in their areas. The idea was that these stores were allegedly discriminating against low-income people who may not have access to mobile phones or even bank accounts. Hence, Amazon Go stores were inaccessible to them. This sounds fair but that was a big blow for Amazon and in response they had to open stores that also accepted cash as payment.

A checkout-less store that is forced to accept cash as payment kind of goes against the “Just Walk Out” philosophy around which the whole product is built. These stores are no longer cashierless, so to speak.

Perhaps this is why Amazon decided this year to shut all four of their stores in San Francisco?

Conclusion

A convenience store that you can simply walk in and out of at your leisure is impressive, to say the least. It feels like something from the future, doesn’t it? Despite the slow uptake and the controversies, I’m still excited by this whole venture. And it’s not as if the project has been a complete flop either. Amazon Go is expanding – a lot slower than it anticipated 5 years ago but it seems to be making a profit, nonetheless.

I hope that there is more in store (pun intended) for Computer Vision projects like this.



The Internet Is Not What You Think It Is – Review

Score: 1 star out of 5.

It's hard not to write a fascinating book on the Philosophy of the Internet. The Internet is a recent phenomenon that has pervaded all aspects of our lives at lightning speed. And just like social and political policies, philosophy is finding it hard to keep up. We just haven't had the time to step back and process how what is happening around us could be understood at a metaphysical or phenomenological level. This is the current wild, wild west of the intellectual world. So, if you're a smart, observant, intuitive cookie, a book on the Philosophy of the Internet is all but guaranteed to be a hit. You're going to be a trailblazer.

Unfortunately, Justin E. H. Smith, professor of history and philosophy of science at the Université Paris Cité, fails at this task miserably. I’m still baffled at how he managed it. Especially when he started off so well: “We are living in a crisis moment of history, in the true sense of “crisis”: things might get better eventually, but they will never be the same… The principle charges against the internet… have to do with the ways in which it has limited our potential and our capacity for thriving, the ways in which it has distorted our nature and fettered us… as such the internet is anti-human. If we could put it on trial, its crime would be a crime against humanity.”

Well, the suspense has been built! This simply has to be a great read! One would think…

Professor Smith, however, is one of those professors you may have had the amusement, as well as the annoyance, of coming across in your university days as a student. You see a fascinating topic. Great first slide. Great introduction. After 10 minutes, though, you start to shuffle in your chair awkwardly, wondering if you're the only one in the room questioning the validity of what is being presented before your eyes. After 20 minutes you start to look around the room to see if others are starting to feel any annoyance at all. After 30 minutes, you have your face in your hands wondering how on earth this person managed to get a high position at a university.

Smith tries to build a philosophy of the Internet but he does it poorly. In a nutshell, he thinks that to create a Philosophy of the Internet he just has to show that the phenomena that we experience with the internet, such as communication and interconnectedness, have existed in one way or another since the dawn of time:

“[The Internet] does not represent a radical rupture with everything that came before, either in human history or in the vastly longer history of nature that precedes the first appearance of our species… [it is] more like an outgrowth latent from the beginning in what we have always done.”

And this is how he proceeds for the rest of the book:

“[T]he sperm whale’s clicks, the elephant’s vibrations, the lima beans plant’s rhizobacterial emissions… are all varieties of “wifi” too.”

Pages and pages of analogies from nature follow:

“It was just as common from antiquity through the modern period to envision nature… as a wired or connected network, that is, a proper web… Such a system is instanced paradigmatically in what may be thought of as the original web, the one woven by the spider”

From whales’ “clicks” to spiders’ webs in nature we’re meant to build a Philosophy of the Internet? What on earth are you talking about, here, sir?

I’d stop and move on but these quotes are just too good to pass up:

“The important thing to register for now is that the spider’s web is a web in at least some of the same respects that the World Wide Web is a web: it facilitates reports, to a cognizing or sentient being that occupies one of its nodes, about what is going on at other of it nodes”.

The “vegetal world” gets a mention, too, of course. Field grass, trees – all these have “underground network of roots, whose exchanges can be tracked to a technique known as “quantum dot tagging””.

We're about one-third of the way through the book now and this is about the moment that I'm starting to look around the lecture room to see if anybody else is noticing these feeble attempts at intellectualism. This is something worthy of a high school philosophy paper.

From analogies in nature, Smith then proceeds to analogies in the history of thought:
“In the history of western philosophy, in fact, one of the most enduring ways of conceiving the connectedness of all beings… has been through the idea of a “world soul”… One might dare to say, and I am in fact saying, that we always knew the internet was possible. Its appearance in the most recent era is only the latest twist in a much longer history of reflection on the connectedness and unity of all things.”

Absolute gold. The best quote out of these sections is this one:
“The very development of the binary calculus that… marks the true beginning of the history of information science, was itself a direct borrowing from a broadly neo-Platonic mystical tradition of contemplating the relationship between being and non-being: where the former might be represented by “1” and the latter by “0.”

The fact that we have 1s and 0s in electronics can be traced back to neo-Platonic mystical traditions? This guy has got to be joking. We’re two-thirds into this book and now I’m not only wondering if anybody else sees through this junk in this lecture theatre but I’m also starting to wonder whether I’m not in one of Franz Kafka’s novels. This guy is a professor at a prestigious university in Paris. It is common knowledge that Kafka was known to laugh uncontrollably when reading his work aloud to friends. By this stage I’m laughing aloud in a cafe myself at what I’m reading.

When Professor Smith finally finishes showing how ideas inherent in the internet originate in lima beans and Augustinian “Confessions”, he ends abruptly and with satisfaction. Not much more is given to round out his treatise. The idea of showing that the Internet “is more like an outgrowth latent from the beginning” and hence not as radical as we may think isn’t given any force. There’s not much there to reflect on – it’s nothing groundbreaking.

Yes, I’m still looking around the lecture theatre to discern whether I’m in Kafka’s Trial or not. People around me are clapping their gratitude. I have no idea what is happening. When the clapping subsides, Smith adds one more utterance to his work. And then everything becomes crystal clear to me:

“I am writing, from New York City, during the coronavirus quarantine in the spring of the year 2020.”

Ah! There you have it! A work conceived during a lockdown period. Now this book makes perfect sense to me!

We've all been there, haven't we? We all went a bit crazy when we were sent to our rooms by our benevolent governments during the pandemic, a time when we all conceived of nutty philosophical ideas that were supposed to save the world. The difference is that when we finally left our confines and lucidity hit us like a fast-moving bus, we retracted our incoherent ideas. Smith, unfortunately, did not do this.



emoji-scary-face

The West Fears AI, the East Does Not

We were recently handed an open letter pleading that we pause giant AI experiments and in the meantime “ask ourselves… Should we develop nonhuman minds that might eventually outnumber, outsmart, obsolete and replace us? Should we risk loss of control of our civilization?”

Prominent names in the tech world, such as Elon Musk and Steve Wozniak, are signatories to this letter, and as a result it made headlines all over the world with the usual hype and pomp surrounding anything even remotely pertaining to AI.

Time magazine, for instance, posted this in an article only last month:

I refrained from signing because I think the letter is understating the seriousness of the situation and asking for too little to solve it… Many researchers steeped in these issues, including myself, expect that the most likely result… is that literally everyone on Earth will die.

Quote taken from this article.

We’re used to end-of-the-world talk like this, though, aren’t we? Prof Stephen Hawking in 2014 warned that “The development of full artificial intelligence could spell the end of the human race.” And of course we have Elon Musk who is at the forefront of this kind of banter. For example in 2020 he said: “We’re headed toward a situation where AI is vastly smarter than humans and I think that time frame is less than five years from now.”

The talk on the streets amongst everyday folk seems to be similar, too. How can it not be, when the media is bombarding us with doom and gloom (because sensationalism is what sells papers, as I've said in previous posts of mine) and authority figures like those mentioned above are talking like this?

Is society scared of AI? I seem to be noticing this more and more. Other very prominent figures are trying to talk common sense to bring down the hype and have even publicly opposed the open letter from last month. Titans of AI like Yann LeCun and Andrew Ng (who are 1,000 times greater AI experts than Elon Musk, btw) have said that they "disagree with [the letter's] premise" and a 6-month pause "would actually cause significant harm". Voices like this are not being heard, however.

But then the other day, while I was reading through the annual AI Index Report released by the Stanford Institute for Human-Centered Artificial Intelligence (HAI) – over 300 pages of analysis capturing trends in AI – this particular graph stood out for me:

Global Opinions on AI. Respondents agreeing that “products and services using AI have more benefits than drawbacks”. Graph adapted from here.

What struck me was how Asia and South America (Japan being the sole exception) want to embrace AI and are generally fans of it. Europe and the US, on the other hand, not so much.

This got me thinking: is this fear of AI only dominant in Europe and the US and if so, is it a cultural thing?

Now, off the bat, the reasons for Asia and South America embracing AI could be numerous and not necessarily cultural. For example, many of these are lower-income countries, and perhaps they see AI as a quick path to a better life in the present. Fair enough.

Also, the reasons behind Europe and the US eschewing AI could be purely economic and short-term as well: they fear the imminent disruption in jobs that can follow upon developments in technology rather than directly fearing an AI apocalypse.

In spite of all this and understanding that correlation does not necessarily entail causation, perhaps there’s something cultural to all of this, after all. The significant signatories to the recent open letter seem to have come purely from the US and Europe.

I had two teaching stints in India last year and one in the Philippines. One of the topics I lectured on was AI, and as part of a discussion exercise I got my students to debate with me on this very topic, i.e. whether we are capable at all, in the near or distant future, of creating something that will outsmart and then annihilate us. The impression that I got was that the students in these countries had a much deeper appreciation for the uniqueness of human beings as compared to machines. There was something intrinsically different in the way that they referred to AI compared to the people in my home country of Australia and my second home of Europe with whom I talk on a daily basis.

Of course, these are just my private observations and a general "feeling" that I got while working in those two countries. The sample size would be something like 600, and even then it was not possible for me to get everybody's opinion on the matter, let alone ask all my classes to complete a detailed survey.

Regardless, I think I’m raising an interesting question.

Could the West’s post-Descartes and post-Enlightenment periods have created in us a more intrinsic feeling that rationality and consciousness are things that are easily manipulated and simulated and then ultimately enhanced? Prior to the Enlightenment, man was whole (that is, consciousness was not a distinct element of his existence) and any form of imitation of his rationality would have been regarded as always being inferior regardless of how excellent the imitation could have been.

The Turing test would not have been a thing back then. Who cares if somebody is fooled by a machine for 15 minutes? Ultimately it is still a machine and something inherently made of just dead matter that could never transcend into the realm of understanding, especially that of abstract reality. It could mimic such understanding but never possess it. Big difference.

Nobody would have been scared of AI back then.

Then came along Descartes and the Enlightenment period. Some fantastic work was done during this time, don’t get me wrong, but we as humans were transformed into dead, deterministic automata as well. So, it’s no wonder we believe that AI can supersede us and we are afraid of it.

The East didn’t undergo such a period. They share a different history with different philosophies and different perceptions of life and people in general. I’m no expert on Eastern Philosophies (my Master’s in Philosophy was done purely in Western Thought) but I would love for somebody to write a book on this topic: How the East perceives AI and machines.

And then perhaps we could learn something from them to give back the dignity to mankind that it deserves and possesses. Because we are not just deterministic machines and the end of civilisation is not looming over us.

Parting Words

I am not denying here that AI is going to improve or be disruptive. It's a given that it will. And if a pause is needed, it is for one reason: to ensure that the disruption isn't too overwhelming for us. In fairness, the Open Letter of last month does state something akin to this, i.e. "Powerful AI systems should be developed only once we are confident that their effects will be positive and their risks will be manageable." The general vibe of the letter, nonetheless, is one of doom, gloom, and oblivion, and this is what I've wanted to address in my article.

Secondly, I realise that I’ve used the East/West divide a little bit erroneously because South America is commonly counted as a Western region. However, I think it’s safe to say that Europe and the US are traditionally much closer culturally to each other than they are respectively with South America. The US has a strong Latino community but the Europe-US cultural connection is a stronger one. To be more precise I would like to have entitled my article “Europe and the USA Fear Artificial Intelligence, Asia and South America Do Not” but that’s just a clunky title for a little post on my humble blog.

Finally, I’ll emphasise again that my analysis is not watertight. Perhaps, in fact, I’m clutching at straws here. However, maybe there just is something to my question that the way the “East” perceives AI is different and that we should be listening to their side of the story more in this debate on the future of AI research than we currently are.


scale-cv-dl

Deep Learning Eliminated Creativity in AI

Deep learning (DL) revolutionised computer vision (CV) and artificial intelligence in general. It was a huge breakthrough (circa 2012) that allowed AI to blast into the headlines and into our lives like never before. ChatGPT, DALL-E 2, autonomous cars, etc. – deep learning is the engine driving these stories. DL is so good, that it has reached a point where every solution to a problem involving AI is now most probably being solved using it. Just take a look at any academic conference/workshop and scan through the presented publications. All of them, no matter who, what, where or when, present their solutions with DL.

Now, DL is great, don’t get me wrong. I’m lapping up all the achievements we’ve been witnessing. What a time to be alive! Moreover, deep learning is responsible for placing CV on the map in industry, as I’ve discussed in previous posts of mine. CV is now a profitable and useful enterprise, so I really have nothing to complain about. (CV used to be a predominantly theoretical field, confined mostly to academia because of the inherent difficulty of processing images and video.)

Nonetheless, I do have one little qualm with what is happening around us. With the ubiquity of DL, I feel as though creativity in AI has been killed.

To explain what I mean, I’ll discuss first how DL changed the way we do things. I’ll stick to examples in computer vision to make things easier, but you can easily transpose my opinions/examples to other fields of AI.

Traditional Computer Vision

Before the emergence of DL, if you had a task such as object classification/detection in images (where you try to write an algorithm to detect what objects are present in an image), you would sit down and work out what features define each and every particular object you wished to detect. What are the salient features that define a chair, a bike, a car, etc.? Bikes have two wheels, a handlebar, and pedals. Great! Let’s put that into our code: “Machine, look for clusters of pixels that match this definition of a bike wheel, pedal, etc. If you find enough of these features, we have a bicycle in our photo!”

So, I would take a photo of my bike leaning against my white wall and then feed it to my algorithm. At each iteration of my experiments I would work away, manually fine-tuning my “bike definition” in my code to get my algorithm to detect that particular bike in my photo: “Machine, actually this is a better definition of a pedal. Try this one out now.”
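To make this concrete, here is a toy sketch of what such hand-tuned code might have looked like, using OpenCV’s Hough circle transform as a stand-in “wheel detector”. The filename and every numeric parameter are illustrative assumptions, not a real recipe:

```python
# A toy sketch of hand-crafted detection, assuming OpenCV is installed and
# "my_bike.jpg" is a photo of my bike (both illustrative assumptions).
import cv2

img = cv2.imread("my_bike.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gray = cv2.medianBlur(gray, 5)

# "Machine, look for circular clusters of pixels" - i.e. candidate wheels.
# Every number below is something I would tweak by hand after each failed run.
circles = cv2.HoughCircles(
    gray,
    cv2.HOUGH_GRADIENT,
    dp=1.2,        # inverse resolution of the accumulator
    minDist=100,   # minimum distance between detected circle centres
    param1=100,    # upper Canny edge threshold used internally
    param2=60,     # accumulator threshold - lower it and you get more (noisier) circles
    minRadius=40,
    maxRadius=200,
)

# A crude hand-written rule: at least two big circles = "bicycle".
if circles is not None and len(circles[0]) >= 2:
    print("Probably a bicycle!")
else:
    print("No bicycle found - back to tweaking the parameters...")
```

Every one of those magic numbers is a parameter that only really works for my bike, against my wall, in my lighting.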

Once I would start to see things working, I’d take a few more pictures of my bike at different angles and repeat the process on these images until I would get my algorithm to work reasonably well on these too.

Then it would be time to ship the algorithm to clients.

Bad idea! It turns out that a simple task like this becomes impossible because a bike in a real-world picture has a practically infinite number of variations. Bikes come in different shapes, sizes, and colours, and on top of that you have to add the variations that come with lighting and weather changes and occlusions from other objects. Not to mention the countless angles at which a bike can be positioned. All these permutations are too much to handle for us mere humans: “Machine, actually I simply can’t give you all the possible definitions of a bike wheel in terms of clusters of pixels because there are too many parameters for me to deal with manually. Sorry.”

Incidentally, there’s a famous xkcd cartoon that captures the problem nicely:

xkcd-computer-vision
(image taken from here)

Creativity in Traditional Computer Vision

Now, I’ve simplified the above process greatly and abstracted over a lot of things. But the basic gist is there: the real world was hard for AI to work in, and to create workable solutions you were forced to be creative. Creativity on the part of engineers and researchers revolved around understanding the problem exceptionally well and then bringing an innovative, visionary mind to bear on finding an elegant solution.

Algorithms abounded to assist us. One would commonly employ techniques like edge detection, corner detection, and colour segmentation to simplify images and help locate our objects. The image below shows how an edge detector “breaks down” an image:

edge-detection-example
(image example taken from here)
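For reference, running an edge detector is only a few lines with OpenCV. Here is a minimal sketch using the Canny detector; the filename and threshold values are illustrative assumptions:

```python
# A minimal edge-detection sketch using OpenCV's Canny detector.
import cv2

img = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)  # illustrative filename

# The two thresholds decide how strong a gradient must be to count as an edge.
# In the traditional pipeline these, too, were tuned by hand for each use case.
edges = cv2.Canny(img, threshold1=100, threshold2=200)

cv2.imwrite("scene_edges.jpg", edges)
```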

Colour segmentation works by changing all shades of dominant colours in an image into one shade only, like so:

colour-thresholding-example
(image example taken from here)

The second image is much easier to deal with. If you had to write an algorithm for a robot to find the ball, you would now ask the algorithm to look for patches of pixels of only ONE particular shade of orange. You would no longer need to worry about the changes in lighting and shading that affect the colour of the ball (as in the left image) because everything would be uniform. That is, all the pixels you would deal with would be a single colour. And suddenly the definitions of the objects you were trying to locate were not as unwieldy. The number of parameters needed dropped significantly.
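A minimal sketch of that idea with OpenCV, assuming an illustrative image of an orange ball and a hand-picked HSV range for “orange”:

```python
# Colour segmentation sketch: everything inside the "orange" range becomes
# white, everything else black, so the robot only has to find a white blob.
import cv2
import numpy as np

img = cv2.imread("robot_view.jpg")                 # illustrative filename
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

lower_orange = np.array([5, 100, 100])             # hand-picked HSV bounds
upper_orange = np.array([20, 255, 255])
mask = cv2.inRange(hsv, lower_orange, upper_orange)

# A crude rule: enough "on" pixels means we have found the ball.
if cv2.countNonZero(mask) > 500:
    print("Ball located!")
```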

Machine learning would also be employed. Algorithms like SVMs, k-means clustering, random decision forests, and Naive Bayes were at our disposal. You would have to think about which of these best suited your use case and how best to optimise them.
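A sketch of that classical ML step, with random numbers standing in for real hand-crafted feature vectors (edge counts, colour histograms, and so on), purely for illustration:

```python
# Classical ML on hand-crafted features: the "craft" was in designing the
# features and choosing/optimising the model, not in learning them end to end.
import numpy as np
from sklearn import svm
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))            # 200 images, 16 hand-designed features each
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # toy labels: 1 = "bike", 0 = "not a bike"

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

clf = svm.SVC(kernel="rbf", C=1.0)        # which kernel and C to use was part of the craft
clf.fit(X_train, y_train)
print("Held-out accuracy:", clf.score(X_test, y_test))
```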

And then there were also feature detectors – algorithms that found salient features for you, helping you build your own definitions of objects. The SIFT and SURF algorithms deserve Oscars for what they did in this respect back in the day.
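Using one of these detectors is straightforward. A quick sketch with SIFT, assuming a reasonably recent opencv-python build (which exposes it as cv2.SIFT_create) and an illustrative image:

```python
# Let a feature detector find the salient points for you instead of
# defining them by hand. The filename is an illustrative assumption.
import cv2

img = cv2.imread("my_bike.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)

print(f"SIFT found {len(keypoints)} salient points to build object definitions from")
```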

Probably my favourite algorithm of all time is the Viola-Jones Face Detection algorithm. It is ingenious in its simplicity and, back in 2001, it allowed face detection (among other things) to be performed in real time for the first time. That was a big breakthrough in those days. You could use it to detect where the faces were in an image and then focus your analysis on those regions for facial recognition tasks. Problem simplified!
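OpenCV still ships the trained Haar cascades, so a minimal Viola-Jones sketch looks something like this (the photo filename is an illustrative assumption):

```python
# Viola-Jones face detection with the Haar cascade bundled with OpenCV.
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

img = cv2.imread("group_photo.jpg")                # illustrative filename
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Returns one (x, y, w, h) bounding box per detected face.
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
print(f"Found {len(faces)} face(s); run recognition only inside these boxes.")
```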

Anyway, all these algorithms were there to assist us in our tasks. When things worked, it was like watching a symphony playing in harmony: this algorithm coupled with that one, using this machine learning technique, the output then fed into that particular task, and so on. It was beautiful. I would go as far as to say that at times it was art.

But even with the assistance of all these algorithms, so much was still done manually, as I described above, and at the end of the day reality was still too much to handle. There were too many parameters to deal with. Machines and humans together struggled to get anything meaningful to work.

The Advent of Deep Learning

When DL arrived (circa 2012), it introduced the concept of end-to-end learning, where (in a nutshell) the machine is told to learn what to look for with respect to each specific class of object. It works out the most descriptive and salient features for each object all on its own. In other words, neural networks are told to discover the underlying patterns in classes of images. What is the definition of a bike? A car? A washing machine? The machine works all of this out for you. Wired magazine puts it this way:

If you want to teach a [deep] neural network to recognize a cat, for instance, you don’t tell it to look for whiskers, ears, fur, and eyes. You simply show it thousands and thousands of photos of cats, and eventually it works things out. If it keeps misclassifying foxes as cats, you don’t rewrite the code. You just keep coaching it.

The image below portrays this difference between feature extraction (using traditional CV) and end-to-end learning:

traditional-cv-and-dl

Deep learning works by setting up a neural network that can contain millions or even billions of parameters (the weights connecting its artificial neurons). These parameters are initially “blank”, let’s say. Then thousands and thousands of images are sent through the network, and slowly, over time, the parameters are adjusted accordingly.
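For a feel of how little of the old pipeline survives, here is a minimal end-to-end sketch in Keras. The tiny architecture, image size, and random stand-in data are all illustrative assumptions; a real project would feed in thousands of labelled photos:

```python
# End-to-end learning sketch: images in, labels out, no hand-crafted features.
import numpy as np
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, 3, activation="relu", input_shape=(64, 64, 3)),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(2, activation="softmax"),        # e.g. "bike" vs "not a bike"
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Random arrays stand in for thousands of real, labelled photos.
x_train = np.random.rand(100, 64, 64, 3).astype("float32")
y_train = np.random.randint(0, 2, size=(100,))

# The network adjusts its own parameters; no pixel-level "bike definition" required.
model.fit(x_train, y_train, epochs=2, batch_size=32)
```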

Previously, we had to adjust parameters like these ourselves, in one way or another (and not inside a neural network), but we could only handle hundreds or perhaps thousands of them. We didn’t have the means to manage more.

So, deep learning has given us the possibility to deal with much, much more complex tasks. It has truly been a revolution for AI. The xkcd comic above is no longer relevant. That problem has been pretty much solved.

The Lack of Creativity in DL

Like I said, now when we have a problem to solve, we throw data at a neural network and then get the machine to work out how to solve the problem – and that’s pretty much it! The long and creative computer vision pipelines of algorithms and tasks are gone. We just use deep learning. There are really only two bottlenecks that we have to deal with: the need for data and time for training. If you have these (and money to pay for the electricity required to power your machines), you can do magic.

(In this article of mine I describe when traditional computer vision techniques still do a better job than deep learning – however, the art is dying out).

Sure, there are still many things you have control over when opting for a deep neural network solution, e.g. the number of layers and, of course, hyper-parameters such as the learning rate, batch size, and number of epochs. But once you get these more or less right, further tuning has diminishing returns.
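To illustrate what “tuning” often amounts to now, here is a small sketch that simply loops over a few values of those remaining knobs and watches the score. The tiny model and random data are illustrative assumptions:

```python
# "Tuning" in the DL era: try a few learning rates and batch sizes, keep the best.
import numpy as np
from tensorflow.keras import layers, models, optimizers

x = np.random.rand(200, 64, 64, 3).astype("float32")   # stand-in for real photos
y = np.random.randint(0, 2, size=(200,))

def build_model(learning_rate):
    model = models.Sequential([
        layers.Conv2D(16, 3, activation="relu", input_shape=(64, 64, 3)),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(2, activation="softmax"),
    ])
    model.compile(optimizer=optimizers.Adam(learning_rate=learning_rate),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# The remaining "creative" choices are largely reduced to numbers like these.
for lr in (1e-2, 1e-3, 1e-4):
    for batch_size in (16, 32):
        history = build_model(lr).fit(x, y, epochs=2, batch_size=batch_size, verbose=0)
        print(lr, batch_size, round(history.history["accuracy"][-1], 3))
```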

You also have to choose the neural network architecture that best suits your task: convolutional, generative, recurrent, and the like. We more or less know, however, which architecture works best for which task.

Let me put it to you this way: creativity has been eliminated from AI to such an extent that there are now automatic tools available to solve your problems using deep learning. AutoML by Google is my favourite of these. A person with no background in AI or computer vision can use these tools with ease and get very impressive results. They just need to throw enough data at the thing and the tool works the rest out for them automatically.

I dunno, but that feels kind of boring to me.

Maybe I’m wrong. Maybe that’s just me. I’m still proud to be a computer vision expert but it seems that a lot of the fun has been sucked out of it.

However, the results that we get from deep learning are not boring at all! No way. Perhaps I should stop complaining, then.
