Let’s take the task of driving: automation levels range from a fully human driver all the way to a complete robotic driver, as seen below:
While robotaxis are a complex yet exciting AI use case, we are going to follow a much simpler task: cat analytics. The challenges, methodologies, and workflows we’ll see in our cat analytics apply to all cognitive applications, robotaxis included.
Like every good AI project, ours begins with defining the product’s business values:
- Reduce the work required by the city’s health inspectors
- Improve public health through more frequent monitoring
- Implement faster and smarter resolution of public health issues
After talking to many of our potential customers, we are convinced there is a real opportunity to save many human working hours, so we start developing the cat analytics application: an application that will eventually replace the city’s human health inspectors.
Before diving into artificial intelligence, it is worth going over basic biological intelligence principles; after all, AI is all about mimicking this thinking process.
So how does the thinking process work?
Science isn’t really clear on the full details, but we tend to use the following, probably over-simplified, model of thinking (intelligence) flow:
We have senses that capture signals from our environment. Most humans have 5 senses: touch, sight, hearing, smell, and taste. Some animals have more exotic senses. Bumblebees, for example, can sense negative electric charges which are found in flowers.
The senses (sensors) capture noisy signals (data) from the environment around them and send them as messages to the brain over our nervous system for further processing.
The first thing the brain does across all signals is perception analysis. In other words, answering questions in a very basic manner: what do I see, hear, touch, taste, and smell? The results of the perception analysis are a symbolic representation of the things around us at present.
As the current environment is being analyzed, more advanced cognitive parts of our brain start working. Our decision system knows our experience, the present perceived environment, and the desired future outcome. Our decision system will predict the best action to take to reach the future outcomes we want.
Once the best action is identified, we take it and manipulate the environment, which is changed by our actions. Our senses then measure the new environment, and another iteration begins. This is our primary thinking loop.
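The sense ➜ perceive ➜ decide ➜ act loop described above can be sketched in a few lines of code. This is a minimal illustrative sketch; all the function names, the signal threshold, and the "count cats" goal are invented for the example.

```python
# A minimal sketch of the sense -> perceive -> decide -> act thinking loop.
# Every name and value here is illustrative, not from any real framework.

def sense(environment):
    # Senses capture a (noisy) signal from the environment.
    return environment["signal"]

def perceive(signal):
    # Perception: turn the raw signal into a symbolic representation.
    return "cat" if signal > 0.5 else "no cat"

def decide(perception, goal):
    # Decision: pick the action that moves us toward the desired outcome.
    if perception == "cat" and goal == "count cats":
        return "increment counter"
    return "do nothing"

def act(action, environment):
    # Acting changes the environment, which we will sense again next loop.
    if action == "increment counter":
        environment["cat_count"] += 1
    return environment

env = {"signal": 0.9, "cat_count": 0}
for _ in range(3):  # three iterations of the thinking loop
    perception = perceive(sense(env))
    env = act(decide(perception, "count cats"), env)

print(env["cat_count"])  # -> 3: the loop acted on its perception three times
```

Each pass through the loop re-senses the environment that the previous action changed, which is exactly the iteration described above.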
AI Evolution Stages
Our basic thinking loop is the core of what we call intelligence and the goal of AI, yet it is beyond our science and technology to artificially create machines that can “think” like humans. It is common to divide the development of intelligence into three phases on the road to true AI.
Narrow AI
An application that can help with a specific human task, either completely (replacing the human) or partially (assisting the human). This is where the entire AI industry is today. During the development of our AI, using narrow AI technologies, we will further divide our application into many narrower (even smaller) sub-tasks.
General AI (AGI)
An application that can fully replace a human is often referred to as Artificial General Intelligence (AGI). This field is mainly researched in academia and is expected to stay an academic topic for the coming decades. Today, AGI technologies cannot help us with our cat analytics development.
Super AI
An application that is by far smarter than humans. Today, it exists only in sci-fi books and movies and is often referred to as the singularity or an intelligence explosion. In this phase the AI bot is so smart that it creates smarter versions of itself. We don’t really know what Super AI is, or whether such a thing will ever exist. If AGI is great material for sci-fi books, Super AI is more for philosophy books.
Are we going to have AGI and when?
As we are going to see, solving AI problems using the “Narrow AI” technologies available to us today is a slow, complex, and expensive process. Do we have a magical A(G)I coming soon that will make developing our cat analytics a matter of a few clicks? You should first ask yourself: “Is AGI possible using our current technologies?”
The 4th AI Era – The Data Era
With so much buzz around AI, one might think there was a technological breakthrough, a new type of hardware or algorithm, that started the current wave of AI. The reality is that all the technologies and principles that ignited the current AI wave have existed for decades. There was no fundamental technological change, but there was an inflection point that “suddenly” made things once considered impossible, possible (cat recognition included).
The inflection point that sparked the current wave of AI came from three changes taking place at the same time:
- Data volumes availability – Huge amounts of data, unavailable before, allowed new types of experimentation.
- Data processing speed – Graphic processors, very powerful processing common in gaming, started being used to process data for AI.
- GPU data modeling – Neural networks, a tool in use for decades, were discovered to be very efficient at modeling the available data using the available hardware.
Data Volumes Availability
In 2006, a scientist named Fei-Fei Li decided it was a better idea to work on improving the data for computer vision rather than taking the common approach of working on the algorithms. Li, inspired by the textual WordNet database (created in 1985), wanted similar data to be available to computer vision researchers.
Together with other members at Princeton, the team created ImageNet and used Amazon Mechanical Turk to label its 14 million images into 20,000 categories. ImageNet was presented three years later, in 2009.
In 2010, the ImageNet project announced a contest based on its dataset, the largest in the world: the ImageNet Large Scale Visual Recognition Challenge (ILSVRC).
Winning this contest has become the de-facto standard for being the best in machine visual recognition.
Data Processing Speed
The processing speed of data, often discussed alongside Moore’s law, is coming to a halt. It is a slow process that started in 2003 and will be complete around 2025.
Around 2003, the ability of general processors (like the ones in your PC) to run faster reached its limits. Around 2013, the limits of power efficiency (what allows you to have a powerful computer in your smartphone) were reached as well.
We’re at the final, diminishing phase of Moore’s law, with the increase in transistor density (what makes the price of technology go down every year) expected to halt around 2025.
With this processing speed halt, the need for parallel processing (processing data simultaneously using many cores) has increased. Graphical processors (GPUs) are very good at such tasks and allow more “room” for data processing speed.
GPU Data Modeling
The earliest sign of the GPU processing era appeared in 2006, a bit after CPU speeds halted, when GPU-based algorithms won image recognition contests by being 4 times faster than CPUs (K. Chellapilla et al.).
By 2011, a deep neural network was already shown to be 60 times faster (Cireșan et al.), and then came AlexNet.
AlexNet, introduced in 2012, was inspired by Cireșan’s work, and both were designed as variants of LeNet (Yann LeCun et al., 1989).
LeNet was the first neural network to be trained automatically; it was an evolution of the Neocognitron (Kunihiko Fukushima, 1979), the first neural network.
AlexNet won the ImageNet contest by a large margin and changed the world. It was the first time a neural network was used in the contest, and no other type of algorithm has won since. In the years that followed, neural networks took over academia and won every “AI” benchmark or contest.
It is fair to say AlexNet is the moment neural networks became the standard for AI tasks. The AlexNet moment was a worldwide acknowledgment of neural networks rather than a breakthrough moment. Winning ImageNet by such a large gap ignited the era of neural networks, large datasets, and GPUs.
AI is an Evolution, Not a Revolution
As we have seen, the 4th wave of AI was not ignited by a breakthrough moment. It is a gradual evolution of data collection, labeling, and processing that entered the public consensus around 2013 in the research community, and reached the rest of the market years later. AI is part of the final stage of the silicon-based processor evolution of the last 60 years, enabling the next stage: growing data volumes, processing speed, better algorithms, and the hardware to support that growth.
You can expect this trend to continue, and during the development of our cat analytics we will see what this new data means and how we can expect it to evolve.
With the above in mind, why does everybody talk about AGI?
The term AGI has gained popularity in recent years, and for many, the difference between AI and AGI is not clear. The term is used to distinguish the “AI” we have today from the “real AI” we might have someday. The AI we have today is all about data crunching; the machines we have today contain no more intelligence than machines 30 years ago. People who use the term AGI hint that machines will someday be able to think just like us, rather than being dumb (yet very useful) data processors.
Are we going to have AGI and when?
There is no definitive answer, and there is a fundamental question we need to answer first: “Is more of the same enough for AGI?” Is it just a matter of faster processing of more data to get there?
If we believe it is just a matter of more data processing, then we can do some quick math to estimate when we will have AGI:
We will probably have AGI around the time our processors reach the same amount of computing power, or brainpower, as the human brain (in such scenarios they are considered the same).
The common way to make such a comparison is to count neurons, the basic computation units that exist both in AI and in our brain; think of them as small calculators.
So how much time will it take for our technology to supply the same number of neurons as the human brain?
The current state-of-the-art perception (vision) model performs about 100 million operations. Our brain is estimated to operate at 1 exaflop (operations/second), which is 10 billion times more.
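The 10-billion gap quoted above is simple arithmetic, which we can verify directly (the two input estimates are the ones stated in the text, not independent measurements):

```python
# Verify the compute gap quoted above:
# brain estimate ~1 exaflop/s vs. ~100 million operations
# for a state-of-the-art perception model.
brain_ops_per_sec = 1e18   # 1 exaflop = 10^18 operations/second
model_ops = 1e8            # ~100 million operations

ratio = brain_ops_per_sec / model_ops
print(ratio)  # -> 1e+10, i.e. 10 billion times
```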
How much time will it take us to build a computer that works that fast?
Assuming we double compute power every 2.5 years (according to Moore’s law), then 10-20 years would be a fair estimation.
However, there are two big issues with the above math:
- We argued that processing speeds are coming to a halt; it is unlikely current semiconductor technologies will supply the same computation increase in the coming 20 years as they did in the last 20 years.
- The assumption that an artificial neuron has the same power as a biological neuron; research already shows that biological neurons work very differently and are at least 1,000 times more powerful.
But what if there is more to it than just the volume or speed of data processing?
It is my current working assumption that intelligence is not a digital quality and, as of today, can be found only in biological entities. The processors we have today are all digital (Turing) machines. If intelligence is not a digital quality, then it is not a matter of more transistors or faster transistors; it means our brain has other physical capabilities that allow us a very fast decision-making process based on highly sparse data, capabilities that our current hardware lacks. These capabilities seem to be related to quantum mechanics, and maybe the AGI leap is the quantum computing leap. If this is the case, one can expect it to take decades more to come.
Whether it’s a matter of more processing power or of a new technology leap, the conclusion is the same: AGI is decades away.
We are left with “Narrow AI” for our cat analytics.
The question of whether AGI is possible is equivalent to the question: is the human brain a Turing machine? Read further here to establish your own opinion:
- Gödel proved in his incompleteness theorems that there is no algorithm that can prove all truths about the natural numbers.
- Turing himself proved that the “halting problem” is not solvable (decidable) by machines, but it is by humans.
- Rice’s theorem extended the machines’ lack of decidability to all semantic properties of data/code, properties that describe the relationship between the input and output of a program (data labels are semantic properties).
More Data Means Better and More Expensive AI
The most basic rule of Narrow AI is more data ➜ smarter AI.
One of the latest developments in mass-data AI is GPT-3, a question-answering bot that can also be used for other tasks.
GPT-3 was developed by OpenAI. With 175 billion parameters (the previous version, GPT-2, had 1.5 billion), it does a remarkable job at text generation tasks. Looking at GPT-3’s responses to questions can easily fool one into thinking it understands what it is saying. It doesn’t.
Yet, when you talk to GPT-3 (in a chat-like conversation), you get the feeling there is someone on the other side. If you spend some time with GPT-3, you will feel you are talking to a combination of a genius, an idiot, and a psychopath, and at any given point you can’t tell which of the three is answering.
We have just observed a very important point, which is a basic rule of AI: large amounts of data are key to smarter AI, and we’ll see why.
Another important point: a single GPT-3 training session, a session where we update the model with the data, costs $12 million. Developing an AI model often requires many training sessions, which makes it a very expensive operation.
Here is our 2nd rule of AI: more data ➜ more expensive AI.
Facebook did a similar self-supervision “trick”, sidestepping the need for human-labeled data by using Instagram hashtags as labels on 3.5 billion images, 10X the number Google’s AI used (disclosed in August 2017).
Facebook managed to reach 84.5% accuracy on ImageNet, which is kind of like winning the Computer Vision Olympics.
Expect self-supervision to become more dominant as technology progresses: self-supervision will be used to represent patterns as feature vectors, while human labeling will cluster these patterns into meaningful information.
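The hashtag-as-label idea above can be sketched in a few lines. This is a toy illustration only; the posts, hashtags, and label vocabulary are invented, and real pipelines at this scale involve heavy filtering of noisy tags.

```python
# A minimal sketch of using hashtags as (noisy) weak labels instead of
# paying humans to label images. All data here is invented for illustration.

label_vocabulary = {"cat", "dog"}  # the classes our model cares about

posts = [
    {"image": "img1.jpg", "hashtags": ["#cat", "#cute", "#monday"]},
    {"image": "img2.jpg", "hashtags": ["#sunset", "#beach"]},
    {"image": "img3.jpg", "hashtags": ["#dog", "#cat"]},
]

def weak_labels(post):
    # Keep only hashtags that match our vocabulary; drop the '#' prefix.
    return sorted(tag[1:] for tag in post["hashtags"] if tag[1:] in label_vocabulary)

# Posts with no usable hashtags are simply dropped from the training set.
dataset = [(p["image"], weak_labels(p)) for p in posts if weak_labels(p)]
print(dataset)  # -> [('img1.jpg', ['cat']), ('img3.jpg', ['cat', 'dog'])]
```

The labels are free but noisy, which is exactly the trade-off: scale in exchange for label quality.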
You may have heard of AlphaGo, Google’s Go bot, or other superhuman game-playing bots. These bots are not considered Super AI, and the network types (reinforcement learning) used to develop them are not very common in real-world problems outside of gaming. Super AI would be able to transfer knowledge from one domain to another; AlphaGo will not be able to help us with our cat analytics, or even be good at chess.
Narrow AI can (and is likely to) achieve superhuman capabilities within its limited scope.
The Narrow AI Architecture
In an ideal world, we would just tell an AGI machine to learn about cats (the internet likely has all the data needed already) and then ask it to answer our questions while feeding it live camera data. AGI is not arriving any time soon, and just like the rest of the industry today, we are left with narrow AI to answer our business needs.
A Narrow AI (NAI) application takes the big questions we are asking and breaks them into many smaller questions.
A basic NAI application will typically have the following structure:
It is worth noting that the more complicated the task, the more skills, sub-skills, and layers the application will have, but the core principles remain the same.
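Breaking a big question into narrow sub-skills can be sketched as a chain of small functions. This is a hypothetical decomposition; the skill names, the "frames" data, and the trend rule are all invented for illustration.

```python
# A hypothetical narrow-AI decomposition: the big question
# "what is happening with the cats on this street?" is broken into
# small, well-defined sub-skills that are chained together.

def detect_cats(frame):
    # Perception skill: "how many cats are in this frame?"
    # (Here a frame is just a list of detected object names.)
    return frame.count("cat")

def estimate_trend(counts):
    # Higher-level skill: "is the cat population growing?"
    return "growing" if counts[-1] > counts[0] else "stable"

def health_report(frames):
    # The application chains the narrow skills to answer the big question.
    counts = [detect_cats(f) for f in frames]
    return {"counts": counts, "trend": estimate_trend(counts)}

report = health_report([["cat"], ["cat", "cat"], ["cat", "cat", "cat"]])
print(report)  # -> {'counts': [1, 2, 3], 'trend': 'growing'}
```

Each function answers one narrow question; only their composition answers the business question, which is the core of the narrow-AI architecture described above.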
A single perception skill is something like object recognition, text translation, or any other task with a very clear answer given a data example. A simple skill is typically performed by a single neural network model, while more complex skills require an ensemble or chain of several neural networks. Skills tend to answer a very well-defined question given data (Is it a cat?).
Skill development is driven by a learning process in which the AI agent (aka bot) is taught using examples that are marked by a human (the teacher). These markings, called labels or annotations, are a core part of the process, as we will see later. The bot’s (student’s) goal is to mimic the human (teacher) to perfection, including on examples it has never seen (a process called generalization of the model).
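The teacher/student dynamic can be shown with a toy supervised learner. This sketch uses a nearest-class-mean classifier instead of a neural network to keep it tiny; the features (whisker length, ear pointiness) and all the data are made up for illustration.

```python
# A toy supervised-learning sketch: a human "teacher" provides labeled
# examples, the "student" model learns from them, and then generalizes
# to an example it has never seen. All data here is invented.
from collections import defaultdict

# Each example: (whisker_length, ear_pointiness) -> label from the teacher.
labeled_examples = [
    ((9.0, 8.5), "cat"),
    ((8.0, 9.0), "cat"),
    ((1.0, 2.0), "not cat"),
    ((2.0, 1.5), "not cat"),
]

# "Training": compute the mean feature vector per label (nearest-mean model).
sums = defaultdict(lambda: [0.0, 0.0])
counts = defaultdict(int)
for (x, y), label in labeled_examples:
    sums[label][0] += x
    sums[label][1] += y
    counts[label] += 1
means = {lbl: (s[0] / counts[lbl], s[1] / counts[lbl]) for lbl, s in sums.items()}

def predict(example):
    # Generalization: classify an unseen example by its nearest class mean.
    x, y = example
    return min(means, key=lambda lbl: (x - means[lbl][0]) ** 2 + (y - means[lbl][1]) ** 2)

print(predict((8.5, 8.0)))  # an example never seen during training -> cat
```

The labels the teacher provides are what makes this supervised learning: remove them, and the student has nothing to mimic.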
It is worth mentioning that neural networks are probably 99% of what is commonly referred to as AI (the perception layer) today, and these algorithms are part of the machine learning field called supervised learning, as the models are supervised by humans through the labeling process.
This layer will typically use both supervised and unsupervised learning, representing the complete skill set of a human worker for a specific task.
Reflecting the above concepts onto our cat application can be demonstrated as follows: