The Challenge of Scaling AI Technology

by Danko Nikolic

One key indicator of a safe investment is the need to scale. Once a proof of concept has been established, the technology has been shown to function properly, and customers have demonstrated clear interest, all that stands between the concept and profits is scale.

For most technology investments, scaling is quite safe. Sure, there are a few hurdles to overcome, but the risk tends to be small because technology scaling is usually sublinear: to serve twice as many customers, less than twice the resources are needed. Big data technology is designed exactly this way, to make the process as painless as possible.

But when we consider scaling for AI technology, an important distinction needs to be made: we need to separate scaling users from scaling intelligence. Growing the number of users of a proven, well-performing AI is sublinear and, technologically speaking, easy to accomplish. Scaling intelligence is not.

Scaling up?

Human brains are capable of meaningfully distinguishing huge numbers of objects. One estimate puts this number somewhere between 10^22 (as a minimum) and 10^48 (as a maximum) for an averagely educated adult. In AI, today’s top performers reach an accuracy of about 76% on pictures from 100 different classes of objects (dolphins, sharks, roses, bottles). That would mean that today’s machines perform something like 10^20 times worse than humans.
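To see where the 10^20 figure comes from, here is the back-of-the-envelope arithmetic. It is only a rough sketch: it compares nothing but class counts and ignores accuracy, feature complexity and everything else about the tasks.

```python
import math

# Rough orders of magnitude only; the human estimates are themselves very loose.
human_classes_low = 10**22    # low estimate of classes an average adult can distinguish
human_classes_high = 10**48   # high estimate
machine_classes = 100         # classes in the image benchmark discussed above

for label, human in [("low", human_classes_low), ("high", human_classes_high)]:
    gap = human // machine_classes
    print(f"{label} estimate: machines cover ~10^{round(math.log10(gap))} times fewer classes")
# low estimate: machines cover ~10^20 times fewer classes
# high estimate: machines cover ~10^46 times fewer classes
```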

How can we raise machines to this level? So far, nobody has found a way. No one has demonstrated a way to effectively scale AI technology in this direction. In fact, it seems that machines scale quite inefficiently when it comes to intelligence gains.

For example, take the 76% performance level mentioned above. That project required thousands of simulated neurons in a deep network distributed over 18 layers. This may not seem like much compared to the roughly 80 billion neurons in the human brain, but it makes you wonder what is needed to scale effectively toward large numbers of objects.

To estimate this number, let’s look backward, toward smaller numbers of objects. When the same architecture is used to distinguish between only 10 classes of objects, the performance is much better: about 93% correct. And to reach the 76% level of performance on 10 classes, much simpler architectures suffice.

For only two classes of objects, even simpler architectures suffice. In fact, to distinguish between just two classes as accurately or better, simple logistic regression is enough.
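As a minimal sketch of how modest the two-class case is, a plain logistic regression on synthetic data does the job. The dataset and settings below are invented for illustration; they are not the benchmarks discussed above.

```python
# Minimal sketch: for a two-class problem, plain logistic regression is often
# enough -- no deep network required. Toy data stands in for "dolphins vs. roses".
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Illustrative synthetic features; real image features would come from elsewhere.
X, y = make_classification(n_samples=2000, n_features=50, n_informative=10,
                           n_classes=2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"two-class accuracy: {clf.score(X_test, y_test):.2f}")
```

A single linear model with about 50 weights handles this; every extra class pushes us toward bigger and deeper architectures.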

This shows us that every class added to a machine’s repertoire requires a significant addition of resources (memory, floating-point operations, etc.). Growth follows a superlinear model, meaning that doubling the number of classes requires more than double the resources.

The high cost of learning

Why is this? Because a machine learning model must learn to represent various relationships between independent classes. What the machine learns about distinguishing dolphins from roses is not useful for distinguishing dolphins from bottles, or bottles from roses. As a result, the amount of learned knowledge and processed information grows with the number of pairwise comparisons. Even with various optimizations and simplifications of features, this superlinear growth cannot be avoided.
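A quick way to see the shape of this growth is to simply count the pairwise distinctions among n classes. This is pure counting, not a model of any particular architecture, but it shows why doubling the number of classes more than doubles the work.

```python
# Number of pairwise distinctions among n classes: n * (n - 1) / 2.
def pairwise_distinctions(n_classes: int) -> int:
    return n_classes * (n_classes - 1) // 2

for n in [2, 10, 100, 200]:
    print(f"{n:>4} classes -> {pairwise_distinctions(n):>6} pairwise distinctions")
#    2 classes ->      1 pairwise distinctions
#   10 classes ->     45 pairwise distinctions
#  100 classes ->   4950 pairwise distinctions
#  200 classes ->  19900 pairwise distinctions  (2x the classes, ~4x the pairs)
```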

In the theoretically best scenario, the growth would be linear: For each additional class, only the resources for detecting this class would need to be introduced, without affecting the performance on other classes.

But even if a linear growth in demands were achieved by some amazing new technology (exponent = 1.0), this would by no means be sufficient to reach human-level performance. A thousand neurons for 100 classes (10 neurons per class) would need to be scaled up to 10^23 neurons for 10^22 classes. Compare that to the roughly 10^11 neurons in the human brain.
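Spelled out, the best-case arithmetic looks like this. It simply reuses the 10-neurons-per-class ratio from the benchmark above and the lower human estimate, so treat it as an illustration rather than a measurement.

```python
# Best-case (linear) scaling: resources grow in direct proportion to classes.
neurons_per_class = 1000 / 100      # ~10 neurons per class in the 100-class network
human_class_count = 10**22          # lower estimate of human-distinguishable classes
neurons_needed = neurons_per_class * human_class_count   # ~10^23 simulated neurons

brain_neurons = 8e10                # roughly 80 billion neurons, as noted above
print(f"neurons needed under linear scaling: ~{neurons_needed:.1e}")
print(f"neurons in a human brain:            ~{brain_neurons:.1e}")
print(f"ratio:                               ~{neurons_needed / brain_neurons:.1e}")
# Even perfectly linear scaling would call for roughly 10^12 times more
# simulated neurons than the brain itself contains.
```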

Note that we’re estimating only the amount of resources needed to run an already trained network. We have not calculated the amount of resources needed to train it. That may be an even bigger problem because, as the number of classes increases, the current technology has more and more difficulty converging to a solution during training. Hence, even with infinite computational resources, we may not be able to properly train existing neural networks.

But even disregarding the training problem, implementing an already trained deep neural net of this size would be quite difficult. Even with maximally optimized use of RAM (computer memory), the above example implies the need for something like 10^20 gigabytes, which is an astronomical number. Even if the price of memory fell to only $1 per gigabyte, that amount of memory would cost more money than exists in the world today (the entire world holds about $1 quadrillion, or $10^15). And this estimate does not even include memory for training samples, GPUs, etc.
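Taking the figures in the text at face value, the money side of the arithmetic is straightforward. This is only a sketch built on the 10^20-gigabyte estimate and the optimistic $1-per-gigabyte assumption above.

```python
# Cost of memory for the trained network, using the estimates from the text.
memory_gb = 10**20      # RAM estimate, in gigabytes
price_per_gb = 1        # optimistic assumption: memory falls to $1 per gigabyte
world_wealth = 10**15   # roughly $1 quadrillion in existence today

cost = memory_gb * price_per_gb
print(f"memory bill:        ~$10^{len(str(cost)) - 1}")
print(f"all money on Earth: ~$10^{len(str(world_wealth)) - 1}")
print(f"shortfall:          ~10^{len(str(cost // world_wealth)) - 1} times more than exists")
```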

These calculations paint a very grim picture for scaling the intelligence of our machines. But there may be a solution I have hinted at in other posts — and we can learn it from biology.

The answer is out there

Biology solves this scaling problem through an adaptive process that involves a unique type of re-learning. The process is both very rapid (less than a second) and extensively intelligent in itself; the learning rules contain a mass of knowledge about the world. Unfortunately, this rapid re-learning is something no AI built today has at its disposal.

If we do not find a way to scale the intelligence of machines sublinearly (exponent << 1), we probably cannot hope to reach AGI. And, in my view, the scaling problem will always exist unless we change our approach to AI to match what philosopher John Searle had in mind, as also discussed in the post The Artificial Intelligence Kindergarten.

Only when we find a sublinear solution to intelligence scaling will we have strong AI and open a path toward AGI. And in the meantime, what should we do?


About the author

Danko Nikolic is a data scientist at Teradata. Before joining Teradata he worked at DXC Technologies, and before that as a scientist and entrepreneur. For many years he led a lab for brain and mind research at the Max Planck Institute, investigating how the brain works, inventing new statistical methods and pondering how to devise a better form of artificial intelligence (AI). He inherited his family’s business genes, has started several companies and has been involved in multiple startups, focused on topics ranging from civil engineering and IT to psychology and AI. He has degrees in civil engineering and psychology, a PhD in cognitive psychology, and is an honorary professor of psychology at the University of Zagreb.
