Over the past few years, the field of artificial intelligence (AI) has made remarkable progress. The advances in the capabilities of AI tools are truly astonishing. Today, we can communicate with computers using our everyday language, and machines have mastered complex tasks that were once believed to be the exclusive domain of humans. The pace of improvement has surprised even veteran experts in the field.

The essence of the AI revolution

The AI technological revolution can be compared to the emergence of the internet a few decades ago, but at that time technological innovations such as email, remote file access and the World Wide Web found their way into everyday life much more slowly. The power and utility of new AI tools such as ChatGPT and various image generation systems are truly fascinating, and their easy accessibility has quickly led to mass adoption.

With the new smart language systems, we can “talk” to articles and books: ask them all sorts of questions, seek further clarification, and do many things we could not do with texts before. We can also conduct market analysis, draft a wide range of communications, reply to emails, summarise the content of meetings and automate many other similarly time-consuming tasks. It is no longer a barrier if the text we want to read is only available in a language we do not understand: the smart tools make it easy to discuss its content in English, even if the source text is in Arabic, Chinese or any of the many other languages already supported by the new systems.

The proliferation of languages already covered by new AI tools has made human knowledge even more accessible to people around the globe. Researchers are now working to develop language models that support all the world’s languages for which sufficient digitised resources are available as a prerequisite for the implementation of machine learning.

What scientists don’t yet understand about AI

These technological advances are accompanied by claims that even the scientists developing new AI tools do not fully understand why this type of technology is making such a leap in capability and usefulness right now. Researchers are, of course, very familiar with the workings of the devices and programs that power smart tools. They also understand that, in addition to the theory, effective machine learning of AI models requires a great deal of computing power, which is not cheap. However, it is quite another thing for experts to be able to explain how AI actually performs the tasks that we believe require some kind of thought process.

The frequent remark that large language models such as ChatGPT can only statistically predict the most likely next word in a sequence ignores the very essence of the revolution we are witnessing in the field of artificial intelligence. By learning to predict letters or words in a text, the neural network can also be taught many other tasks.

Large neural networks can extract patterns from the mass of data they learn from, allowing them to perform many complex tasks, which is a remarkable achievement. In a way, this is similar to the phenomenon where systematic training in long-distance running would prepare an athlete to be equally successful in other sports. The result is strange and perhaps surprising, but it is an important element of the recent AI revolution.

How do machines perform tasks that require thinking?

Large language models of artificial intelligence, capable of running smart services such as ChatGPT, are based on large neural networks. These are huge mathematical equations whose parameters are gradually adjusted during the learning process so that the models are then able to calculate a meaningful answer to a given question. However, as with the connections between neurons in the human brain, it is not immediately clear how the structure of the connections between neurons in artificial neural networks translates into the calculated answers to questions.

Large-scale AI models operate on a black-box principle: input data passes through the complex connections of a large neural network, which can include billions of parameters or weights, but the way these parameters interact to produce specific results is not obvious. Understanding the “black box” problem, which concerns both the performance of large-scale AI language models and the functioning of the human brain, is one of the great scientific puzzles of our time. This challenge is not only important for building and managing even more powerful AI models in the future, but also represents one of the fundamental questions in AI research.

The key problem is not, of course, a lack of understanding of the mathematical principles that underpin how AI models work. The central problem lies in a deeper question: how is it possible that huge mathematical equations can so efficiently perform tasks such as answering questions, generating text, translating between languages, creating images and other similar activities that, until recently, only humans could do well?

The phenomenon of generalisation in AI

A fundamental element of machine learning is the phenomenon of generalisation. It represents the basic way AI models can learn to “understand” something, not just learn it by heart. Generalisation in machine learning is the ability of a model to efficiently and correctly predict or explain new, previously unknown data that comes from the same general population as the training data. In essence, it is the capacity of the model to apply the learnt knowledge from the training set to data that it has not seen during training, which is crucial for its practical applicability.

Models can learn to perform tasks such as translating sentences from one language into another by training on a set of already translated examples. However, they can generalise their knowledge and learn to perform similar tasks on examples they have not seen before. Models not only remember patterns they have already seen, but also independently develop rules during the learning process that enable them to apply these patterns to new examples. In particular, large language models such as GPT-4 have a surprising capacity for generalisation.

When we train an AI model, we want it to learn patterns that are generally valid for the problem we are trying to solve, but we do not want it to overfit to the specifics of the training dataset. If we overfit the model to the training data, it will perform excellently on the training data, but its performance on the new data will be much worse, because it has learned the specific details of the training set rather than developing an “understanding” of the general patterns.
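The contrast between learning general patterns and memorising the training set can be made concrete with a small sketch in Python, using invented toy data: a very flexible model (a degree-9 polynomial) matches ten noisy training points almost perfectly, while a simpler model (a cubic) captures the underlying shape and typically does better on fresh points.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples of a simple underlying pattern, y = sin(x)
x_train = np.linspace(0.0, 3.0, 10)
y_train = np.sin(x_train) + rng.normal(0.0, 0.1, x_train.size)
x_test = np.linspace(0.1, 2.9, 50)   # unseen points from the same range
y_test = np.sin(x_test)

def fit_and_errors(degree):
    # Fit a polynomial of the given degree to the training points
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = float(np.mean((np.polyval(coeffs, x_train) - y_train) ** 2))
    test_err = float(np.mean((np.polyval(coeffs, x_test) - y_test) ** 2))
    return train_err, test_err

simple_train, simple_test = fit_and_errors(3)    # captures the general shape
overfit_train, overfit_test = fit_and_errors(9)  # can memorise every noisy point

# The flexible model looks better on the training data it memorised,
# but it has learned the noise rather than the general pattern
assert overfit_train < simple_train
```

The degree-9 polynomial passes through every noisy point, so its training error is essentially zero; the price is that it follows the noise between the points rather than the pattern.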

The invention of the artificial neuron and training algorithm

The remarkable technological revolution in artificial intelligence that we have witnessed in recent years is the result of years of research, which has recently reached an important peak. Scientists have long wondered what makes the human brain intelligent, and many years ago concluded that the ability to think is most likely related to the number of neurons and how they are organised.

That is why, in the middle of the 20th century, researchers began to investigate how the workings of the neurons in the brain could be mimicked by mathematical models. This approach led to the invention of the artificial neuron, which is nothing more than a mathematical formula that attempts to emulate the functioning of a biological neuron. Many interconnected artificial neurons then form a neural network, which is also just a mathematical equation with many parameters.
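As an illustration only, such an artificial neuron can be written down in a few lines of Python: a weighted sum of its inputs passed through a non-linear activation function. The inputs, weights and bias below are arbitrary example values.

```python
import math

def artificial_neuron(inputs, weights, bias):
    # Weighted sum of the inputs, plus a bias term...
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    # ...passed through a non-linear activation (a sigmoid here),
    # which squashes the result into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-weighted_sum))

# Arbitrary example values: two inputs with hand-picked weights
output = artificial_neuron([1.0, 0.5], weights=[0.8, -0.4], bias=0.1)
```

A neural network is simply many such formulas chained together, with the outputs of one layer of neurons serving as the inputs of the next.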

The next major step in developing artificial neural networks was the discovery of a method by which artificial neurons can be trained. The backpropagation algorithm allows neural networks to learn from data or tasks that have already been solved. Simply put, the neural network calculates its prediction of the outcome and compares it to the actual outcome of the task, then calculates how much each neuron contributed to the error. In the next step, it corrects the parameter settings of each neuron so that the overall prediction error is reduced. If the process is repeated many times, the neural network parameter settings are gradually changed in such a way that the neural network becomes better and better at predicting the correct results.
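The loop described above can be sketched as follows. This is a deliberately simplified illustration, assuming a single sigmoid neuron learning the logical OR function by gradient descent; real networks have many layers and propagate the error backwards through all of them, but the principle, predict, measure the error and nudge each parameter to reduce it, is the same.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Already-solved tasks to learn from: the logical OR function
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]

w = [0.0, 0.0]  # the neuron's adjustable parameters (weights)
b = 0.0         # and its bias
lr = 1.0        # learning rate: how big each correction step is

def total_error():
    return sum((sigmoid(x[0] * w[0] + x[1] * w[1] + b) - y) ** 2
               for x, y in data)

before = total_error()
for _ in range(1000):                                # repeat many times
    for x, y in data:
        p = sigmoid(x[0] * w[0] + x[1] * w[1] + b)   # predict the outcome
        grad = (p - y) * p * (1 - p)                 # the neuron's share of the error
        w[0] -= lr * grad * x[0]                     # correct each parameter
        w[1] -= lr * grad * x[1]                     # in the direction that
        b -= lr * grad                               # reduces the error
after = total_error()
# after < before: the prediction error shrinks as training repeats
```

After enough repetitions, the parameters settle so that the neuron reproduces the OR table correctly.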

The unsupervised learning process

Supervised machine learning means that we know in advance what we want the neural network to learn. For such learning, we need a large number of solved tasks on which to train the neural network. But unsupervised machine learning is a much more interesting approach, because we train the neural network to be as good as possible at a particular task, such as predicting the next word in a text, while at the same time it learns many other tasks that we have not trained it to do.

The unsupervised learning process of neural networks can also be seen as a process of discovering hidden structures in the data. We do not instruct a machine what to learn, but by teaching it to predict the next word, we enable it to systematically “read” texts and learn their content.
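The prediction task itself can be illustrated with a deliberately primitive stand-in: a bigram model that merely counts which word follows which in a tiny invented corpus. Real language models use large neural networks and vastly more data, but the task they are optimised for is the same.

```python
from collections import Counter, defaultdict

# A tiny invented corpus; real models learn from billions of words
corpus = ("the cat sat on the mat . the dog sat on the rug . "
          "the cat chased the dog .").split()

# Count which word follows which (a bigram model)
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def predict_next(word):
    # Return the statistically most likely next word
    return following[word].most_common(1)[0][0]

print(predict_next("sat"))   # prints: on
```

Even this toy model has "read" something out of the data: it has discovered, without being told, that in this corpus "sat" is always followed by "on".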

In 2017, OpenAI researchers taught a neural network to predict the next letter in a collection of 82 million reviews written by customers on Amazon for different products. An important result of this machine learning was that by training the neural network to predict the next character in texts, it learned not only this skill, but also a number of other skills for which it was not directly trained.

For example, they found that one of the neurons in the neural network became particularly sensitive to the mood of a particular text. In most cases, if the neuron was active, the rating was positive, but if it was inactive, the rating was negative. They also found that by switching this neuron on and off, they could directly control the sentiment of the newly generated reviews.

The experiment showed that learning to predict letters or words in a text can also teach the neural network many other tasks over time. In a way, this is similar to the phenomenon that systematic training in long-distance running would train an athlete for other sports. The result is unusual and perhaps surprising, but it is an important element of the recent AI revolution.

Knowledge as a by-product of guessing

Over the next few years, when much larger neural networks were trained on very large collections of texts, it turned out that although a neural network only learns to predict the next word in a sequence of words, it somehow learns to “understand” the content of those texts. Of course, this “understanding” is not the same as in the human brain, but even in an artificial neural network, structures are formed that somehow correspond to the ideas contained in the texts, a very condensed record of the mass of information they hold.

If the learning process is performed correctly, the neural network automatically extracts key ideas from the dataset during the learning process. These ideas are then stored in the neural network, which is a rather compressed record given the large amount of source data. The trick is that the neural networks we use for generative models have a much smaller number of parameters than the amount of data we train them on, so the models need to discover and effectively internalise the essence of the data to be able to regenerate it. By compressing it into a shorter record, they can extract key patterns from the data that enable understanding and prediction, which is certainly an impressive achievement.

The frequent observation that large language models such as ChatGPT can only statistically predict the most likely next word in a sequence misses the point of the revolution we are witnessing in the field of artificial intelligence. Learning to predict the next word should be understood as a reading process, the by-product of which is the “knowledge” of the content that the neural network is reading.

Data compression and concept generation

The essence of unsupervised learning, where we optimise a neural network for one task while it learns something else that we are actually interested in, can also be thought of as a form of data compression. The complexity of a dataset can be defined as the shortest possible instruction for reproducing that data. While the machine is learning to predict the next word in the text, it produces a drastically condensed record of the essence of this data, stored in the parameter settings, or weights, of the neural network.
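This link between patterns and short descriptions can be illustrated with an ordinary general-purpose compressor, used here purely as a stand-in for the idea: text with a strong internal pattern admits a much shorter description than random text of the same length, which has no pattern to exploit.

```python
import random
import zlib

random.seed(0)

# Highly patterned text: a short rule ("repeat this phrase 400 times")
# is enough to reproduce it exactly
patterned = ("the cat sat on the mat . " * 400).encode()

# Random text of the same length: no pattern for a compressor to find
alphabet = "abcdefghijklmnopqrstuvwxyz ."
scrambled = "".join(random.choice(alphabet)
                    for _ in range(len(patterned))).encode()

ratio_patterned = len(zlib.compress(patterned)) / len(patterned)
ratio_scrambled = len(zlib.compress(scrambled)) / len(scrambled)

# The patterned text shrinks dramatically; the random text barely shrinks
assert ratio_patterned < ratio_scrambled
```

In the same loose sense, a trained language model is a short description of the regularities in its training texts, held in far fewer parameters than the texts themselves would occupy.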

Through generalisation and other similar processes, large AI language models perform a kind of compression of the knowledge they learn from. They condense large amounts of available information through generalisation into a well-structured compact form that takes up much less disk space than the original data. Although it is just a matter of setting parameters in a huge mathematical equation, this process of machine learning can also be seen as a kind of formation of concepts that otherwise form the basis of thought.

Although during the learning process the neural network only tries to correctly predict the next word in the large corpus of texts we are training it with, it also creates a kind of conceptual world that remains hidden in its parameters or weights after the learning process is complete. Artificial intelligence does not learn “by memorising”, but rather generalises and organises data by developing slightly differently constructed “concepts”. In doing so, it creates structures in the hidden (latent) mathematical space of the neural network. It could also be described as a kind of virtual world of ideas, as Plato imagined long ago when he tried to understand how people think and learn.

The next frontier for artificial intelligence

Artificial intelligence currently learns from textual data that contains ideas already organised and articulated through concepts. The conceptualisation, carried out by humans, is thus pre-established. AI synthesises and structures the pre-existing conceptual frameworks found in its training corpus into the mathematical latent space of a neural network.

The next significant breakthrough for AI will be its ability to conceptually structure, and thus understand, information that has not yet been conceptually processed or organised. This involves deriving insights from raw, unprocessed “sensory” data. This capability is partially demonstrated by AI systems powering autonomous vehicles, which interpret the vehicle's surroundings on the road to ensure safe navigation and decision-making in traffic. However, even in this scenario, the categorisation of the data is predetermined.

The potential for AI to devise different, novel, and alternative categorisations of the world opens up a vast array of intriguing and significant philosophical questions and dilemmas. The advancement towards AI that can independently conceptualise and categorise raw data without human intervention will mark a profound shift in our understanding of cognition and the nature of intelligence itself. This evolution challenges us to reconsider the boundaries of AI capabilities and its role in expanding the horizons of human knowledge and perception.

Summary: The article provides a comprehensive overview of the latest developments in artificial intelligence (AI), framing the AI revolution as a transformative period comparable to the advent of the internet. It delves into the complexities and challenges that researchers encounter in dissecting the mechanisms through which AI executes cognitively demanding tasks. Key topics covered include the concept of generalisation within AI systems, the backpropagation algorithm, and the dynamics of unsupervised learning. The article further explores how large-scale AI models manage to “understand” content through a process of data compression and concept generation, suggesting that AI models create a conceptual world within their mathematical latent space. The discussion culminates in an examination of the forthcoming horizon for AI: the capability to autonomously conceptualise and categorise unprocessed data, independent of human guidance.

This article was originally published in Bančni vestnik, VOLUME 73, No. 5, MAY 2024.
