Artificial Intelligence (AI) is constantly evolving, and its ultimate goal is to build machines with human-level intelligence. One of the major challenges in achieving this goal is enabling AI models to use common sense. Self-Supervised Learning (SSL) is a method that can help build the background knowledge on which common sense is built. In this article, we will explore what SSL is, how it works, and its applications in Natural Language Processing (NLP).
Can AI Models be Common Sense Enabled?
Building machines with human-level intelligence is the ultimate goal of Artificial Intelligence (AI). But how close are we to reaching this goal? To understand this, let's first define human intelligence. For example, when a child is shown a few pictures of a tiger, they can quickly identify one in the real world.
Similarly, after just 10-20 hours of practice, an adult can learn to drive with minimal supervision. This is all possible because of something called common sense. Humans rely on their previous knowledge of how the world works when learning new skills.
But when it comes to machine learning, even if a model is trained with thousands of images of tigers, there is still a chance it might misclassify a tiger sitting in a tree. Self-driving cars require thousands of hours of training data from human drivers. Current AI models, in other words, do not learn with the kind of common sense humans rely on.
The question is, can we make AI use common sense? The answer may lie in something called “self-supervised learning” (SSL).
What is Self-Supervised Learning?
Self-Supervised Learning (SSL) can be considered as a way to give AI a sense of common sense. It enables the development of background knowledge which forms the foundation of common sense.
Unlike supervised learning, where the model is trained on human-labeled data, SSL starts with no labels at all, just as unsupervised learning does.
During training, the model generates its own labels from the structure of the data and uses them in subsequent iterations, much as a supervised model uses human-provided labels.
Self-Supervised Learning (SSL) has achieved remarkable results in the field of Natural Language Processing (NLP). By training models such as BERT, RoBERTa, and XLM-R on vast amounts of unlabelled data, we have been able to apply these models to various NLP tasks.
Additionally, systems that have been pre-trained using the SSL approach have demonstrated significantly better performance than models trained with supervised learning alone.
How Does SSL Work?
So, how exactly does SSL work? Essentially, it utilizes the inherent structure within the data to extract supervisory signals. In the context of NLP training, SSL attempts to predict a missing part of the input based on the previously observed data.
For instance, it may try to predict a hidden word within a sentence using the context of the remaining words.
In the pre-training stage of self-supervised learning, we present the model with a short text in which some words are obscured, and we train the model to predict the missing words. In the process, the model learns to understand the meaning of the text so that its predictions make sense in context.
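The label-generation step described above can be sketched in a few lines of pure Python. This is a minimal illustration, not a real pre-training pipeline: the function name `make_ssl_examples` and the `[MASK]` token convention are assumptions chosen for the example (the `[MASK]` token mirrors the convention BERT uses).

```python
import random

def make_ssl_examples(sentences, mask_token="[MASK]", seed=0):
    """Turn raw, unlabeled sentences into (input, label) pairs by hiding
    one word per sentence. The label comes from the data itself, so no
    human annotation is needed -- the essence of self-supervision."""
    rng = random.Random(seed)
    examples = []
    for sentence in sentences:
        words = sentence.split()
        idx = rng.randrange(len(words))
        label = words[idx]            # the hidden word becomes the target
        masked = words.copy()
        masked[idx] = mask_token      # the model sees this obscured input
        examples.append((" ".join(masked), label))
    return examples

pairs = make_ssl_examples(["A tiger chases a deer in the jungle"])
```

Each pair couples an obscured sentence with the word that was hidden; a real system would train a model to recover that word from context.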
Take the following two statements as an illustration:
- A tiger chases a deer in the jungle.
- A cat chases a mouse in the kitchen.
If we hide two words in each of these statements, they would look like this:
- A ___ chases a ___ in the jungle.
- A ___ chases a ___ in the kitchen.
When you ask an SSL-trained model to fill in the two blanks, it will most likely use the sentence context and will not answer, "A cat chases a mouse in the jungle": we find cats and mice in the kitchen, not in the jungle. The model is thus trained to understand both the syntactic role of each word and the meaning of the text as a whole.
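To make the intuition concrete, here is a toy, pure-Python stand-in for what a trained model does: it scores candidate words for a blank by how often they co-occur with the surrounding context words. The tiny `corpus` and the function `fill_blank` are illustrative assumptions; real models like BERT learn far richer representations, but the principle of exploiting co-occurrence structure in unlabeled text is the same.

```python
from collections import Counter, defaultdict

# A toy corpus standing in for large amounts of unlabeled text.
corpus = [
    "a tiger chases a deer in the jungle",
    "a cat chases a mouse in the kitchen",
    "the cat sleeps in the kitchen",
    "the tiger hunts in the jungle",
]

# Count how often each word co-occurs with every other word in a sentence.
cooc = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for w in words:
        for c in words:
            if c != w:
                cooc[w][c] += 1

def fill_blank(context_words, candidates):
    """Pick the candidate that co-occurs most with the context words."""
    return max(candidates, key=lambda cand: sum(cooc[cand][w] for w in context_words))

# "A cat chases a mouse in the ___"
print(fill_blank(["cat", "mouse"], ["jungle", "kitchen"]))  # -> kitchen
```

Because "kitchen" appears alongside "cat" and "mouse" in the corpus while "jungle" never does, the context steers the prediction to "kitchen", just as the SSL-trained model avoids placing the cat in the jungle.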
SSL Training Methods
Common techniques employed for SSL training include:
- Contrastive learning
- Generative pre-training
- Predicting future frames
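Contrastive learning, the first technique above, trains a model so that two "views" of the same sample (for example, two augmentations) end up with similar embeddings while unrelated samples end up dissimilar. The following pure-Python sketch illustrates only the objective being optimized, with hand-picked toy vectors standing in for learned embeddings; it is not a training loop.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors, the usual comparison
    measure in contrastive objectives."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy embeddings: two views of the same sample, plus an unrelated one.
anchor   = [1.0, 0.9, 0.1]
positive = [0.9, 1.0, 0.2]   # augmented view of the anchor
negative = [0.1, 0.2, 1.0]   # a different sample entirely

# The contrastive objective pushes embeddings so that the anchor is
# closer to its positive view than to any negative sample.
assert cosine(anchor, positive) > cosine(anchor, negative)
```

A real contrastive setup (e.g., an InfoNCE-style loss) would adjust the embedding network until this inequality holds across many anchor/positive/negative triples.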
SSL is an exciting field that has shown great progress in recent years. It is an approximate form