NLP Part 3 - Sentence Embeddings

In the previous two articles we talked about NLP in general (https://medium.com/@umbertofontana/nlp-part-1-introduction-to-nlp-e686611da3da) and about how we can turn words into numbers so that a machine can also understand our vocabulary (https://medium.com/@umbertofontana/nlp-part-2-words-representation-d0791d6da89d). In this part, we're going to extend that idea to sentence embeddings. After this, we'll be ready to start practicing and implement our first NLP system!

From words to sentences

We now know that it is possible for a machine to understand the words we send it, but is that sufficient? Well, if it were, this chapter would end here. So let's see why it isn't. The problem with dealing only with words (or n-grams, we didn't forget about you) is that the context of the words is not taken into account. Take the sentences "Ground Control to Major Tom" and "Major Tom to Ground Control". Both receive the same representation under these approaches, yet they have different meanings (in one case Ground Control wants to announce the start of the countdown, and in the other Major Tom wants Ground Control to know that he's stepping through the door).
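
As a quick illustration (a minimal sketch using scikit-learn, not part of the original article), a plain bag-of-words vectorizer assigns the two sentences exactly the same vector, because word order is thrown away:

```python
from sklearn.feature_extraction.text import CountVectorizer

sentences = [
    "Ground Control to Major Tom",
    "Major Tom to Ground Control",
]

# Bag-of-words counts ignore word order entirely.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(sentences).toarray()

print(vectorizer.get_feature_names_out())  # ['control' 'ground' 'major' 'to' 'tom']
print(X[0])  # [1 1 1 1 1]
print(X[1])  # [1 1 1 1 1]  -> identical to the first sentence
```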

A first, simple way to represent a sentence is to take the arithmetic average of the word vector representations in the document, summarizing it into a single vector in the same embedding space. Naturally, the problem is that this ignores sentence-level relations between words, and we are back to the Space Oddity problem above.
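
Here is a minimal sketch of this averaging approach (it assumes a gensim KeyedVectors model, e.g. pretrained Word2Vec or GloVe vectors, is already loaded; the names are illustrative):

```python
import numpy as np

def average_sentence_vector(tokens, word_vectors):
    """Average the word vectors of the tokens found in the vocabulary."""
    vectors = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vectors:  # no known word: fall back to a zero vector
        return np.zeros(word_vectors.vector_size)
    return np.mean(vectors, axis=0)

# Example (assuming `kv` is a loaded gensim KeyedVectors instance):
# sentence_vec = average_sentence_vector("ground control to major tom".split(), kv)
```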

Doc2Vec

Yes, the inventiveness of the name is not the brightest, but it fits. Doc2Vec was the first attempt to generalize Word2Vec to word sequences and is based on a paragraph vector model. Two architectures are proposed, very similar to the CBOW and Skip-gram architectures of Word2Vec: distributed memory (DM) and distributed bag of words (DBOW).

The extension is straightforward: each paragraph is mapped to a unique vector (a column of a matrix D) and every word is also mapped to a unique vector (a column of a matrix W). In friendlier terms, the paragraph token can be thought of as just another input word. In the DM architecture (the CBOW-like one), the paragraph vector and the word vectors are averaged or concatenated to predict the next word in a context; in the DBOW architecture (the Skip-gram-like one), only the paragraph vector is given as input, and the model is trained to predict words sampled from the paragraph.
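
A minimal sketch using gensim's Doc2Vec implementation (the toy corpus and hyperparameters here are purely illustrative):

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

corpus = [
    "ground control to major tom",
    "major tom to ground control",
    "planet earth is blue and there is nothing i can do",
]

# Each paragraph gets a tag, which plays the role of the paragraph token.
documents = [TaggedDocument(words=text.split(), tags=[i]) for i, text in enumerate(corpus)]

# dm=1 selects the distributed memory (DM) architecture; dm=0 would select DBOW.
model = Doc2Vec(documents, vector_size=50, window=2, min_count=1, epochs=40, dm=1)

# Infer a vector for an unseen sentence in the same embedding space.
new_vector = model.infer_vector("ground control to major tom".split())
print(new_vector.shape)  # (50,)
```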

Skip-thought vectors

This model is structured as an encoder-decoder. The encoder maps a sentence to a vector, while the decoder (actually there are two decoders) tries to generate the previous and the next sentence from the encoder's output. The idea is that sentences sharing semantic and syntactic properties are thus mapped to similar vector representations. Both the encoder and the decoders are implemented with recurrent networks. The encoder's output conditions the decoders: when the decoders predict the previous and the next sentence, they give feedback to the encoder on whether it provided enough information for the task (was the encoding sufficient to understand the context or not?). In mathematical terms, training the decoders minimizes the reconstruction error of the sentences preceding and following the embedded sentence; this reconstruction error is then propagated back to the encoder, pushing it to include valuable information about the current sentence.
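
A highly simplified sketch of this encoder-decoder setup in PyTorch (GRUs with teacher forcing; the dimensions are illustrative and the original skip-thought model is more elaborate):

```python
import torch
import torch.nn as nn

class SkipThoughtSketch(nn.Module):
    """Minimal skip-thought-style model: one GRU encoder and two GRU decoders
    that try to reconstruct the previous and the next sentence."""

    def __init__(self, vocab_size, emb_dim=100, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.decoder_prev = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.decoder_next = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, current, previous, following):
        # Encode the current sentence into a single vector (last hidden state).
        _, h = self.encoder(self.embed(current))          # h: (1, batch, hidden)
        # Each decoder is conditioned on that vector and reads its target
        # sentence shifted by one token (teacher forcing).
        prev_out, _ = self.decoder_prev(self.embed(previous), h)
        next_out, _ = self.decoder_next(self.embed(following), h)
        return self.out(prev_out), self.out(next_out)     # logits over the vocabulary
```

During training, a cross-entropy loss on the two decoders' logits against the previous and next sentences is backpropagated through the shared encoder, which is what pushes the sentence vector to carry contextual information.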


InferSent

This model was proposed by Facebook researchers, and its goal is to learn universal representations of sentences using the supervised data of the Stanford Natural Language Inference (SNLI) dataset, which contains 570k human-generated English sentence pairs manually labeled as entailment, contradiction, or neutral. The main idea is to use a model trained on Natural Language Inference (NLI) to learn universal sentence representations that capture broadly useful features. The NLI task consists in determining the inference relation between two short, ordered texts, with the three labels entailment, contradiction, and neutral.


By leveraging this task, the model can learn rich representations that capture the semantic relationships between sentences. The trained InferSent model can then be used to generate sentence embeddings for unseen sentences.
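
As a sketch of how the NLI task shapes the representations: given embeddings u and v of the premise and the hypothesis produced by a shared sentence encoder, InferSent combines them with element-wise features and feeds the result to a small classifier over the three labels (the hidden sizes below are illustrative placeholders, not the paper's exact configuration):

```python
import torch
import torch.nn as nn

class NLIClassifierHead(nn.Module):
    """Classifier on top of two sentence embeddings, in the style of InferSent."""

    def __init__(self, emb_dim=2048, hidden_dim=512, num_classes=3):
        super().__init__()
        # The input is the concatenation [u, v, |u - v|, u * v].
        self.mlp = nn.Sequential(
            nn.Linear(4 * emb_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),  # entailment / contradiction / neutral
        )

    def forward(self, u, v):
        features = torch.cat([u, v, torch.abs(u - v), u * v], dim=1)
        return self.mlp(features)

# u, v would come from the shared sentence encoder applied to premise and hypothesis:
# logits = NLIClassifierHead()(u, v)
```

Once training on NLI is done, the classifier head is discarded and the encoder alone is kept to embed new sentences.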

Stay tuned, follow for more content, and see you on the other side!
