NLP Part 4 - Contextual Embedding

In the last article ( https://medium.com/@umbertofontana/nlp-part-4-toxic-comments-classification-10e7167fa50b ) we used word/sentence embedding techniques combined with Logistic Regression to classify the toxicity of a text among 6 possible labels. It might seem like we could conclude this series of articles here, since those methods are effective and create a global word representation that a machine can work with. Well, wrong… Remember that the research that led to ChatGPT was long and passed through many small steps. In this article, I will talk about contextual embedding, with a main focus on two models: Seq2Seq and ELMo.

Context Matters

Our language (no matter which language) is far more complicated than a single vector per word can capture. If I search for a translation of the Italian word “campagna” into English, I obtain the following:

Country, countryside, rural area;
Land, farmland;
Campaign, offensive;
Promotion, campaign.

Effectively, even in the translations abo...