Posts

Showing posts from June, 2023

NLP Part 4 - Contextual Embedding

In the last article ( https://medium.com/@umbertofontana/nlp-part-4-toxic-comments-classification-10e7167fa50b ) we used the word/sentence embedding technique combined with Logistic Regression to classify the toxicity of a text among 6 possible labels. It might seem like we can conclude this series here, since these methods are effective and create a global word representation that a machine can understand. Well, not quite… Remember that the research that led to ChatGPT was long and passed through many small steps. In this article, I will talk about contextual embedding, with a main focus on two models: Seq2seq and ELMo. Context Matters. Yes, our language (no matter which language) is far more complicated. If I search for a translation from Italian to English of the word "campagna", I obtain the following: country, countryside, rural area; land, farmland; campaign, offensive; promotion, campaign. Indeed, even in the translations above w…
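The polysemy problem the preview describes can be made concrete with a toy sketch: a static (context-free) embedding table assigns "campagna" a single vector, whether the surrounding sentence means "election campaign" or "countryside". The 3-dimensional vectors below are invented for illustration, not taken from any trained model.

```python
# Toy static embedding table: one fixed vector per word, regardless of context.
# (Vectors are made up purely for illustration.)
static_embeddings = {
    "campagna": [0.2, 0.7, 0.1],    # one vector must cover every sense
    "elettorale": [0.9, 0.1, 0.3],
    "casa": [0.1, 0.4, 0.8],
}

def embed(sentence):
    """Look up each word's single, context-free vector (skip unknown words)."""
    return [static_embeddings[w] for w in sentence.split() if w in static_embeddings]

# "campagna elettorale" (election campaign) vs. "casa in campagna"
# (house in the countryside): "campagna" gets the identical vector in both.
v1 = embed("campagna elettorale")[0]
v2 = embed("casa in campagna")[1]
print(v1 == v2)  # True: the two senses are indistinguishable
```

A contextual model such as ELMo would instead produce two different vectors for "campagna", one per sentence, because the representation is a function of the whole input.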

NLP Part 3 - Sentence Embeddings

In the previous two articles we talked about NLP in general ( https://medium.com/@umbertofontana/nlp-part-1-introduction-to-nlp-e686611da3da ) and how we can transform words into numbers so that the machine can also understand our vocabulary ( https://medium.com/@umbertofontana/nlp-part-2-words-representation-d0791d6da89d ). In this part, we're going to extend the previous concept to sentence embedding. After this, we're ready to start some practice and implement our first NLP system! From words to sentences. We now know that it is possible for a machine to understand the words we send to it, but is that sufficient? Well, if it were, this chapter would end here. So let's see why. The problem with dealing only with words (or n-grams, we didn't forget about you) is that the context of words is not taken into account. Take the sentences "Ground Control to Major Tom" and "Major Tom to Ground Control". Both receive the same representation in these…
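The order-blindness the preview describes is easy to demonstrate: a plain bag-of-words representation only counts words, discarding their positions, so the two sentences collapse to one and the same representation. This is an illustrative sketch, not the article's own code.

```python
from collections import Counter

def bag_of_words(sentence):
    """Order-free representation: the multiset of lowercased words."""
    return Counter(sentence.lower().split())

a = bag_of_words("Ground Control to Major Tom")
b = bag_of_words("Major Tom to Ground Control")
print(a == b)  # True: word order is completely lost
```

Any model built on top of such counts (or on averaged word vectors) inherits this limitation, which is what motivates sentence-level embedding methods.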