NLP Part 2 - Word Representations

In the previous article ( https://medium.com/@umbertofontana/nlp-part-1-introduction-to-nlp-e686611da3da ) I introduced the general meaning of NLP and the common pipeline used to process text. In this section, I'm going to talk about text representation.

2.1 The Representation Problem

This is the big problem in NLP: how can we represent text? Of course, computers cannot understand anything other than bits. They understand a totally different (and very difficult!) alphabet. So we need a numerical representation of the text in order for the computer to robustly process and/or generate it. Many techniques have been proposed and, depending on the task, it can be useful to know them all.

2.2 Occurrence-Based Methods

This is a very simple text representation in which each feature is mapped to a specific textual unit (a word, n-gram, paragraph, etc.). We can build a vocabulary V of the text corpus and associate each entry in the vocabulary with a unique ID. We then represent...
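The vocabulary-building step described above can be sketched in a few lines of Python. This is a minimal illustration, not a full implementation: the toy corpus and the whitespace tokenization are assumptions made here for clarity, and in practice the textual unit could be an n-gram or larger span.

```python
# Toy corpus (illustrative assumption, not from the article)
corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
]

# Build the vocabulary V: map each unique textual unit (here, a word)
# to a unique integer ID, in order of first appearance.
vocab = {}
for sentence in corpus:
    for word in sentence.split():  # naive whitespace tokenization
        if word not in vocab:
            vocab[word] = len(vocab)

print(vocab)
# → {'the': 0, 'cat': 1, 'sat': 2, 'on': 3, 'mat': 4, 'dog': 5, 'log': 6}
```

Once every entry has an ID, a piece of text can be turned into a sequence of integers that downstream occurrence-based representations build on.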