Posts

Showing posts from May 2023

NLP Part 2 - Words Representations

In the previous article ( https://medium.com/@umbertofontana/nlp-part-1-introduction-to-nlp-e686611da3da ) I introduced the general meaning of NLP and the common pipeline for processing text. In this section, I’m going to talk about text representation.

2.1 The Representation Problem

This is the big problem in NLP: how can we represent text? Computers cannot understand anything other than bits; they work with a totally different (and very difficult!) alphabet. So we need a numerical representation of the text for the computer to process and/or generate it robustly. Many techniques have been proposed and, depending on the task, it can be useful to know them all.

2.2 Occurrence-Based Methods

This is a very simple text representation in which each feature is mapped to a specific textual unit (a word, n-gram, paragraph, etc.). We can build a vocabulary V of the text corpus and associate each entry in the vocabulary with a unique ID. We then represent eac
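The vocabulary step described in section 2.2 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the article's own code: the function name `build_vocabulary`, the toy corpus, and the lowercase-and-split tokenization are all hypothetical choices, and words are used as the textual unit.

```python
def build_vocabulary(corpus):
    """Map each distinct token in the corpus to a unique integer ID."""
    vocabulary = {}
    for document in corpus:
        # Assumption: lowercasing and whitespace splitting stand in
        # for a real tokenizer; words are the textual unit here.
        for token in document.lower().split():
            if token not in vocabulary:
                # The next free ID is the current vocabulary size.
                vocabulary[token] = len(vocabulary)
    return vocabulary


# Hypothetical toy corpus, used only for illustration.
corpus = ["the cat sat", "the dog sat down"]
vocabulary = build_vocabulary(corpus)
print(vocabulary)
# → {'the': 0, 'cat': 1, 'sat': 2, 'dog': 3, 'down': 4}
```

IDs are assigned in order of first appearance, so the mapping is deterministic for a given corpus; any other scheme that keeps the IDs unique would serve equally well.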

NLP Part 1 — Introduction to NLP

This is the first article I decided to write about the subfield of Machine Learning known as Natural Language Processing. I will summarize the main topics and challenges of this fascinating field and try to explain them in the most practical way possible.

1.1 What is Natural Language Processing?

No good book can start without introducing its subject. It is usually the most boring part, but it is the foundation without which we cannot build anything, so I will be very brief. I asked ChatGPT what Natural Language Processing is, and the answer was: “Natural Language Processing (NLP) is a branch of artificial intelligence that focuses on enabling computers to understand, interpret, and generate human language in a way that is similar to how humans do”. The chatbot was very clear, I think, which is not surprising since it is one of the greatest products of Machine Learning/NLP in recent years. But it is not the only tool tha