Advanced NLP Classes
Welcome to my Advanced NLP Classes website. Here, you'll find all the information you need to follow the course, including lecture notes, slides, resources, and home assignments. I designed this course to guide you through both traditional NLP methods and modern deep learning approaches, ensuring you gain a well-rounded understanding of how to process and analyze natural language data.
Below, I've provided a short overview of each main section of the course. Even if you're new to NLP or machine learning, don't worry: this course begins with foundational concepts and gradually works toward more advanced topics.
Lecturer
Arnault Gombert
I'm your lecturer for this course, and I'll be here to guide you through each topic step by step. Feel free to reach out if you have any questions or need additional support. My goal is to help you not only understand the theory but also gain hands-on practice in building NLP solutions.
Course Materials
You can access all the key materials here on this site. In particular, you'll find:
- Slides for each lecture session, which you can use as a reference or to review later.
- Notebooks that contain hands-on exercises and examples.
- Resources linking to external articles, videos, or blogs that can expand your learning.
- Home Assignments and instructions for the Final Project.
I'll keep this repository updated with the latest materials, so please check back regularly.
GitHub repository
You can find the GitHub repository here with all the notebooks and resources. If you find a bug or have a suggestion, feel free to open an issue.
About This Course
This course navigates the evolution of Natural Language Processing (NLP) from foundational techniques to advanced concepts like Large Language Models and ChatGPT. It begins with core principles such as TF-IDF and word embeddings, advancing through deep learning innovations like LSTM and BERT.
The course is structured into three main parts:
- Good old fashioned NLP (Sessions 1-4)
- Almost part of good old fashioned NLP (Sessions 5-8)
- LLMs, Agents & Others (Sessions 9 & 10)
Getting Started
- All chapter slides will be available in the corresponding chapter. Here is the link to the first chapter.
- Check the Notebooks section for practical exercises and assignments.
- Explore the Resources section for recommended readings and materials.
Pre-requisites
To get the most out of this course, you should have a good understanding of:
- Python programming (you can check the Python 101 notebooks starting here)
- Econometrics
- Machine learning fundamentals
Overview and objectives
This class is quite dense, perhaps one of the densest classes you'll have this year, as it covers many topics in a short time. We will study general concepts of NLP (Natural Language Processing), sometimes assuming you are already familiar with Machine Learning concepts. If you have general questions about ML, don't hesitate to ask me during class or by email. One very good book for learning Machine Learning concepts is The Elements of Statistical Learning.
I encourage you to read the notebooks and try running them to understand the main concepts. Although we focus on NLP techniques, the course also covers general ML concepts such as how to improve your model, how to evaluate it, and how to deal with overfitting.
Text is a complex form of data that is abundant on the web, inside firms, and in public organizations. This course navigates the evolution of NLP from foundational techniques to advanced concepts like Large Language Models and ChatGPT.
It begins with core principles such as TF-IDF and word embeddings, advancing through deep learning innovations like LSTM and BERT. It emphasizes practical application, allowing you to build and evaluate NLP pipelines.
Key topics include transformers, few-shot and transfer learning, and ethical considerations in NLP. The course culminates in exploring cutting-edge developments, offering hands-on experience with modern NLP challenges. Its goal is to equip students with the skills to analyze and apply NLP technologies effectively and ethically in real-world scenarios.
Course outline
Part 1: Good old fashioned NLP (Sessions 1-4)
Baselines and Sparse representations (Session 1)
- Baseline & Evaluations
- TF-IDF and improvements
2015 Deep learning (Session 2)
- Backpropagation in Neural Networks
- LSTM, attention mechanisms & Language Models
Word Embeddings (Session 3)
- Static word embeddings (Word2Vec, GloVe, FastText...)
- Contextual embeddings (ELMo, BERT...)
Practical Session (Session 4)
- Baseline pipeline & Metrics evaluation
- LSTM-pipeline
- Word embedding add-ons
- Training our own embeddings
Part 2: Almost part of good old fashioned NLP (Sessions 5-8)
Transformers & BERT (Session 5)
- Transformer intuition: the architecture, the self-attention layers, comparisons with recurrent neural networks
- BERT architecture: the revolution of Large Language Models
Few-shot learning, Transfer learning (Session 6)
- Language models are few-shot learners: fine-tuning a BERT architecture to transfer learning to downstream tasks
- Leveraging existing knowledge: using prompts to generate labels
Injustice & biases in NLP: detecting and mitigating (Session 7)
- Are Large Language Models stochastic parrots? Are Large Language Models useful for everyone?
- Detect and mitigate biases of Large Language Models
Practical Session (Session 8)
- Fine-tuning a BERT model
- How much data to get the best results?
- Low resource? No problem
- Detecting biases
Part 3: LLMs, Agents & Others (Sessions 9 & 10)
Prompt engineering & Fine-tuning
- Zero-shot learning, chain-of-thought prompting, and output formatting
- Fine-tuning
Hallucinations & Introduction to Agents
- Detect hallucinations
- Other limitations
- Introduction to agentic frameworks
Required activities
Students will be required to complete a couple of independent homework assignments. These will include fine-tuning their own NLP model, assessing models on a benchmark, and proposing improvements.
Students will form teams of up to 4 students to develop their own NLP project, in which they will apply the knowledge gained in class to solve a specific NLP problem.
Evaluation
- Class Participation: 10%
- Homework: 20%
- Term project and presentation: 70%
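To see how these weights combine, here is a minimal Python sketch. It assumes that all three components are scored on the same scale (0 to 100 in the example) and that the final grade is a plain weighted average. Neither assumption is stated above, and the names `WEIGHTS` and `final_grade` are purely illustrative, so treat this as an approximation rather than the official grading procedure.

```python
# Weights in percent, copied from the Evaluation list above.
WEIGHTS = {"participation": 10, "homework": 20, "project": 70}


def final_grade(scores):
    """Return the weighted average of the component scores.

    Assumption: every score uses the same scale (e.g. 0-100).
    """
    if set(scores) != set(WEIGHTS):
        raise ValueError(f"expected scores for exactly: {sorted(WEIGHTS)}")
    weighted_sum = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    return weighted_sum / sum(WEIGHTS.values())


# Hypothetical student: 80 in participation, 90 in homework, 70 on the project.
example = {"participation": 80, "homework": 90, "project": 70}
print(final_grade(example))  # 75.0
```

Because the term project carries 70% of the weight, a 10-point change in the project score moves the final grade by 7 points, while the same change in participation moves it by only 1 point.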