📓 NLP Course Notebooks
This directory contains all the Jupyter notebooks for the Advanced NLP Classes. These notebooks provide hands-on experience with the concepts covered in the lectures.
⚙️ Setup Instructions
🔧 Prerequisites
- Python 3.11 or higher
- uv (for dependency management)
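Before installing, both prerequisites can be checked from Python. The snippet below is a minimal sketch, not part of the course code: the function name `check_prerequisites` and its optional override arguments are hypothetical, and it only assumes that `uv` should be discoverable on your `PATH`.

```python
import shutil
import sys

MIN_PYTHON = (3, 11)  # minimum version listed in the prerequisites above


def check_prerequisites(version_info=None, uv_path=None):
    """Return a dict saying whether each prerequisite looks satisfied.

    Both arguments are optional overrides (handy for testing). By default
    the running interpreter and the current PATH are inspected.
    """
    if version_info is None:
        version_info = sys.version_info
    if uv_path is None:
        uv_path = shutil.which("uv")  # None when uv is not on PATH
    return {
        "python_ok": tuple(version_info[:2]) >= MIN_PYTHON,
        "uv_ok": uv_path is not None,
    }


if __name__ == "__main__":
    print(check_prerequisites())
```

If `python_ok` is false, switch to a newer interpreter; if `uv_ok` is false, install uv as described in the next section.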
📦 Installation
We use uv to manage dependencies. Follow these steps to set up your environment:
1️⃣ Install uv
macOS / Linux:
Windows:
2️⃣ Clone the repository and install dependencies
3️⃣ Install additional dependencies for notebooks
4️⃣ Launch Jupyter Notebook
Navigate to the notebooks directory to access all the notebooks.
🛠️ Troubleshooting
If you encounter issues with the installation:
- macOS: You might need to install the Xcode command-line tools:
xcode-select --install
- Ubuntu: Ensure you have the build essentials:
sudo apt-get install build-essential
- Windows: Make sure you have the Microsoft C++ Build Tools installed
📖 Table of Contents
🐍 Python Fundamentals (Session 1)
These notebooks cover the essential Python skills needed for NLP:
- Python Types: Understanding Python's type system, from basic to advanced types
- Python Classes: Object-oriented programming in Python
- Python Dataframes: Working with pandas for data manipulation
- Python NumPy: Numerical computing with NumPy
- Python scikit-learn: Introduction to machine learning with scikit-learn
📝 NLP Techniques (Session 1)
- Baseline with regexes and spaCy: Implementing simple but effective baseline approaches
- TF-IDF: how to judge its quality?: Understanding and implementing TF-IDF
- BM25: a better TF-IDF, judge through different metrics: Advanced information retrieval techniques
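To give a feel for why BM25 is described above as "a better TF-IDF", here is a minimal pure-Python sketch of both scoring functions. It is not taken from the notebooks: the function names, the whitespace tokenizer, the toy corpus, and the defaults `k1=1.5` and `b=0.75` are assumptions chosen for illustration, and the IDF uses the common smoothed `log(1 + ...)` variant.

```python
import math
from collections import Counter


def tokenize(text):
    """Very naive tokenizer (assumption): lowercase and split on whitespace."""
    return text.lower().split()


def tfidf_score(query, doc_tokens, corpus_tokens):
    """Sum of raw term frequency times log(N / df) over the query terms."""
    tf = Counter(doc_tokens)
    n_docs = len(corpus_tokens)
    score = 0.0
    for term in tokenize(query):
        df = sum(1 for doc in corpus_tokens if term in doc)
        if df == 0:
            continue  # term never appears in the corpus
        score += tf[term] * math.log(n_docs / df)
    return score


def bm25_score(query, doc_tokens, corpus_tokens, k1=1.5, b=0.75):
    """BM25: saturating term frequency plus document-length normalization."""
    n_docs = len(corpus_tokens)
    avg_len = sum(len(doc) for doc in corpus_tokens) / n_docs
    tf = Counter(doc_tokens)
    score = 0.0
    for term in tokenize(query):
        freq = tf[term]
        if freq == 0:
            continue
        df = sum(1 for doc in corpus_tokens if term in doc)
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
        length_norm = 1.0 - b + b * len(doc_tokens) / avg_len
        score += idf * freq * (k1 + 1.0) / (freq + k1 * length_norm)
    return score


if __name__ == "__main__":
    # Toy corpus, invented purely for illustration.
    corpus = [tokenize(t) for t in (
        "the cat sat on the mat",
        "cat cat cat cat cat",
        "dogs chase birds in the park",
    )]
    for doc in corpus:
        print(doc, tfidf_score("cat", doc, corpus), bm25_score("cat", doc, corpus))
```

The difference to look for is that the TF-IDF score grows linearly with the number of occurrences of a term, while the BM25 contribution of a term flattens out as its frequency increases.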
📝 Chapter 2: Neural Networks, Backpropagation & RNNs
- Intro to Neural Nets & Backprop (NumPy): Implementing neural networks with NumPy
- Simple RNN for Text Generation (Tiny Shakespeare): Generating text with a simple RNN
- LSTM for Sequence Classification: Building an LSTM for sequence classification
📝 Chapter 3: Word Embeddings
- Word2Vec from Scratch - with negative sampling: Implementing Word2Vec from scratch with negative sampling
- Embedding Evaluation: Intrinsic and Extrinsic: Evaluating word embeddings using both intrinsic and extrinsic metrics
- Classification with Embeddings: Using embeddings for classification tasks
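Intrinsic evaluation of embeddings usually starts from cosine similarity between word vectors. The sketch below is a hedged illustration rather than the notebooks' implementation: the helper names are hypothetical, and the two-dimensional vectors are hand-made toy values, not trained embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_neighbors(word, embeddings, k=3):
    """Return the k words most similar to `word`, as (word, similarity) pairs."""
    target = embeddings[word]
    scored = [
        (other, cosine_similarity(target, vector))
        for other, vector in embeddings.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Hand-made toy vectors (assumption), only to show the call pattern.
    toy_embeddings = {
        "king": [0.9, 0.1],
        "queen": [0.8, 0.2],
        "apple": [0.1, 0.9],
    }
    print(nearest_neighbors("king", toy_embeddings, k=2))
```

With real embeddings, the same nearest-neighbor lookup is a quick sanity check before moving on to extrinsic tasks such as classification.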
📝 Chapter 5: Transformers & BERT
- BERT with Hugging Face: Loading and using pre-trained BERT models via the Hugging Face Transformers library
- Attention Visualization: Inspecting and visualizing self-attention patterns inside transformer layers
📝 Chapter 6: Few-shot & Transfer Learning
- Topic Modeling with BERTopic: Discovering topics in text using transformer embeddings and BERTopic
- Zero-Shot Classification: Classifying text without labeled data using NLI-based zero-shot models
- Text Generation with GPT: Generating text and labels with GPT-style autoregressive models
📝 Chapter 7: Bias Detection & Mitigation
- Gender Bias Detection: Detecting gender biases in language models and embeddings
- Cross-Language Evaluation: Evaluating multilingual models and their behavior across languages
- Reducing a BERT Model: Distilling and compressing BERT for faster, lighter inference
📝 Chapter 9: Prompt Engineering & RAG
- Prompt Engineering: Designing zero-shot, few-shot and chain-of-thought prompts for LLMs
- Retrieval-Augmented Generation (RAG): Building a RAG pipeline that grounds LLM answers in retrieved documents
📝 Chapter 10: LLMs, Tools & Agents
- LLM with Tools: Equipping an LLM with external tools and function calling
- LLM as a Judge: Using an LLM to evaluate the quality of model outputs
- ReAct Framework: Implementing the ReAct loop combining reasoning steps and tool actions
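The ReAct loop mentioned above alternates between model "thoughts", tool calls, and observations fed back to the model. The following is a minimal sketch under stated assumptions: the `Action: tool[input]` and `Final Answer:` text format, the `calculator` tool, and the `scripted_llm` stand-in are all hypothetical and replace a real LLM call so the example runs offline.

```python
import operator
import re

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}
NUMBER = r"(-?\d+(?:\.\d+)?)"


def calculator(expression):
    """Hypothetical tool: evaluate '<number> <op> <number>' and return a string."""
    match = re.fullmatch(rf"\s*{NUMBER}\s*([+\-*/])\s*{NUMBER}\s*", expression)
    if match is None:
        return "error: expected '<number> <op> <number>'"
    left, op, right = match.groups()
    try:
        return str(OPS[op](float(left), float(right)))
    except ZeroDivisionError:
        return "error: division by zero"


TOOLS = {"calculator": calculator}
ACTION_RE = re.compile(r"Action:\s*(\w+)\[(.*)\]")
FINAL_RE = re.compile(r"Final Answer:\s*(.*)")


def react_loop(question, llm, tools=TOOLS, max_steps=5):
    """Run reason/act/observe steps until the model emits a final answer."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        reply = llm(transcript)  # the model sees the whole history so far
        transcript += reply + "\n"
        final = FINAL_RE.search(reply)
        if final:
            return final.group(1).strip(), transcript
        action = ACTION_RE.search(reply)
        if action:
            name, argument = action.groups()
            tool = tools.get(name)
            observation = tool(argument) if tool else f"error: unknown tool {name!r}"
            transcript += f"Observation: {observation}\n"
    return None, transcript  # step budget exhausted without a final answer


def scripted_llm(transcript):
    """Stand-in for a real model: acts first, then answers from the observation."""
    if "Observation:" not in transcript:
        return "Thought: I need to multiply.\nAction: calculator[6 * 7]"
    observation = transcript.rsplit("Observation:", 1)[1].strip()
    return f"Thought: I have the result.\nFinal Answer: {observation}"


if __name__ == "__main__":
    answer, history = react_loop("What is 6 times 7?", scripted_llm)
    print(history)
    print("Answer:", answer)
```

Swapping `scripted_llm` for a function that calls an actual model keeps the loop unchanged; the `max_steps` cap guards against a model that never produces a final answer.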
🤝 Contributing
If you find errors or have suggestions for improving these notebooks, please open an issue or submit a pull request.
📄 License
These notebooks are provided for educational purposes as part of the Advanced NLP Classes at the Barcelona School of Economics.