📓 NLP Course Notebooks
This directory contains all the Jupyter notebooks for the Advanced NLP Classes. These notebooks provide hands-on experience with the concepts covered in the lectures.
⚙️ Setup Instructions
🔧 Prerequisites
- Python 3.11 or higher
- Poetry (for dependency management)
📦 Installation
We use Poetry to manage dependencies. Follow these steps to set up your environment:
1️⃣ Install Poetry
macOS / Linux:
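The official installer script is the usual route (check the Poetry documentation if this command has changed):

```bash
# Download and run the official Poetry installer
curl -sSL https://install.python-poetry.org | python3 -
```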
Windows:
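In PowerShell, the equivalent installer invocation is roughly (again, see the Poetry docs for the current command):

```powershell
# Download and run the official Poetry installer via PowerShell
(Invoke-WebRequest -Uri https://install.python-poetry.org -UseBasicParsing).Content | py -
```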
2️⃣ Clone the repository and install dependencies
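Something along these lines (the repository URL below is a placeholder; use the actual course repository):

```bash
# Clone the course repository (placeholder URL) and enter it
git clone https://github.com/<your-org>/<course-repo>.git
cd <course-repo>

# Install the project dependencies into a Poetry-managed virtual environment
poetry install
```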
3️⃣ Install additional dependencies for notebooks
```bash
poetry add pandas numpy matplotlib scikit-learn spacy jupyter
poetry run python -m spacy download en_core_web_sm
```
4️⃣ Launch Jupyter Notebook
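With the dependencies installed, Jupyter can be started inside Poetry's virtual environment:

```bash
# Start Jupyter Notebook from within the Poetry environment
poetry run jupyter notebook
```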
Navigate to the `notebooks/` directory to access all the notebooks.
🛠️ Troubleshooting
If you encounter issues with the installation:
- macOS: You might need to install the Xcode command-line tools: `xcode-select --install`
- Ubuntu: Ensure you have the build essentials installed: `sudo apt-get install build-essential`
- Windows: Make sure you have the Microsoft C++ Build Tools installed
📖 Table of Contents
🐍 Python Fundamentals (Session 1)
These notebooks cover the essential Python skills needed for NLP:
- Python Types: Understanding Python's type system, from basic to advanced types
- Python Classes: Object-oriented programming in Python
- Python Dataframes: Working with pandas for data manipulation
- Python NumPy: Numerical computing with NumPy
- Python scikit-learn: Introduction to machine learning with scikit-learn
📝 NLP Techniques (Session 1)
- Baseline with regexes and spaCy: Implementing simple but effective baseline approaches
- TF-IDF: how to judge its quality?: Understanding and implementing TF-IDF
- BM25: a better TF-IDF, judged through different metrics: Advanced information retrieval techniques
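As a quick taste of what the TF-IDF notebook covers, here is a minimal scikit-learn sketch (the notebook builds and evaluates TF-IDF in far more depth; the corpus below is made up for illustration):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus, made up purely for illustration
corpus = [
    "natural language processing with python",
    "tf-idf weighs rare terms more heavily",
    "bm25 refines tf-idf for retrieval",
]

# Fit a TF-IDF vectorizer and inspect the resulting term weights
vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform(corpus)

print(vectorizer.get_feature_names_out())  # vocabulary learned from the corpus
print(tfidf_matrix.toarray().round(2))     # one TF-IDF vector per document
```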
📝 Chapter 2: Neural Networks, Backpropagation & RNNs
- Intro to Neural Nets & Backprop (NumPy): Implementing neural networks with NumPy
- Simple RNN for Text Generation (Tiny Shakespeare): Generating text with a simple RNN
- LSTM for Sequence Classification: Building an LSTM for sequence classification
📝 Chapter 3: Word Embeddings
- Word2Vec from Scratch - with negative sampling: Implementing Word2Vec from scratch with negative sampling
- Embedding Evaluation: Intrinsic and Extrinsic: Evaluating word embeddings using both intrinsic and extrinsic metrics
- Classification with Embeddings: Using embeddings for classification tasks
🤝 Contributing
If you find errors or have suggestions for improving these notebooks, please open an issue or submit a pull request.
📄 License
These notebooks are provided for educational purposes as part of the Advanced NLP Classes at Barcelona School of Economics.