
Session 2: Deep Learning for NLP

🎓 Course Materials

📑 Slides

Download Session 2 Slides (PDF)

📓 Notebooks


🚀 Session 2: Neural Networks, Backpropagation & RNNs

In this second session, we move beyond the baselines of Session 1 and dive into neural networks. Starting from the foundational vanilla feedforward architecture and building up to recurrent neural networks, you'll see how these models capture the sequential patterns that are crucial for language understanding. We'll also explore the intricacies of training them, including gradient descent variants and backpropagation, as well as pitfalls like vanishing and exploding gradients.

🎯 Learning Objectives

  1. Understand the core mechanics of neural networks, from feedforward passes to computing gradients.
  2. Master backpropagation to see how weight updates flow through each layer.
  3. Explore Recurrent Neural Networks (RNNs) and see why they’re pivotal for handling sequential data such as text.
  4. Learn about Long Short-Term Memory (LSTM) networks and how they solve the shortcomings of vanilla RNNs.
  5. Build a text generator that can produce plausible sequences, using an RNN trained on a small dataset (Tiny Shakespeare).

📚 Topics Covered

Neural Network Essentials

  • Vanilla Networks: Single-layer networks, the chain rule in practice, and how we compute partial derivatives for each parameter.
  • Gradient Descent: A closer look at batch, mini-batch, and stochastic variants. We'll discuss how they're used in frameworks like PyTorch or TensorFlow; a from-scratch sketch follows this list.
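
To make the forward/backward mechanics concrete, here is a minimal NumPy sketch of mini-batch gradient descent for a one-hidden-layer network. It is illustrative only: the toy data, layer sizes, and hyperparameters are made up and are not the session notebook's values.

```python
import numpy as np

# Minimal sketch: a one-hidden-layer network trained with mini-batch gradient
# descent on made-up binary-classification data. Sizes and hyperparameters are
# illustrative, not the course notebook's values.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 10))                          # 256 samples, 10 features
y = (X @ rng.normal(size=(10, 1)) > 0).astype(float)    # toy binary targets

W1 = rng.normal(scale=0.1, size=(10, 32))
b1 = np.zeros(32)
W2 = rng.normal(scale=0.1, size=(32, 1))
b2 = np.zeros(1)
lr, batch_size = 0.1, 32

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for epoch in range(20):
    perm = rng.permutation(len(X))                      # shuffle, then take mini-batches
    for start in range(0, len(X), batch_size):
        idx = perm[start:start + batch_size]
        xb, yb = X[idx], y[idx]

        # Forward pass
        h = np.tanh(xb @ W1 + b1)                       # hidden activations
        p = sigmoid(h @ W2 + b2)                        # predicted probabilities

        # Backward pass via the chain rule (binary cross-entropy loss)
        dz2 = (p - yb) / len(xb)                        # dL/d(output pre-activation)
        dW2, db2 = h.T @ dz2, dz2.sum(axis=0)
        dh = dz2 @ W2.T
        dz1 = dh * (1.0 - h**2)                         # tanh'(z) = 1 - tanh(z)^2
        dW1, db1 = xb.T @ dz1, dz1.sum(axis=0)

        # Mini-batch gradient descent update
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2
```

Setting `batch_size` to the full dataset size gives batch gradient descent, while `batch_size = 1` recovers the stochastic variant; frameworks like PyTorch automate the backward pass shown here via autograd.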

Recurrent Neural Networks (RNNs)

  • Sequential Data: Why standard NNs fail to capture dependencies in text, time-series, or speech.
  • Vanishing/Exploding Gradients: Common training challenges in RNNs and strategies to mitigate them; a short numerical illustration follows this list.
  • Practical RNN Implementations: Applying RNN variants such as LSTM and GRU to tasks like language modeling and sequence labeling.
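
To see where vanishing and exploding gradients come from, the following NumPy sketch (illustrative only; the hidden size, weight scales, and sequence length are arbitrary choices) accumulates the recurrent Jacobian of a tanh RNN across time steps and prints its norm.

```python
import numpy as np

# Minimal sketch: how the long-range gradient dh_T/dh_0 behaves in a tanh RNN
#   h_t = tanh(W_hh @ h_{t-1} + x_t)        (input projection folded into x_t)
# The one-step Jacobian is diag(1 - h_t**2) @ W_hh, and backpropagation through
# time multiplies these Jacobians together, so their product shrinks or grows
# exponentially with the scale of the recurrent weights.
rng = np.random.default_rng(0)
hidden, steps = 50, 100

for scale in (0.5, 1.0, 2.0):                     # small vs. large recurrent weights
    W_hh = rng.normal(scale=scale / np.sqrt(hidden), size=(hidden, hidden))
    h = np.zeros(hidden)
    J = np.eye(hidden)                            # accumulated Jacobian dh_t/dh_0
    for t in range(1, steps + 1):
        x = 0.1 * rng.normal(size=hidden)         # small inputs keep tanh unsaturated
        h = np.tanh(W_hh @ h + x)
        J = np.diag(1.0 - h**2) @ W_hh @ J        # chain rule through one more step
        if t % 25 == 0:
            print(f"scale={scale:.1f}  t={t:3d}  ||dh_t/dh_0|| = {np.linalg.norm(J):.2e}")
```

Small recurrent weights drive the long-range gradient toward zero, while large ones make it blow up; gradient clipping and gated architectures such as the LSTM and GRU are the standard remedies discussed in this session.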
Readings & References

  • Karpathy, A. (2016). "Yes You Should Understand Backprop" - Blog Post. Explains backpropagation and its practical pitfalls in detail.

  • Pitis, S. (2016). "Written Memories: Understanding, Deriving and Extending the LSTM" - Blog Post. Works through RNN and LSTM logic in detail.

  • Olah, C. (2015). "Understanding LSTM Networks" - Blog Post. Explains LSTMs in detail.

  • Karpathy, A. (2015). "The Unreasonable Effectiveness of Recurrent Neural Networks" - Blog Post. Classic post illustrating RNN text generation (Tiny Shakespeare).

  • Olah, C. (2016). "Attention and Augmented Recurrent Neural Networks" - Blog Post. Explains attention and augmented RNNs in detail.

  • Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). "Learning internal representations by error propagation." - Paper. Presents the backpropagation algorithm in detail.

  • Hochreiter, S., & Schmidhuber, J. (1997). "Long short-term memory." Neural Computation, 9(8). - Paper. Original LSTM paper addressing the vanishing gradient problem in RNNs.

  • Cho et al. (2014). "Learning phrase representations using RNN encoder-decoder for statistical machine translation." - Paper. Introduced the GRU (Gated Recurrent Unit) as a simpler alternative to the LSTM.

  • He et al. (2015). "Deep Residual Learning for Image Recognition." - Paper. Not directly NLP, but its "degradation problem" generalizes to other deep networks.

💻 Practical Components

  • Implementing Gradient Descent: We’ll code a simple neural net from scratch (via NumPy or PyTorch) to see how forward/backward passes work.
  • Vanishing & Exploding Gradients: In a toy RNN, we’ll visualize how gradients can shrink or explode, and learn about gradient clipping.
  • Recurrent Language Model: Train an RNN (or LSTM) on a small text corpus (e.g., Tiny Shakespeare) and watch it generate new text sequences; a sketch of such a training loop, including gradient clipping, follows this list.
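
As a preview, here is a minimal PyTorch sketch of the recurrent language model exercise. It assumes the corpus has already been saved locally as `tinyshakespeare.txt` (a hypothetical path) and uses illustrative hyperparameters rather than the notebook's actual settings.

```python
import torch
import torch.nn as nn

# Minimal sketch of a character-level LSTM language model. It assumes the
# corpus has been saved locally as "tinyshakespeare.txt" (hypothetical path);
# hyperparameters are illustrative, not the session notebook's settings.
text = open("tinyshakespeare.txt", encoding="utf-8").read()
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}
data = torch.tensor([stoi[c] for c in text], dtype=torch.long)

class CharLSTM(nn.Module):
    def __init__(self, vocab_size, emb_dim=64, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x, state=None):
        out, state = self.lstm(self.embed(x), state)
        return self.head(out), state

model = CharLSTM(len(chars))
opt = torch.optim.Adam(model.parameters(), lr=3e-3)
seq_len, batch_size = 128, 32

for step in range(2000):
    # Random sub-sequences: inputs are characters, targets are the next characters.
    starts = torch.randint(0, len(data) - seq_len - 1, (batch_size,)).tolist()
    x = torch.stack([data[s:s + seq_len] for s in starts])
    y = torch.stack([data[s + 1:s + seq_len + 1] for s in starts])

    logits, _ = model(x)
    loss = nn.functional.cross_entropy(logits.reshape(-1, len(chars)), y.reshape(-1))

    opt.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # gradient clipping
    opt.step()
    if step % 200 == 0:
        print(f"step {step}: loss {loss.item():.3f}")

# Sample new text one character at a time from the trained model.
with torch.no_grad():
    idx = torch.tensor([[0]])          # seed with the first character in the vocabulary
    state, generated = None, chars[0]
    for _ in range(200):
        logits, state = model(idx, state)
        probs = torch.softmax(logits[0, -1], dim=-1)
        idx = torch.multinomial(probs, 1).unsqueeze(0)
        generated += chars[idx.item()]
print(generated)
```

The `clip_grad_norm_` call is the gradient-clipping step from the second bullet above; the rest is a plain next-character prediction loop followed by sampling one character at a time.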