🎓 Course Materials - Practical NLP - 1

Session 4: Text Classification Pipelines and Explainability

In this hands-on session, we walk through the evolution of text classification pipelines, from traditional approaches such as TF-IDF with linear classifiers to modern deep learning models built on LSTMs and pretrained word embeddings such as Word2Vec. The session closes with an introduction to model explainability using LIME, giving students insight into how models make their decisions.

This notebook is designed as a modular blueprint that can be reused and extended for many text classification tasks.

📓 Notebooks


🎯 Learning Objectives

  1. Understand how to turn raw text into machine-readable input (TF-IDF, tokenization, embeddings); tokenization is sketched in code right after this list.
  2. Build baseline and deep models (logistic regression, BiLSTM).
  3. Integrate pre-trained embeddings (Word2Vec) into custom pipelines.
  4. Apply explainability tools like LIME to interpret model behavior.
  5. Compare models using quantitative and qualitative evaluation (metrics & examples).
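
As a first step toward objective 1, here is a minimal sketch of turning raw text into tokens with spaCy before any vectorization. The example sentence is a placeholder, and it assumes the small English model has been installed with `python -m spacy download en_core_web_sm`.

```python
# Minimal spaCy tokenization sketch; assumes `en_core_web_sm` is already downloaded.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The plot was thin, but the soundtrack was absolutely fantastic!")

# Raw tokens, including punctuation.
tokens = [token.text for token in doc]

# A cleaned view: lemmas with stop words and punctuation removed, as might feed TF-IDF.
lemmas = [token.lemma_.lower() for token in doc if not token.is_stop and not token.is_punct]

print(tokens)
print(lemmas)
```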

📚 Resources

  • scikit-learn TF-IDF Documentation – Link
  • spaCy Tokenizer Docs – Link
  • Gensim Word2Vec Tutorial – Link
  • LIME GitHub Repo – Link

💻 Practical Components

  • 🧪 Build a text classifier from scratch using TF-IDF + LogisticRegression.
  • 🧠 Train an LSTM with one-hot or pre-trained embeddings.
  • 📦 Use Word2Vec embeddings from Hugging Face.
  • 🔍 Explain and debug predictions with LIME for real-world NLP workflows.
  • 🎯 Compare models using both metrics and example-level outputs; each of these components is sketched in code below.
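
A minimal sketch of the TF-IDF + LogisticRegression baseline with scikit-learn; the tiny texts and labels are placeholders for the session's dataset.

```python
# Baseline: TF-IDF features + logistic regression in a single scikit-learn Pipeline.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

texts = ["loved the acting", "great movie", "boring and slow", "terrible plot"]  # placeholder data
labels = [1, 1, 0, 0]  # 1 = positive, 0 = negative

clf = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), min_df=1)),  # word + bigram features
    ("logreg", LogisticRegression(max_iter=1000)),
])
clf.fit(texts, labels)

print(clf.predict(["what a great film", "slow, boring plot"]))
```

Wrapping both steps in a Pipeline keeps the vectorizer's vocabulary tied to the classifier, so the same object can be fed raw strings at prediction time (and reused by LIME below).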
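A sketch of the BiLSTM classifier in Keras; the vocabulary size, sequence length, and layer widths are illustrative rather than the notebook's exact settings.

```python
# BiLSTM text classifier with an embedding layer learned from scratch (Keras).
import tensorflow as tf

vocab_size = 20_000   # assumed tokenizer vocabulary size
max_len = 100         # assumed padded sequence length
embed_dim = 128

model = tf.keras.Sequential([
    tf.keras.Input(shape=(max_len,)),
    tf.keras.layers.Embedding(vocab_size, embed_dim),          # trainable embeddings
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),   # reads the sequence in both directions
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(1, activation="sigmoid"),            # binary classification head
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```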
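To bring in pre-trained Word2Vec vectors, one approach is Gensim's downloader API, sketched below; the model name and the toy word index are assumptions, and the notebook may obtain the vectors from Hugging Face instead.

```python
# Load pre-trained Word2Vec vectors and pack them into an embedding matrix.
import numpy as np
import gensim.downloader as api

wv = api.load("word2vec-google-news-300")            # large download on first use

word_index = {"movie": 1, "great": 2, "boring": 3}   # hypothetical tokenizer vocabulary
embedding_matrix = np.zeros((len(word_index) + 1, wv.vector_size))  # row 0 reserved for padding

for word, idx in word_index.items():
    if word in wv:                                   # out-of-vocabulary words keep zero vectors
        embedding_matrix[idx] = wv[word]

# The matrix can then initialise the Embedding layer of the BiLSTM above, e.g. with
# embeddings_initializer=tf.keras.initializers.Constant(embedding_matrix) and trainable=False.
```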
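For explainability, here is a sketch of LIME's text explainer applied to a single prediction of the TF-IDF + LogisticRegression pipeline (`clf`) from the baseline sketch; the class names and input sentence are assumptions.

```python
# Explain one prediction of the baseline pipeline with LIME.
from lime.lime_text import LimeTextExplainer

explainer = LimeTextExplainer(class_names=["negative", "positive"])

# LIME perturbs the input text and probes the model through a probability function.
explanation = explainer.explain_instance(
    "the plot was boring but the acting was great",
    clf.predict_proba,      # any callable mapping a list of strings to class probabilities
    num_features=6,
)
print(explanation.as_list())  # (word, weight) pairs that drove the prediction
```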
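Finally, a sketch of the quantitative side of model comparison with scikit-learn metrics; the held-out texts and labels are placeholders, and the same report would be produced for each model.

```python
# Score a model on held-out data; repeat for each model being compared.
from sklearn.metrics import classification_report, f1_score

test_texts = ["an instant classic", "a complete waste of time"]  # placeholder held-out set
test_labels = [1, 0]

preds = clf.predict(test_texts)  # baseline pipeline from the first sketch
print("macro F1:", f1_score(test_labels, preds, average="macro"))
print(classification_report(test_labels, preds, target_names=["negative", "positive"]))

# For the qualitative side, inspect example-level disagreements, e.g. texts where
# the BiLSTM and the TF-IDF baseline predict different classes.
```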