Session 4: Practical NLP - 1
Session 4: Text Classification Pipelines and Explainability
In this hands-on session, we trace the evolution of text classification pipelines. We start with traditional approaches, such as TF-IDF features fed to a linear classifier, and move on to deep learning models built on LSTMs and pretrained word embeddings such as Word2Vec. The session closes with an introduction to model explainability using LIME, which shows students which input words drive an individual prediction.
This notebook is designed as a modular blueprint that can be reused and extended for many text classification tasks.
Notebooks
Learning Objectives
- Understand how to turn raw text into machine-readable input (TF-IDF, tokenization, embeddings).
- Build baseline and deep models (logistic regression, BiLSTM).
- Integrate pre-trained embeddings (Word2Vec) into custom pipelines.
- Apply explainability tools like LIME to interpret model behavior.
- Compare models using quantitative and qualitative evaluation (metrics & examples).
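As a concrete picture of the first objective, the sketch below computes TF-IDF vectors in plain Python so the arithmetic is visible. It assumes scikit-learn's TfidfVectorizer defaults: lowercasing, the token pattern (?u)\b\w\w+\b, raw term counts, smoothed IDF ln((1 + n) / (1 + df)) + 1, and L2 row normalisation. The helper names tokenize, fit_idf and transform and the three toy sentences are hypothetical illustrations, not the notebook's code.

```python
import math
import re
from collections import Counter

# Assumed to mirror scikit-learn's TfidfVectorizer default: lowercase words of 2+ characters.
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


def fit_idf(docs):
    """Learn the vocabulary and smoothed IDF: idf(t) = ln((1 + n) / (1 + df(t))) + 1."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))  # document frequency: count each term once per doc
    vocab = sorted(df)
    idf = {term: math.log((1 + n) / (1 + df[term])) + 1 for term in vocab}
    return vocab, idf


def transform(docs, vocab, idf):
    """Map each document to an L2-normalised TF-IDF vector over the fitted vocabulary."""
    rows = []
    for doc in docs:
        counts = Counter(tokenize(doc))  # raw term frequency; out-of-vocabulary terms are ignored
        vec = [counts[term] * idf[term] for term in vocab]
        norm = math.sqrt(sum(v * v for v in vec))
        rows.append([v / norm for v in vec] if norm else vec)
    return rows


docs = ["the movie was great", "the movie was awful", "great acting, great plot"]
vocab, idf = fit_idf(docs)
X = transform(docs, vocab, idf)
print(vocab)
for row in X:
    print([round(v, 3) for v in row])
```

Under these assumptions the vectors should closely match TfidfVectorizer().fit_transform(docs), which is what a real pipeline would call. Note how "great" outweighs "acting" in the third document: it occurs twice there, which more than offsets its lower IDF (it also appears in another document).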
Bibliography & Recommended Reading
- scikit-learn TF-IDF documentation
- spaCy tokenizer documentation
- Gensim Word2Vec tutorial
- LIME GitHub repository
Practical Components
- Build a text classifier from scratch using TF-IDF + LogisticRegression.
- Train an LSTM with one-hot or pre-trained embeddings.
- Use Word2Vec embeddings from Hugging Face.
- Explain and debug predictions with LIME for real-world NLP workflows.
- Compare models using both metrics and example-level outputs.
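The baseline-plus-explanation workflow in the list above can be sketched end to end in standard-library Python. This is a simplified stand-in under stated assumptions. Binary bag-of-words features replace TF-IDF to keep it short. Logistic regression is fitted with plain gradient descent rather than scikit-learn's LogisticRegression solvers. The explanation step is a drop-one-word (occlusion) check, not LIME. LIME instead samples many random word-removal perturbations and fits a locally weighted linear surrogate; in the notebook setting that role belongs to lime.lime_text.LimeTextExplainer. The toy sentences, hyperparameters and helper names (features, train_logreg, word_removal_scores, predict) are hypothetical.

```python
import math
import re

# Words of 2+ characters, lowercased (mirrors scikit-learn's default token pattern).
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


def features(text, vocab):
    """Binary bag-of-words (a simplification of TF-IDF): 1.0 if the word occurs."""
    present = set(tokenize(text))
    return [1.0 if word in present else 0.0 for word in vocab]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, epochs=500, lr=0.5, l2=0.01):
    """Binary logistic regression via full-batch gradient descent with an L2 penalty."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the logit is (prediction - label).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def word_removal_scores(text, predict):
    """Occlusion-style attribution (not LIME): P(positive) drop when each word is removed."""
    words = text.split()
    base = predict(text)
    return [
        (word, base - predict(" ".join(words[:i] + words[i + 1:])))
        for i, word in enumerate(words)
    ]


# Hypothetical toy data: 1 = positive review, 0 = negative review.
train = [
    ("a great and moving film", 1),
    ("great acting and a clever plot", 1),
    ("clever, funny and moving", 1),
    ("an awful and boring film", 0),
    ("boring plot and awful acting", 0),
    ("dull, boring and awful", 0),
]
vocab = sorted({tok for text, _ in train for tok in tokenize(text)})
X = [features(text, vocab) for text, _ in train]
y = [label for _, label in train]
w, b = train_logreg(X, y)


def predict(text):
    """Probability of the positive class for a single text."""
    x = features(text, vocab)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


for word, score in word_removal_scores("a clever but boring film", predict):
    print(f"{word:>8}: {score:+.3f}")
```

Removing "clever" lowers the positive-class probability (positive score) and removing "boring" raises it (negative score). Words the model never saw, such as "but", score exactly zero. To hand a model like this to LIME, predict would first need wrapping so it returns per-class probabilities for a list of texts.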