
Session 6: Few-Shot Learning & Transfer Learning

🎓 Course Materials

📑 Slides

Download Session 6 Slides (PDF)

📓 Notebooks


🚀 Session 6: Few-Shot Learning with BERT Models

In this sixth session, we dive into Few-Shot and Zero-Shot Learning in NLP. These techniques are designed to work in data-scarce environments, mimicking how humans can generalize from just a few examples.

We explore the remarkable generalization abilities of BERT-like models, learn how to apply zero-shot classification using simple prompting techniques, and discover how to generate synthetic data with generative models like GPT-2. We also investigate state-of-the-art techniques like SetFit that combine contrastive learning and fine-tuning, achieving strong results with minimal data.

🎯 Learning Objectives

  1. Understand the motivations for Zero-Shot and Few-Shot Learning in NLP.
  2. Explore how BERT and Transformer-based models naturally support these paradigms.
  3. Apply different approaches to zero/few-shot classification, including NLI, cloze prompting, and embedding similarity.
  4. Learn to generate task-specific labeled data using GPT prompting.
  5. Fine-tune Sentence Transformers with contrastive learning using SetFit.

📚 Topics Covered

Few-Shot Learning Foundations

  • Data scarcity challenges in real-world NLP.
  • Human-like generalization: learning from just a few examples.
  • Why BERT-like models are ideal for few-shot learning.

Zero-Shot Classification Techniques

  • Latent Embedding Matching: Use similarity between sentence and class embeddings.
  • Natural Language Inference (NLI): Frame classification as premise-hypothesis inference.
  • Cloze Task with BERT: Convert classification to fill-in-the-blank prediction (a code sketch of these three approaches follows this list).
  • Weak Supervision with Snorkel: Labeling via noisy heuristics.
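
The first three approaches above can be sketched in a few lines with off-the-shelf checkpoints. The model names and templates below (all-MiniLM-L6-v2, facebook/bart-large-mnli, bert-base-uncased, the hypothesis template, and the cloze sentence) are illustrative assumptions, not the exact choices made in the session notebook:

```python
from sentence_transformers import SentenceTransformer, util
from transformers import pipeline

text = "The striker scored twice in the final minutes of the match."
labels = ["sports", "politics", "technology"]

# 1) Latent embedding matching: embed the sentence and the label names,
#    then pick the label whose embedding is closest in cosine similarity.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
text_emb = encoder.encode(text, convert_to_tensor=True)
label_embs = encoder.encode(labels, convert_to_tensor=True)
sims = util.cos_sim(text_emb, label_embs)[0]
print("embedding match:", labels[int(sims.argmax())])

# 2) NLI: the text is the premise and each label becomes a hypothesis
#    ("This example is about {label}."); entailment scores rank the labels.
nli = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
result = nli(text, candidate_labels=labels,
             hypothesis_template="This example is about {}.")
print("NLI:", result["labels"][0])

# 3) Cloze task: append a masked template and let a masked language model
#    fill the blank, restricting candidate tokens to the label words.
cloze = pipeline("fill-mask", model="bert-base-uncased")
predictions = cloze(text + " This text is about [MASK].", targets=labels)
print("cloze:", predictions[0]["token_str"])
```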

Prompt Engineering and Text Generation

  • Use GPT-2 or GPT-3 to generate balanced synthetic datasets (a minimal GPT-2 sketch follows this list).
  • Prompting as a tool for classification, style transfer, and data augmentation.
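
As a rough illustration of prompt-driven data generation (the prompts, decoding parameters, and the small gpt2 checkpoint here are assumptions, not the notebook's exact settings), one can condition GPT-2 on a class-specific prompt and keep several sampled completions per class:

```python
from transformers import pipeline, set_seed

# Small public GPT-2 checkpoint via the text-generation pipeline.
generator = pipeline("text-generation", model="gpt2")
set_seed(42)  # make the sampling reproducible

# One prompt per target class; every completion inherits that class label,
# which yields a (noisy but balanced) synthetic training set.
prompts = {
    "positive": "Movie review: An absolutely wonderful film that",
    "negative": "Movie review: A painfully boring film that",
}

synthetic = []
for label, prompt in prompts.items():
    outputs = generator(prompt, max_new_tokens=40, do_sample=True,
                        top_p=0.95, num_return_sequences=3)
    for out in outputs:
        synthetic.append({"text": out["generated_text"], "label": label})

print(len(synthetic), "synthetic examples;", synthetic[0]["text"][:80], "...")
```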

Advanced Few-Shot Learning

  • Pattern-Exploiting Training (PET) and its iterative variant iPET (Schick & Schütze, 2020).
  • SetFit (Tunstall et al., 2022), sketched in code after this list:
      • Few-shot learning via contrastive training of Sentence Transformers.
      • No full fine-tuning and no large hardware required.
      • Very fast and cost-efficient.
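
A minimal sketch of the SetFit workflow with the setfit library; the toy training examples are invented for illustration, and the SetFitTrainer interface shown is the original one, so argument names may differ in newer releases:

```python
from datasets import Dataset
from sentence_transformers.losses import CosineSimilarityLoss
from setfit import SetFitModel, SetFitTrainer

# A handful of labelled examples per class (invented here for illustration).
train_ds = Dataset.from_dict({
    "text": [
        "Loved every minute of it",
        "A masterpiece of modern cinema",
        "Utterly boring from start to finish",
        "A complete waste of time",
    ],
    "label": [1, 1, 0, 0],
})

# Start from a pretrained Sentence Transformer; SetFit adds a lightweight
# classification head on top of the contrastively fine-tuned embeddings.
model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2")

trainer = SetFitTrainer(
    model=model,
    train_dataset=train_ds,
    loss_class=CosineSimilarityLoss,  # contrastive loss over generated sentence pairs
    batch_size=16,
    num_iterations=20,                # pair-generation iterations per labelled example
)
trainer.train()

print(model.predict(["I would watch it again tomorrow"]))
```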

🧠 Key Takeaways

| Approach                     | Data Required            | Training Time | Interpretability |
|------------------------------|--------------------------|---------------|------------------|
| Traditional Supervised       | High                     | Long          | ✅               |
| Zero-Shot (NLI / Embeddings) | None                     | None          | ✅               |
| Cloze Prompting              | None                     | None          | ⚠️               |
| GPT-based Generation         | None                     | Medium        | ❌               |
| SetFit (Contrastive)         | Very Low (8–16 examples) | Very Fast     | ✅               |

📖 Resources

  • Brown et al. (2020): Language Models are Few-Shot Learners – Paper. The GPT-3 paper demonstrating strong few-shot generalization.

  • SetFit: Efficient Few-Shot Learning Without Prompts – Blog. Hugging Face's introduction to SetFit, an efficient, prompt-free framework for few-shot fine-tuning of Sentence Transformers.

  • Zero-Shot Text Classification – Blog. A comprehensive guide to zero-shot classification.

  • DINO: Using Big Language Models To Generate Entire Datasets From Scratch – Blog. Using an LLM to generate a labeled dataset from scratch.

  • Schick & Schütze (2020): Exploiting Cloze Questions for Few-Shot Text Classification – Paper. Introduces PET/iPET and pattern-based classification with BERT.

  • Tunstall et al. (2022): Efficient Few-Shot Learning Without Prompts – Paper. SetFit, a scalable contrastive-learning approach to few-shot classification.

  • Yin et al. (2019): Benchmarking Zero-shot Text Classification – Paper. Comparative analysis of zero-shot approaches, including NLI and embedding-based methods.

  • Snorkel: Weak Supervision for Training Data – Website. A framework for labeling data with programmatic rules and heuristics.


💻 Practical Components

  • Topic Modeling with BERTopic: Use embeddings and clustering to explore topic structures in reviews (a minimal sketch follows this list).
  • Zero-Shot Classification: Leverage Hugging Face pipelines with BERT/RoBERTa/DistilBERT for inference-only classification.
  • Prompting with GPT-2: Learn to generate realistic and diverse movie reviews using carefully crafted prompts.
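
For the topic-modeling exercise, a minimal BERTopic sketch; the 20 Newsgroups corpus stands in for the review dataset used in the notebook, and default model settings are assumed:

```python
from sklearn.datasets import fetch_20newsgroups
from bertopic import BERTopic

# Any corpus of raw documents works; 20 Newsgroups stands in for the reviews.
docs = fetch_20newsgroups(subset="train",
                          remove=("headers", "footers", "quotes")).data[:2000]

# BERTopic embeds the documents, reduces dimensionality, clusters them,
# and extracts the most representative words per cluster (topic).
topic_model = BERTopic(verbose=True)
topics, probs = topic_model.fit_transform(docs)

print(topic_model.get_topic_info().head())  # one row per discovered topic
print(topic_model.get_topic(0))             # top words and weights for topic 0
```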