
Session 6: Few-Shot Learning & Transfer Learning

🎓 Course Materials

📑 Slides

Download Session 6 Slides (PDF)

📓 Notebooks


🚀 Session 6: Few-Shot Learning with BERT Models

In this sixth session, we dive into Few-Shot and Zero-Shot Learning in NLP. These techniques are designed to work in data-scarce environments, mimicking how humans can generalize from just a few examples.

We explore the remarkable generalization abilities of BERT-like models, learn how to apply zero-shot classification using simple prompting techniques, and discover how to generate synthetic data with generative models like GPT-2. We also investigate state-of-the-art techniques like SetFit that combine contrastive learning and fine-tuning, achieving strong results with minimal data.

🎯 Learning Objectives

  1. Understand the motivations for Zero-Shot and Few-Shot Learning in NLP.
  2. Explore how BERT and Transformer-based models naturally support these paradigms.
  3. Apply different approaches to zero/few-shot classification, including NLI, cloze prompting, and embedding similarity.
  4. Learn to generate task-specific labeled data using GPT prompting.
  5. Fine-tune Sentence Transformers with contrastive learning using SetFit.

📚 Topics Covered

Few-Shot Learning Foundations

  • Data scarcity challenges in real-world NLP.
  • Human-like generalization: learning from just a few examples.
  • Why BERT-like models are ideal for few-shot learning.

Zero-Shot Classification Techniques

  • Latent Embedding Matching: Use similarity between sentence and class embeddings.
  • Natural Language Inference (NLI): Frame classification as premise-hypothesis inference.
  • Cloze Task with BERT: Convert classification to fill-in-the-blank prediction (a code sketch of these three approaches follows this list).
  • Weak Supervision with Snorkel: Labeling via noisy heuristics.
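
The first three approaches above can be sketched in a few lines with off-the-shelf checkpoints. The model names and templates below (all-MiniLM-L6-v2, facebook/bart-large-mnli, bert-base-uncased, the hypothesis template, and the cloze sentence) are illustrative assumptions, not the exact choices made in the session notebook:

```python
from sentence_transformers import SentenceTransformer, util
from transformers import pipeline

text = "The striker scored twice in the final minutes of the match."
labels = ["sports", "politics", "technology"]

# 1) Latent embedding matching: embed the sentence and the label names,
#    then pick the label whose embedding is closest in cosine similarity.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
text_emb = encoder.encode(text, convert_to_tensor=True)
label_embs = encoder.encode(labels, convert_to_tensor=True)
sims = util.cos_sim(text_emb, label_embs)[0]
print("embedding match:", labels[int(sims.argmax())])

# 2) NLI: the text is the premise and each label becomes a hypothesis
#    ("This example is about {label}."); entailment scores rank the labels.
nli = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
result = nli(text, candidate_labels=labels,
             hypothesis_template="This example is about {}.")
print("NLI:", result["labels"][0])

# 3) Cloze task: append a masked template and let a masked language model
#    fill the blank, restricting candidate tokens to the label words.
cloze = pipeline("fill-mask", model="bert-base-uncased")
predictions = cloze(text + " This text is about [MASK].", targets=labels)
print("cloze:", predictions[0]["token_str"])
```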

Prompt Engineering and Text Generation

  • Use GPT-2 or GPT-3 to generate balanced synthetic datasets (a minimal GPT-2 sketch follows this list).
  • Prompting as a tool for classification, style transfer, and data augmentation.
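
As a rough illustration of prompt-driven data generation (the prompts, decoding parameters, and the small gpt2 checkpoint here are assumptions, not the notebook's exact settings), one can condition GPT-2 on a class-specific prompt and keep several sampled completions per class:

```python
from transformers import pipeline, set_seed

# Small public GPT-2 checkpoint via the text-generation pipeline.
generator = pipeline("text-generation", model="gpt2")
set_seed(42)  # make the sampling reproducible

# One prompt per target class; every completion inherits that class label,
# which yields a (noisy but balanced) synthetic training set.
prompts = {
    "positive": "Movie review: An absolutely wonderful film that",
    "negative": "Movie review: A painfully boring film that",
}

synthetic = []
for label, prompt in prompts.items():
    outputs = generator(prompt, max_new_tokens=40, do_sample=True,
                        top_p=0.95, num_return_sequences=3)
    for out in outputs:
        synthetic.append({"text": out["generated_text"], "label": label})

print(len(synthetic), "synthetic examples;", synthetic[0]["text"][:80], "...")
```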

Advanced Few-Shot Learning

  • Pattern-Exploiting Training (PET) and its iterative variant iPET (Schick & Schütze, 2020).
  • SetFit (Tunstall et al., 2022), sketched in code after this list:
      • Few-shot learning via contrastive training of Sentence Transformers.
      • No full fine-tuning and no large hardware required.
      • Very fast and cost-efficient.
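
A minimal sketch of the SetFit workflow with the setfit library; the toy training examples are invented for illustration, and the SetFitTrainer interface shown is the original one, so argument names may differ in newer releases:

```python
from datasets import Dataset
from sentence_transformers.losses import CosineSimilarityLoss
from setfit import SetFitModel, SetFitTrainer

# A handful of labelled examples per class (invented here for illustration).
train_ds = Dataset.from_dict({
    "text": [
        "Loved every minute of it",
        "A masterpiece of modern cinema",
        "Utterly boring from start to finish",
        "A complete waste of time",
    ],
    "label": [1, 1, 0, 0],
})

# Start from a pretrained Sentence Transformer; SetFit adds a lightweight
# classification head on top of the contrastively fine-tuned embeddings.
model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2")

trainer = SetFitTrainer(
    model=model,
    train_dataset=train_ds,
    loss_class=CosineSimilarityLoss,  # contrastive loss over generated sentence pairs
    batch_size=16,
    num_iterations=20,                # pair-generation iterations per labelled example
)
trainer.train()

print(model.predict(["I would watch it again tomorrow"]))
```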

🧠 Key Takeaways

| Approach                     | Data Required            | Training Time | Interpretability |
|------------------------------|--------------------------|---------------|------------------|
| Traditional Supervised       | High                     | Long          | ✅               |
| Zero-Shot (NLI / Embeddings) | None                     | None          | ✅               |
| Cloze Prompting              | None                     | None          | ⚠️               |
| GPT-based Generation         | None                     | Medium        | ❌               |
| SetFit (Contrastive)         | Very Low (8–16 examples) | Very Fast     | ✅               |

📖 Resources

  • Brown et al. (2020): Language Models are Few-Shot Learners – Paper. The GPT-3 paper demonstrating strong few-shot generalization.

  • SetFit: Efficient Few-Shot Learning Without Prompts – Blog. Hugging Face's introduction to SetFit, an efficient, prompt-free framework for few-shot fine-tuning of Sentence Transformers.

  • Zero-Shot Text Classification – Blog. A comprehensive guide to zero-shot classification.

  • DINO: Using Big Language Models To Generate Entire Datasets From Scratch – Blog. Using an LLM to generate a labeled dataset from scratch.

  • Schick & Schütze (2020): Exploiting Cloze Questions for Few-Shot Text Classification – Paper. Introduces PET/iPET and pattern-based classification with BERT.

  • Tunstall et al. (2022): Efficient Few-Shot Learning Without Prompts – Paper. SetFit, a scalable contrastive-learning approach to few-shot classification.

  • Yin et al. (2019): Benchmarking Zero-shot Text Classification – Paper. Comparative analysis of zero-shot approaches, including NLI and embedding-based methods.

  • Snorkel: Weak Supervision for Training Data – Website. A framework for labeling data with programmatic rules and heuristics.


💻 Practical Components

  • Topic Modeling with BERTopic: Use embeddings and clustering to explore topic structures in reviews (a minimal sketch follows this list).
  • Zero-Shot Classification: Leverage Hugging Face pipelines with BERT/RoBERTa/DistilBERT for inference-only classification.
  • Prompting with GPT-2: Learn to generate realistic and diverse movie reviews using carefully crafted prompts.
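
For the topic-modeling exercise, a minimal BERTopic sketch; the 20 Newsgroups corpus stands in for the review dataset used in the notebook, and default model settings are assumed:

```python
from sklearn.datasets import fetch_20newsgroups
from bertopic import BERTopic

# Any corpus of raw documents works; 20 Newsgroups stands in for the reviews.
docs = fetch_20newsgroups(subset="train",
                          remove=("headers", "footers", "quotes")).data[:2000]

# BERTopic embeds the documents, reduces dimensionality, clusters them,
# and extracts the most representative words per cluster (topic).
topic_model = BERTopic(verbose=True)
topics, probs = topic_model.fit_transform(docs)

print(topic_model.get_topic_info().head())  # one row per discovered topic
print(topic_model.get_topic(0))             # top words and weights for topic 0
```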