Session 9: LLM Introduction
Course Materials
Slides
Download Session 9 Slides (PDF)
Notebooks
Session 9: Large Language Models (LLMs) Basics
In this ninth session, we dive into the world of Large Language Models (LLMs), from the foundations of ChatGPT to advanced prompt engineering and parameter-efficient fine-tuning techniques such as LoRA and QLoRA.
We explore the training pipeline (SFT, RM, PPO), how LLM-powered tools such as GitHub Copilot and Cursor boost developer productivity, and advanced topics such as Retrieval-Augmented Generation (RAG).
Learning Objectives
- Understand the key components of LLM training: Supervised Fine-Tuning (SFT), Reward Modeling (RM), and Proximal Policy Optimization (PPO).
- Learn about prompt engineering, few-shot learning, and structured output generation.
- Analyze the advantages and challenges of RAG, and see why Hypothetical Document Embeddings (HyDE) can overcome some of its limitations.
- Grasp how parameter-efficient fine-tuning (LoRA, QLoRA) makes it possible to adapt large models on a single GPU.
- Connect LLM-powered tools like Cursor, LlamaIndex, and Vera to real-world applications.
Topics Covered
Training LLMs
- Supervised Fine-Tuning (SFT): Aligning models to conversational data.
- Reward Modeling (RM): Learning from human feedback.
- Proximal Policy Optimization (PPO): Reinforcement-learning step that optimizes the model against the learned reward.
- Direct Preference Optimization (DPO), SimPO: More recent preference-optimization methods (see the loss functions sketched after this list).
- Limitations without RLHF: Overfitting and bias risks.
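For reference, the two preference-based objectives listed above can be written compactly. This is the standard formulation from the RLHF/DPO literature, not a transcription of the session slides: the reward model r_φ is trained to prefer the chosen response y_w over the rejected one y_l, and DPO folds that preference signal directly into the policy π_θ, using the frozen SFT model π_ref as a reference (σ is the logistic sigmoid, β a scaling hyperparameter):

```math
\mathcal{L}_{\text{RM}}(\phi) = -\,\mathbb{E}_{(x,\, y_w,\, y_l)}\!\left[\log \sigma\!\left(r_\phi(x, y_w) - r_\phi(x, y_l)\right)\right]
```

```math
\mathcal{L}_{\text{DPO}}(\theta) = -\,\mathbb{E}_{(x,\, y_w,\, y_l)}\!\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)}\right)\right]
```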
Applications of LLMs
- Cursor: AI-powered code editor for efficient development.
- LlamaIndex: RAG for internal data search.
- Vera: Fact-checking for public trust.
Maximizing LLM Potential
- Prompt Engineering: Few-shot examples, chain-of-thought prompting, and schema-constrained outputs (see the prompting sketch after this list).
- Sampling Parameters: Trade-offs between temperature and top-p sampling.
- Retrieval-Augmented Generation (RAG): Naive vs. advanced retrieval strategies.
- Hypothetical Document Embeddings (HyDE): Solving RAG's retrieval limitations (see the HyDE sketch after this list).
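As a concrete illustration of few-shot prompting and the sampling parameters above, here is a minimal sketch using the OpenAI Python client. The model name, example reviews, and sampling values are illustrative choices, not prescriptions from the course.

```python
# Minimal few-shot prompting sketch with the OpenAI Python client.
# Model name, example pairs, and sampling values are illustrative only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

messages = [
    {"role": "system", "content": "You classify customer reviews as positive or negative."},
    # Few-shot examples: show the model the expected input/output format.
    {"role": "user", "content": "Review: 'Great battery life, would buy again.'"},
    {"role": "assistant", "content": "positive"},
    {"role": "user", "content": "Review: 'Broke after two days.'"},
    {"role": "assistant", "content": "negative"},
    # The actual query.
    {"role": "user", "content": "Review: 'Setup was painless and support was friendly.'"},
]

response = client.chat.completions.create(
    model="gpt-4o-mini",   # illustrative model name
    messages=messages,
    temperature=0.2,       # low temperature: more deterministic output
    top_p=1.0,             # nucleus sampling; usually tune temperature OR top_p, not both
    max_tokens=5,
)
print(response.choices[0].message.content)
```

Lowering temperature (or top_p) makes the output more deterministic, which is usually what you want for classification or schema-constrained outputs; higher values trade consistency for diversity.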
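The HyDE idea itself fits in a few lines: instead of embedding the user's question directly, first ask the LLM for a hypothetical answer passage and embed that for retrieval. In the sketch below, `llm_complete`, `embed`, and `vector_index` are placeholders for whatever LLM, embedding model, and vector store you actually use, not a specific library API.

```python
# HyDE sketch: embed a hypothetical answer instead of the raw query.
# `llm_complete`, `embed`, and `vector_index.search` are placeholders.
def hyde_retrieve(question, llm_complete, embed, vector_index, k=5):
    # 1. Ask the LLM to write a plausible (possibly hallucinated) answer passage.
    hypothetical_doc = llm_complete(
        f"Write a short passage that answers the question:\n{question}"
    )
    # 2. Embed the hypothetical passage rather than the question itself.
    query_vector = embed(hypothetical_doc)
    # 3. Retrieve real documents whose embeddings are closest to it.
    return vector_index.search(query_vector, k=k)
```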
Model Adaptation on Limited Hardware
- Training Complexity: The memory and compute challenges of training a 7B-parameter model.
- LoRA: Low-Rank Adaptation for single-GPU fine-tuning (see the sketch after this list).
- QLoRA: Quantized LoRA for a reduced memory footprint.
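Below is a minimal sketch of a QLoRA-style setup with Hugging Face transformers and peft: the base model is loaded with 4-bit quantized weights and small low-rank adapter matrices are attached for training. The model name and hyperparameters are illustrative, and argument names may vary slightly between library versions.

```python
# QLoRA-style setup sketch: load the base model in 4-bit and attach LoRA adapters.
# Model name and hyperparameters are illustrative; check your library versions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_name = "meta-llama/Llama-2-7b-hf"  # illustrative 7B base model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                    # quantize base weights to 4-bit (the "Q" in QLoRA)
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, quantization_config=bnb_config, device_map="auto"
)

lora_config = LoraConfig(
    r=8,                                  # rank of the low-rank update matrices
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()        # only a small fraction of weights will train
```

Only the adapter parameters (typically well under 1% of the model) receive gradients, which is what makes fine-tuning a 7B model on a single GPU feasible.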
UX & Human Feedback
- Copilot Example: Integrating UX so that everyday usage feeds continuous learning.
- User Feedback: Passive vs. active feedback collection (see the logging sketch below).
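To make the passive vs. active distinction concrete, here is a hypothetical logging sketch; the event schema and function names are invented for illustration and are not taken from Copilot or any specific product.

```python
# Hypothetical feedback-logging sketch: schema and names are illustrative only.
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import json

@dataclass
class FeedbackEvent:
    suggestion_id: str
    kind: str       # "passive" (implicit signal) or "active" (explicit rating)
    signal: str     # e.g. "accepted", "rejected", "thumbs_up", "thumbs_down"
    timestamp: str

def log_feedback(event: FeedbackEvent, path: str = "feedback.jsonl") -> None:
    # Append one JSON line per event; later usable to build preference datasets.
    with open(path, "a") as f:
        f.write(json.dumps(asdict(event)) + "\n")

# Passive signal: the user kept the suggested completion.
log_feedback(FeedbackEvent("sugg-42", "passive", "accepted",
                           datetime.now(timezone.utc).isoformat()))
# Active signal: the user explicitly rated a response.
log_feedback(FeedbackEvent("sugg-42", "active", "thumbs_up",
                           datetime.now(timezone.utc).isoformat()))
```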
Key Takeaways
| Component/Technique | Purpose | Benefit |
|---|---|---|
| SFT | Initial conversational fine-tuning | Groundwork for aligned responses |
| RM | Score LLM responses with human preference labels | Learns to rank and prefer high-quality text |
| PPO/DPO/SimPO | Optimize the model against preferences | Aligns LLM outputs with human needs |
| LoRA | Efficient fine-tuning | Train large models on a single GPU |
| Prompt Engineering | Guide LLM outputs | More accurate, context-aware interactions |
| RAG/HyDE | External knowledge retrieval | Boosts factuality and relevance |
Recommended Reading
- Argilla Blog Posts (argilla.io/blog): Insightful articles on DPO, ORPO, SimPO, and other advanced LLM training methods.
- Gao et al. (2022), "Precise Zero-Shot Dense Retrieval without Relevance Labels": The paper introducing Hypothetical Document Embeddings (HyDE).
- Hu et al. (2021), "LoRA: Low-Rank Adaptation of Large Language Models": The paper introducing LoRA.
- Blog post (blog.eleuther.ai/transformer-math/): Explains the math behind LLM computation.
- Blog post (kipp.ly/transformer-inference-arithmetic/): Explains the arithmetic behind LLM inference.
Practical Components
- Code Examples: Fine-tuning with LoRA on single GPUs.
- Prompt Engineering Snippets: Showcasing few-shot learning and chain-of-thought examples.
- OpenAI API Usage: Python code for prompt optimization and few-shot prompting.