Session 10: Agents
Course Materials
Slides
Download Session 10 Slides (PDF)
Notebooks
Session 10: Hallucinations and Agents in Large Language Models
In this session, we explore why LLMs hallucinate and how to address it. We cover function calling and multi-LLM evaluation as techniques for reducing errors, and we introduce agent-based frameworks (e.g., ReAct) as a next-level strategy for reasoning, planning, and tool use.
We connect theory to practice with real-world examples and Python code for building reliable, dynamic LLM agents.
Learning Objectives
- Understand the nature of hallucinations and outdated knowledge in LLMs.
- Explore mitigation strategies: prompt engineering, retrieval-augmented generation (RAG), function calling, and multi-LLM approaches.
- Learn how LLMs as judges can validate or compare outputs, increasing reliability.
- Discover how agents (ReAct framework) enable iterative planning, reasoning, and tool use.
- Identify failure modes (planning and tool execution errors) in agent-based systems.
Topics Covered
Hallucinations and Errors
- Intrinsic vs. Extrinsic Causes: From data mismatch to model limitations.
- Examples: Factual inaccuracies, outdated knowledge, and misinformation.
- Consequences: Misinformation, trust erosion, and practical failures.
Mitigation Techniques
- Prompt Engineering: Crafting effective prompts to reduce ambiguity.
- RAG: Grounding responses in external data.
- Function Calling: Using APIs and tools for accurate, real-time answers (a hedged code sketch follows this list).
- Multi-LLM Evaluation: Using LLMs as judges for quality control.
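As a concrete illustration of function calling, the sketch below exposes a single tool to the model through the OpenAI Python SDK and feeds the tool's result back for a grounded answer. The `get_weather` helper, its schema, and the model name are illustrative assumptions, not part of the course materials.

```python
# Minimal function-calling sketch (assumes the OpenAI Python SDK >= 1.0 and an API key in the env).
# The tool, its schema, and the model name are illustrative placeholders.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def get_weather(city: str) -> str:
    """Stand-in for a real API call that would return live data."""
    return json.dumps({"city": city, "temp_c": 21, "condition": "sunny"})

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Lisbon right now?"}]
response = client.chat.completions.create(model="gpt-4o-mini", messages=messages, tools=tools)
msg = response.choices[0].message

# If the model chose to call the tool, execute it and send the result back for a grounded answer.
if msg.tool_calls:
    call = msg.tool_calls[0]
    args = json.loads(call.function.arguments)
    messages += [msg, {"role": "tool", "tool_call_id": call.id, "content": get_weather(**args)}]
    final = client.chat.completions.create(model="gpt-4o-mini", messages=messages, tools=tools)
    print(final.choices[0].message.content)
```

The same pattern generalizes to any API: the model decides when the tool is needed, and the returned data replaces a guessed (and possibly hallucinated) answer.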
Agents and Advanced Use Cases
- Agent Limitations: Planning, tool execution, and efficiency challenges.
- ReAct Framework: Combining reasoning and acting for iterative solutions.
- ReAct Steps: Think → Act → Observe → Repeat (a minimal loop is sketched after this list).
- Implementation: LangChain/LlamaIndex agents in Python for multi-step problem solving.
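To make the Think → Act → Observe loop tangible before reaching for LangChain or LlamaIndex, here is a bare-bones ReAct-style loop written directly against the OpenAI Python SDK. The prompt format, the `lookup` stub tool, and the model name are assumptions for illustration; a framework agent handles this plumbing (and real tools) for you.

```python
# Bare-bones ReAct-style loop (illustrative sketch, not the LangChain agent used in class).
# Assumes the OpenAI Python SDK and a single made-up tool called `lookup`.
import re
from openai import OpenAI

client = OpenAI()

def lookup(query: str) -> str:
    """Placeholder tool; a real agent would call search, a database, or an API here."""
    return f"(stub observation for: {query})"

SYSTEM = (
    "Solve the question step by step. At each step write 'Thought: <reasoning>' followed by either "
    "'Action: lookup[<query>]' or 'Final Answer: <answer>'."
)

def react(question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        reply = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": SYSTEM},
                {"role": "user", "content": transcript},
            ],
            stop=["Observation:"],  # keep the model from inventing its own observations
        ).choices[0].message.content or ""
        transcript += reply + "\n"
        if "Final Answer:" in reply:  # the model decided it is done
            return reply.split("Final Answer:")[-1].strip()
        match = re.search(r"Action: lookup\[(.*?)\]", reply)
        if match:
            # Act: run the tool; Observe: append the result and loop again
            transcript += f"Observation: {lookup(match.group(1))}\n"
    return "No final answer within the step budget."

print(react("Who introduced the ReAct framework?"))
```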
Key Takeaways
| Concept/Technique | Purpose | Benefit |
|---|---|---|
| Hallucination Analysis | Identify sources of errors | Better LLM reliability and trustworthiness |
| Prompt Engineering | Clear instructions for LLM | More precise and accurate outputs |
| RAG & Function Calling | External knowledge integration | Reduces hallucinations and outdated answers |
| LLM-as-a-Judge | Output validation and ranking | Automated quality assurance |
| Agent Frameworks (ReAct) | Iterative tool-based reasoning | Handle complex, multi-step tasks effectively |
Recommended Reading
- Chip Huyen, "Agents": A comprehensive guide to agents, including reflection and error correction.
- Zheng et al. (2023), "Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena": Paper exploring how LLMs can reliably score and compare generated outputs.
- "Optimizing Instructions and Demonstrations for Multi-Stage Language Model Programs": Paper on jointly optimizing instructions and few-shot demonstrations in multi-stage LLM pipelines.
- LangChain Docs (python.langchain.com): Comprehensive toolkit for building LLM agents with tool integration.
- OpenAI Function Calling Examples (openai.com/blog/function-calling): How to integrate LLMs with APIs for grounded answers.
- Yao et al. (2022), "ReAct: Synergizing Reasoning and Acting in Language Models": Foundation paper introducing the ReAct agent approach.
Practical Components
- LLM Tools Integration: Function-calling examples for real-time data.
- LLM-as-a-Judge: Code snippets for output ranking and quality control (a hedged sketch follows this list).
- ReAct Agent Implementation: LangChain-based examples with external tool usage.
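For the LLM-as-a-Judge component, a minimal sketch of the idea is shown below: a second model grades candidate answers against a rubric and the highest-scoring answer is kept. The rubric wording, the 1-5 scale, and the model name are illustrative assumptions, not the exact snippets from the notebook.

```python
# LLM-as-a-Judge sketch: a second model scores candidate answers 1-5 against a rubric.
# The rubric, scale, and model name are illustrative assumptions.
from openai import OpenAI

client = OpenAI()

JUDGE_PROMPT = """You are a strict grader. Score the ANSWER to the QUESTION from 1 (poor) to 5 (excellent)
for factual accuracy and completeness. Reply with only the integer score.

QUESTION: {question}
ANSWER: {answer}"""

def judge(question: str, answer: str) -> int:
    reply = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(question=question, answer=answer)}],
        temperature=0,  # keep the grading as deterministic as possible
    ).choices[0].message.content
    return int(reply.strip())  # real code would parse defensively here

# Example: compare two candidate answers and keep the higher-scoring one.
q = "What year was the ReAct paper first released?"
candidates = ["2022", "2019"]
best = max(candidates, key=lambda a: judge(q, a))
print("Preferred answer:", best)
```

In practice you would add parsing guards and average over several judge calls (or several judge models) to reduce variance in the scores.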