
1. Introduction

This module motivates the shift from traditional LLM post-training, which treats models as static sequence generators, to agentic reinforcement learning, which trains them as autonomous decision-making agents. You'll see how capabilities such as planning, tool use, memory, and self-improvement can emerge when RL is applied in dynamic, multi-step environments rather than only for single-turn alignment.

Key Takeaways

Paradigm Evolution: LLMs evolve from passive text generators to autonomous agents through RL training
Dynamic Environments: Agentic RL operates in multi-step, partially observable environments rather than producing single-shot responses
Emergent Capabilities: Planning, tool use, memory, and reasoning emerge from RL optimization rather than hand-crafted heuristics
Research Scope: 500+ papers surveyed, focusing on RL-empowered agentic behaviors, not traditional alignment

🛠️ Prerequisites

# Verify conceptual prerequisites
echo "Do you understand basic LLM concepts? (transformers, attention)"
echo "Are you familiar with reinforcement learning basics? (MDP, rewards)"
echo "Do you have Python + ML library experience? (torch, transformers)"

# Technical setup check
python -c "import torch, transformers, gymnasium; print('✅ Core libraries ready')"
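
The one-line check above stops at the first failed import, so it does not say which libraries are missing. A minimal sketch of a per-library check follows, using only the standard library; the helper name `missing_packages` is hypothetical, and the package list simply mirrors the imports in the command above.

```python
import importlib.util

# Same three libraries the shell one-liner tries to import.
REQUIRED = ["torch", "transformers", "gymnasium"]


def missing_packages(names):
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("Core libraries ready")
```

This only confirms that each package is installed; it does not import them or check versions.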

Quick Start: Understanding the Shift

Traditional LLM (Passive Generation)

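# Traditional LLM: single-turn, stateless response
class StubModel:
    """Placeholder for a real LLM so the example runs without model weights."""
    def generate(self, prompt):
        return f"(generated text for: {prompt!r})"

model = StubModel()  # Assumption: swap in a real model with a text-in, text-out generate()

def traditional_llm(prompt):
    """Classic LLM behavior: prompt in, text out"""
    return model.generate(prompt)

# Example usage - no memory, no planning, no tools
response = traditional_llm("What's the weather like?")
print(response)  # Generic response, no actual weather data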

Agentic LLM (Active Decision-Making)

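class AgenticLLM:
    """Agentic RL agent: multi-step reasoning with tools and memory"""
    def __init__(self):
        self.memory = []
        self.tools = {'weather_api': self.get_weather}
        self.planning_depth = 3

    def act(self, observation):
        """Multi-step agentic behavior"""
        # 1. Planning: Decompose request into steps
        plan = self.create_plan(observation)

        # 2. Tool Use: Access external information
        weather_data = None  # Stays None when no tool is called
        if 'weather' in observation.lower():
            location = self.extract_location(observation)
            weather_data = self.tools['weather_api'](location)

        # 3. Memory: Update contextual understanding
        self.memory.append({
            'observation': observation,
            'plan': plan,
            'tool_result': weather_data
        })

        # 4. Generate response based on the tool result
        return self.synthesize_response(plan, weather_data)

    # The helpers below are hand-written placeholders; in agentic RL these
    # behaviors would come from the learned policy.
    def create_plan(self, observation):
        steps = ['parse request', 'call tool if needed', 'compose answer']
        return steps[:self.planning_depth]

    def extract_location(self, observation):
        # Naive heuristic: take the text after the last " in "
        _, found, tail = observation.rpartition(' in ')
        return tail.strip(' ?.!') if found else 'unknown location'

    def synthesize_response(self, plan, weather_data):
        return weather_data or "No tool result available for this request."

    def get_weather(self, location):
        # Simplified weather API call (hard-coded stand-in, not live data)
        return f"Current weather in {location}: 72°F, sunny"

# Example: Agent actually uses tools and plans
agent = AgenticLLM()
response = agent.act("What's the weather like in San Francisco?")
print(response)  # Weather string obtained through the tool call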

The Paradigm Shift Visualization

Traditional LLM-RL (PBRFT)         Agentic RL (This Course)
┌─────────────────┐                ┌────────────────────────┐
│  User Prompt    │                │    Environment         │
│       ↓         │                │         ↓              │
│  Single Step    │      VS        │   Multi-Step POMDP     │
│       ↓         │                │         ↓              │
│  Text Response  │                │  Action → Observation  │
│  (Terminal)     │                │     → Reward Loop      │
└─────────────────┘                └────────────────────────┘

Static, One-shot                   Dynamic, Sequential
No Memory                          Persistent Memory
No Tools                           Tool Integration
No Planning                        Strategic Planning

PBRFT: preference-based reinforcement fine-tuning, the single-turn RL setup used for alignment.
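
The action, observation, and reward loop in the right-hand box can be sketched with a toy episode. Everything here is an assumption made for illustration: `GuessNumberEnv` and `run_episode` are hypothetical names, the environment is a number-guessing game rather than a real LLM task, and the policy is a hand-coded binary search instead of a learned one. The point is only the shape of the interaction: the agent never sees the hidden state and must act over several steps using what it has observed so far.

```python
import random


class GuessNumberEnv:
    """Hypothetical toy environment whose hidden state is a secret number.

    The agent never sees the secret (partial observability). Each step it
    only observes 'higher', 'lower', or 'correct'.
    """

    def __init__(self, low=1, high=16, max_steps=8, seed=0):
        self.low, self.high, self.max_steps = low, high, max_steps
        self.rng = random.Random(seed)

    def reset(self):
        self.secret = self.rng.randint(self.low, self.high)
        self.steps = 0
        return "start"

    def step(self, action):
        """Return (observation, reward, done) for one guess."""
        self.steps += 1
        if action == self.secret:
            return "correct", 1.0, True
        obs = "higher" if action < self.secret else "lower"
        return obs, 0.0, self.steps >= self.max_steps


def run_episode(env):
    """Run one action -> observation -> reward loop with a binary-search policy."""
    low, high = env.low, env.high
    obs, total_reward, done = env.reset(), 0.0, False
    trajectory = []
    while not done:
        action = (low + high) // 2  # decision based on everything observed so far
        obs, reward, done = env.step(action)
        trajectory.append((action, obs, reward))
        total_reward += reward
        if obs == "higher":
            low = action + 1
        elif obs == "lower":
            high = action - 1
    return total_reward, trajectory


total, traj = run_episode(GuessNumberEnv())
print(f"return={total}, steps={len(traj)}")
```

A single-step setup corresponds to stopping after the first guess: the reward would have to be assigned to one output, with no chance to use feedback from the environment.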

Why This Shift Matters

Problem with Traditional Approach

Limited Interaction: Single prompt-response cycles
No Learning: Can't adapt from interactions
Tool Blindness: No access to external information
Memory Loss: Each interaction starts from scratch

Solution: Agentic RL Framework

# Core mathematical framework (simplified)
def agentic_rl_framework():
    """
    Traditional: P(text|prompt) - single step
    Agentic: π(a_t|s_t, h_t) in POMDP - sequential decisions

    Where:
    - π: Policy (learned through RL)
    - a_t: Action at time t (text + tool calls)
    - s_t: Environment state at time t (in a POMDP the agent only
      receives a partial observation of it)
    - h_t: History/memory up to time t
    """
    pass

# This enables:
capabilities = [
    "planning",      # Sequence of actions toward goal
    "tool_use",      # External information access
    "memory",        # Context preservation across turns
    "reasoning",     # Multi-step logical inference
    "perception",    # Multimodal environment understanding
    "self_improve"   # Learning from experience
]
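
To make the notation π(a_t|s_t, h_t) more concrete, here is a minimal sketch of a policy that takes both the current observation and the history, plus a discounted return over a trajectory. The source does not state the training objective, so the discounted return is an assumption (it is a common choice in RL). The type aliases, `echo_policy`, and `discounted_return` are hypothetical names introduced only for this example.

```python
from typing import Callable, List, Tuple

# Hypothetical type aliases for illustration only.
Observation = str
Action = str
History = List[Tuple[Observation, Action]]
Policy = Callable[[Observation, History], Action]


def discounted_return(rewards: List[float], gamma: float = 0.99) -> float:
    """Sum of gamma**t * r_t over one trajectory, computed back to front."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def echo_policy(observation: Observation, history: History) -> Action:
    """Stand-in for pi(a_t | s_t, h_t): the action depends on observation and history."""
    return f"step {len(history)}: respond to {observation!r}"


# Single-step case: one reward, so the return is just that reward.
print(discounted_return([1.0]))
# Multi-step case: a reward that arrives only at the end is discounted.
print(discounted_return([0.0, 0.0, 1.0], gamma=0.5))
```

In the single-turn setting the history is always empty and the return has one term, which is why it can be treated as a degenerate case of the sequential formulation.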

Research Landscape Overview

Survey Scope (500+ Papers)

✅ In Scope: RL empowering agentic capabilities in dynamic environments
❌ Out of Scope: Traditional alignment (harmlessness/helpfulness only)
❌ Out of Scope: Pure LLM performance on static benchmarks
❌ Out of Scope: Non-LLM traditional RL algorithms

Key Research Institutions

University of Oxford, Shanghai AI Laboratory, National University of Singapore
UCL, UIUC, Brown University, Imperial College London
Chinese Academy of Sciences, CUHK, Fudan University

Module Structure

This introduction sets the foundation for:

Module	Focus	Practical Outcome
2. Preliminaries	MDP/POMDP formalism	Mathematical foundation
3. Capabilities	Planning, tools, memory	Component implementation
4. Applications	Code, math, GUI agents	Domain-specific systems
5. Systems	Environments & frameworks	Infrastructure mastery
6. Challenges	Scaling, trust, safety	Research frontiers

📖 Resources

Primary Survey: Agentic RL Landscape
Paper Collection: 500+ Curated Papers
Background Reading: Understanding LLM agents and RL fundamentals
Next Module: 2. Preliminaries - MDP Foundations

Ready to transform your understanding of LLMs? Continue to the mathematical foundations in Module 2.
