National Institute of Technology Rourkela

राष्ट्रीय प्रौद्योगिकी संस्थान राउरकेला

ଜାତୀୟ ପ୍ରଯୁକ୍ତି ପ୍ରତିଷ୍ଠାନ ରାଉରକେଲା

An Institute of National Importance

Syllabus

Course Details

Subject {L-T-P / C} : EC6802 : Reinforcement Learning {3-0-0 / 3}

Subject Nature : Theory

Coordinator : Upendra Kumar Sahoo

Syllabus

Module 1 :

Markov Decision Process: MDP Model, State Markov Process, State Transition Probabilities, Expected One-Step Rewards, Discounted Rewards, Cumulative Reward, Selecting a Policy, Policy Evaluation, State Value Function, State–Action Value Function, Fixed-Point Iteration, Expected K-Step Rewards, Linear Function Approximation, Projected Bellman Error.

Module 2 :

Value and Policy Iterations: Value Iteration, Policy Evaluation, Optimal Behavior, Greedy Policy, Bellman Optimality Condition, Value Iteration Algorithm, Principle of Optimality, Policy Iteration, Policy Improvement, Partially Observable MDP.

Module 3 :

Monte Carlo Methods: Monte Carlo Prediction, Monte Carlo Estimation of Action Values, Monte Carlo Control, Off-Policy Prediction via Importance Sampling, Incremental Implementation. Temporal Difference Learning: TD(0) Algorithm, Look-Ahead TD Algorithm, TD(λ) Algorithm, Forward View of TD(λ), Backward View of TD(λ), Offline Implementations Are Equivalent, True Online TD(λ) Algorithm, Off-Policy Learning.

Module 4 :

Q-Learning: Sarsa(0) Algorithm, Look-Ahead Sarsa Algorithm, Sarsa(λ) Algorithm, Off-Policy Learning, Optimal Policy Extraction, Q-Learning Algorithm, Exploration Versus Exploitation, Optimistic Initialization, ε-Greedy Exploration, Upper Confidence Bound, Q-Learning With Replay Buffer, Double Q-Learning.

Module 5 :

Value Function Approximation: Stochastic Gradient TD-Learning, TD(0) Implementation, TD(λ) Implementation, True Online TD(λ) Implementation, Least-Squares TD-Learning, Projected Bellman Learning, Selecting the Weighting Matrix D, Equivalent Representation for Bellman Error, Gradient Correction Algorithm (TDC), GTD2 Algorithm, SARSA Methods, Feature Representation, SARSA(0) Implementation, SARSA(λ) Implementation, Least-Squares SARSA Learning, Deep Q-Learning, Deep Learning, Training Algorithm.

Module 6 :

Policy Gradient Methods: Policy Model, Finite-Difference Method, Score Function, Objective Functions, Discounted Reward, Expected Discounted Reward, Expected Immediate Reward, Average Reward, Centered Poisson Equation, Policy Gradient Theorem, Actor–Critic Algorithms, REINFORCE Algorithm, Standard Gradient Policies, Natural Gradient Policy.

Course Objective

1. Understand the fundamentals of reinforcement learning (RL), including agents, environments, rewards, policies, and value functions.

2. Model decision-making problems using Markov Decision Processes (MDPs) and analyze state transitions and reward structures.

3. Apply dynamic programming techniques such as policy iteration and value iteration to solve finite MDPs.

4. Implement model-free RL algorithms, including Monte Carlo methods, Temporal Difference (TD) learning and Q-learning.

5. Develop and analyze advanced RL methods such as Q-learning, SARSA, and policy gradient techniques.

Course Outcome

1. Explain the fundamental concepts of reinforcement learning, including agents, environments, rewards, policies, value functions, and the exploration–exploitation trade-off.

2. Formulate sequential decision-making problems using Markov Decision Processes (MDPs) and compute state-value and action-value functions.

3. Implement dynamic programming methods such as policy iteration and value iteration for solving finite MDPs.

4. Apply model-free learning techniques, including Monte Carlo, Temporal Difference (TD) methods and Q-learning, to estimate value functions.

5. Develop and compare control algorithms such as SARSA and Q-learning for optimal policy learning.

6. Design and evaluate reinforcement learning solutions for real-world applications such as robotics, games, and autonomous systems.

Essential Reading

1. Richard S. Sutton and Andrew G. Barto, Reinforcement Learning: An Introduction, Second edition, MIT Press, 2018

2. Ali H. Sayed, Inference and Learning from Data, Volume 2: Inference, Cambridge University Press

Supplementary Reading

1. Ivan Gridin, Practical Deep Reinforcement Learning with Python, BPB Publications, August 2022

2. Laura Graesser, Deep Reinforcement Learning in Python: A Hands-On Introduction, Addison-Wesley, 2020

Journal and Conferences

1 .