ST455 (Reinforcement Learning)

Contact

Name: Domenico Mergoni
Email: d.mergoni -at- lse.ac.uk
Work: London School of Economics

Tip

The internet is a great resource. Use it. Some resources I like:

Beware, the Crime!

It is illegal to download articles and books from pages like LibGen and Sci-Hub, or from Telegram bots like @scihubot. Also, DO NOT use a VPN to protect your freedom of education (Opera offers a free VPN).

🙃

Course content (Official)

This course is about reinforcement learning, covering the fundamental concepts of reinforcement learning framework and solution methods. The focus is on the underlying methodology as well as practical implementation and evaluation using software code. The course will cover the following topics:

Introduction: course overview.
Foundations of reinforcement learning: Markov decision processes, the Bellman optimality equation, the existence of an optimal stationary policy.
Dynamic programming and Monte Carlo methods: policy evaluation, policy improvement, policy iteration, value iteration based on dynamic programming, and Monte Carlo methods for reinforcement learning, including Monte Carlo estimation and Monte Carlo control.
Temporal difference learning: temporal difference prediction, Sarsa, Q-learning, n-step temporal difference prediction, and TD(lambda).
On-policy prediction and control with approximation: types of function approximators (value and action-value function approximator), gradient based methods for value function prediction, convergence guarantees with linear function approximator, and semi-gradient n-step Sarsa.
Q-learning type algorithms with function approximation: Q-learning with linear function approximator, fitted Q-iteration, deep Q-network, double deep Q-learning, convergence analysis.
Policy gradient methods: policy approximation, REINFORCE, actor-critic methods that combine policy function approximation with action-value function approximation.
Trust-region policy optimization: monotonic improvement guarantee, trust-region policy optimization.
Batch off-policy evaluation: importance sampling-based method, doubly robust method, marginalized importance sampling, double reinforcement learning.
Batch policy optimisation: recent advances in offline reinforcement learning algorithms.
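
To make the dynamic programming item above concrete, here is a minimal sketch of value iteration in Python with NumPy. It is not course material: the two-state MDP, its transition table P, its reward table R, the discount factor gamma, and the function name value_iteration are all made up for illustration.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP (an assumption, not taken from the course).
# P[s, a, s2] is the probability of landing in state s2 after action a in state s.
P = np.array([
    [[1.0, 0.0], [0.0, 1.0]],  # state 0: action 0 stays, action 1 moves to state 1
    [[1.0, 0.0], [0.0, 1.0]],  # state 1: action 0 moves to state 0, action 1 stays
])
# R[s, a] is the expected immediate reward for action a in state s.
R = np.array([
    [0.0, 0.0],  # state 0: no reward
    [0.0, 1.0],  # state 1: staying pays 1
])
gamma = 0.9  # discount factor (assumed)


def value_iteration(P, R, gamma, tol=1e-10, max_iter=10_000):
    """Repeat the Bellman optimality backup until the values stop changing."""
    V = np.zeros(P.shape[0])
    for _ in range(max_iter):
        # Q[s, a] = R[s, a] + gamma * sum over s2 of P[s, a, s2] * V[s2]
        Q = R + gamma * (P @ V)
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    # Greedy policy with respect to the last action-value estimate.
    return V, Q.argmax(axis=1)


V, policy = value_iteration(P, R, gamma)
print("values:", V)
print("greedy policy:", policy)
```

In this toy problem the greedy policy moves to state 1 and stays there, so the value of state 1 approaches 1 / (1 - 0.9) = 10 and the value of state 0 approaches 0.9 * 10 = 9.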

Material and solutions

The material followed very closely Sutton and Barto's well-known book, Reinforcement Learning: An Introduction.