Advanced Topics in Deep Reinforcement Learning
Class plan
The idea of this course is to concentrate on modern research in RL and to analyze significant articles from the past few years.
Rl#1: 13.02.2020
Exploration in RL
Sergey Ivanov
  • Random Network Distillation [1]
  • Intrinsic Curiosity Module [2,3]
  • Episodic Curiosity through Reachability [4]
Rl#2: 20.02.2020
Imitation and Inverse RL
Just Heuristic
  • Imitation Learning [5]
  • Inverse RL [6,7]
  • Learning from Human Preferences [8]
Rl#3: 27.02.2020
Hierarchical Reinforcement Learning
Petr Kuderov
  • A framework for temporal abstraction in RL [9]
  • The Option-Critic Architecture [10]
  • FeUdal Networks for Hierarchical RL [11]
  • Data-Efficient Hierarchical RL [12]
  • Meta Learning Shared Hierarchies [13]
Rl#4: 5.03.2020
Evolutionary Strategies in RL
Evgenia Elistratova
  • A framework for temporal abstraction in reinforcement learning [14]
  • Improving Exploration in Evolution Strategies for Deep RL [15]
  • Paired Open-Ended Trailblazer (POET) [16]
  • Sim-to-Real: Learning Agile Locomotion For Quadruped Robots [17]
Rl#5: 12.03.2020
Distributional Reinforcement Learning
Pavel Shvechikov
  • A Distributional Perspective on RL [18]
  • Distributional RL with Quantile Regression [19]
  • Implicit Quantile Networks for Distributional RL [20]
  • Fully Parameterized Quantile Function for Distributional RL [21]
Rl#6: 19.03.2020
RL for Combinatorial Optimization
Taras Khakhulin
  • RL for Solving the Vehicle Routing Problem [22]
  • Attention, Learn to Solve Routing Problems! [23]
  • Learning Improvement Heuristics for Solving the Travelling Salesman Problem [24]
  • Learning Combinatorial Optimization Algorithms over Graphs [25]
Rl#7: 26.03.2020
RL as Probabilistic Inference
Pavel Termichev
  • RL and Control as Probabilistic Inference: Tutorial and Review [26]
  • RL with Deep Energy-Based Policies [27]
  • Soft Actor-Critic [28]
  • Variational Bayesian RL with Regret Bounds [29]
Rl#8: 9.04.2020
Multi-Agent Reinforcement Learning
Sergey Sviridov
  • Stabilising Experience Replay for Deep Multi-Agent RL [30]
  • Counterfactual Multi-Agent Policy Gradients [31]
  • Value-Decomposition Networks For Cooperative Multi-Agent Learning [32]
  • Monotonic Value Function Factorisation for Deep Multi-Agent RL [33]
  • Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments [34]
Rl#9: 16.04.2020
Model-Based Reinforcement Learning
Evgeny Kashin
  • DL for Real-Time Atari Game Play Using Offline MCTS Planning [35]
  • Mastering Chess and Shogi by Self-Play with a General RL Algorithm [36]
  • World Models [37]
  • Model-Based RL for Atari [38]
  • Learning Latent Dynamics for Planning from Pixels [39]
Rl#10: 23.04.2020
Reinforcement Learning at Scale
Aleksandr Panin
  • Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour [40]
  • HOGWILD!: A Lock-Free Approach to Parallelizing SGD [41]
  • GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism [42]
  • Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism [43]
  • Learning@home: Crowdsourced Training of Large Neural Networks using Decentralized Mixture-of-Experts [44]
Rl#11: 30.04.2020
Multitask & Transfer RL
Dmitry Nikulin
  • Universal Value Function Approximators [45]
  • Hindsight Experience Replay [46]
  • PathNet: Evolution Channels Gradient Descent in Super Neural Networks [47]
  • Progressive Neural Networks [48]
  • Learning an Embedding Space for Transferable Robot Skills [49]
Rl#12: 07.05.2020
Memory in Reinforcement Learning
Artyom Sorokin
  • Recurrent Experience Replay in Distributed RL [50]
  • AMRL: Aggregated Memory For RL [51]
  • Unsupervised Predictive Memory in a Goal-Directed Agent [52]
  • Stabilizing Transformers for RL [53]
  • Model-Free Episodic Control [54]
  • Neural Episodic Control [55]
Rl#13: 14.05.2020
Distributed RL in the Wild
Sergey Kolesnikov
  • Asynchronous Methods for Deep RL [56]
  • IMPALA: Scalable Distributed DRL with Importance Weighted Actor-Learner Architectures [57]
  • Distributed Prioritized Experience Replay [58]
  • Making Efficient Use of Demonstrations to Solve Hard Exploration Problems [59]
  • SEED RL: Scalable and Efficient Deep-RL with Accelerated Central Inference [60]
These works were done as final projects for the course.
Comparative Study of Intrinsic Motivations
Filimonov Vladislav, Tsypin Artem, Shamshiev Mamat
The main idea of this project is to compare the most popular intrinsic motivations on the MountainCar-v0 task.
Ephemeral Value Adjustment
Anna Mazur, Nikita Trukhanov
The main idea of this project is an implementation of Ephemeral Value Adjustment (EVA) from "Fast deep reinforcement learning using online adjustments from the past" by S. Hansen et al.
Comparative Study of Intrinsic Motivations
Burkina Maria
The main idea of this project is to compare the most popular intrinsic motivations on the MountainCar-v0 and MountainCarContinuous-v0 tasks.
Reinforcement Learning for Recommendation Systems
Grishanov Alexey
Wormax-bot - Creating Model-Based RL Algorithm for Multiplayer Video Game with Restricted Frames Available due to Online Nature of the Game
Murashov Leonid
The main challenge of this project is playing an online video game with far fewer frames available than existing solutions for the Atari benchmark use. The online format also poses the challenge of playing against humans.
Dynamic Attention Model for Vehicle Routing Problems
Eremeev Dmitry, Pustynnikov Alexey
The main idea of this project is an implementation of the article "A Deep Reinforcement Learning Algorithm Using Dynamic Attention Model for Vehicle Routing Problems" (TensorFlow 2).