RL course

COURSE:
Advanced Topics in Deep Reinforcement Learning

Course channel: https://t.me/theoreticalrl
Course discussion group: https://t.me/adv_topics_in_rl_ru_2020

Plan of our classes

The idea of this course is to concentrate on modern research in RL and to analyze significant articles from the past few years.

Rl#1: 13.02.2020
Exploration in RL

Sergey Ivanov

Random Network Distillation [1]
Intrinsic Curiosity Module [2,3]
Episodic Curiosity through Reachability [4]

Video
Presentation

Rl#2: 20.02.2020
Imitation and Inverse RL

Just Heuristic

Imitation Learning [5]
Inverse RL [6,7]
Learning from Human Preferences [8]

Video
Presentation

Rl#3: 27.02.2020
Hierarchical Reinforcement Learning

Petr Kuderov

A framework for temporal abstraction in RL [9]
The Option-Critic Architecture [10]
FeUdal Networks for Hierarchical RL [11]
Data-Efficient Hierarchical RL [12]
Meta Learning Shared Hierarchies [13]

Video
Presentation

Rl#4: 05.03.2020
Evolutionary Strategies in RL

Evgenia Elistratova

A framework for temporal abstraction in reinforcement learning [14]
Improving Exploration in Evolution Strategies for Deep RL [15]
Paired Open-Ended Trailblazer (POET) [16]
Sim-to-Real: Learning Agile Locomotion For Quadruped Robots [17]

Video
Presentation

Rl#5: 12.03.2020
Distributional Reinforcement Learning

Pavel Shvechikov

A Distributional Perspective on RL [18]
Distributional RL with Quantile Regression [19]
Implicit Quantile Networks for Distributional RL [20]
Fully Parameterized Quantile Function for Distributional RL [21]

Video
Presentation

Rl#6: 19.03.2020
RL for Combinatorial Optimization

Taras Khakhulin

RL for Solving the Vehicle Routing Problem [22]
Attention, Learn to Solve Routing Problems! [23]
Learning Improvement Heuristics for Solving the Travelling Salesman Problem [24]
Learning Combinatorial Optimization Algorithms over Graphs [25]

Video
Presentation

Rl#7: 26.03.2020
RL as Probabilistic Inference

Pavel Termichev

RL and Control as Probabilistic Inference: Tutorial and Review [26]
RL with Deep Energy-Based Policies [27]
Soft Actor-Critic [28]
Variational Bayesian RL with Regret Bounds [29]

Video
Presentation

Rl#8: 09.04.2020
Multi-Agent Reinforcement Learning

Sergey Sviridov

Stabilising Experience Replay for Deep Multi-Agent RL [30]
Counterfactual Multi-Agent Policy Gradients [31]
Value-Decomposition Networks For Cooperative Multi-Agent Learning [32]
Monotonic Value Function Factorisation for Deep Multi-Agent RL [33]
Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments [34]

Video
Presentation

Rl#9: 16.04.2020
Model-Based Reinforcement Learning

Evgeny Kashin

DL for Real-Time Atari Game Play Using Offline MCTS Planning [35]
Mastering Chess and Shogi by Self-Play with a General RL Algorithm [36]
World Models [37]
Model-Based RL for Atari [38]
Learning Latent Dynamics for Planning from Pixels [39]

Video
Presentation

Rl#10: 23.04.2020
Reinforcement Learning at Scale

Aleksandr Panin

Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour [40]
HOGWILD!: A Lock-Free Approach to Parallelizing SGD [41]
GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism [42]
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism [43]
Learning@home: Crowdsourced Training of Large Neural Networks using Decentralized Mixture-of-Experts [44]

Video
Presentation

Rl#11: 30.04.2020
Multitask & Transfer RL

Dmitry Nikulin

Universal Value Function Approximators [45]
Hindsight Experience Replay [46]
PathNet: Evolution Channels Gradient Descent in Super Neural Networks [47]
Progressive Neural Networks [48]
Learning an Embedding Space for Transferable Robot Skills [49]

Video
Presentation

Rl#12: 07.05.2020
Memory in Reinforcement Learning

Artyom Sorokin

Recurrent Experience Replay in Distributed RL [50]
AMRL: Aggregated Memory For RL [51]
Unsupervised Predictive Memory in a Goal-Directed Agent [52]
Stabilizing Transformers for RL [53]
Model-Free Episodic Control [54]
Neural Episodic Control [55]

Video
Presentation

Rl#13: 14.05.2020
Distributed RL in the Wild

Sergey Kolesnikov

Asynchronous Methods for Deep RL [56]
IMPALA: Scalable Distributed DRL with Importance Weighted Actor-Learner Architectures [57]
Distributed Prioritized Experience Replay [58]
Making Efficient Use of Demonstrations to Solve Hard Exploration Problems [59]
SEED RL: Scalable and Efficient Deep-RL with Accelerated Central Inference [60]

Video
Presentation

Projects

These works were done as final projects for the course.

GitHub project

Comparative Study of Intrinsic Motivations

Filimonov Vladislav, Tsypin Artem, Shamshiev Mamat

The main idea of this project is to compare the most popular intrinsic motivations on the MountainCar-v0 task.

GitHub project

Ephemeral Value Adjustment

Anna Mazur, Nikita Trukhanov

The main idea of this project is an implementation of Ephemeral Value Adjustment (EVA) from "Fast deep reinforcement learning using online adjustments from the past" by S. Hansen et al.

GitHub project

Comparative Study of Intrinsic Motivations

Burkina Maria

The main idea of this project is to compare the most popular intrinsic motivations on the MountainCar-v0 and MountainCarContinuous-v0 tasks.

GitHub project

Reinforcement Learning for Recommendation Systems

Grishanov Alexey

The main idea of this project is an implementation of "Deep Reinforcement Learning based Recommendation with Explicit User-Item Interactions Modeling".

GitHub project

Wormax-bot - Creating Model-Based RL Algorithm for Multiplayer Video Game with Restricted Frames Available due to Online Nature of the Game

Murashov Leonid

The main challenge of this project is playing an online video game with far fewer frames available than existing solutions for the Atari benchmark use. The online format also poses the challenge of playing against humans.

GitHub project

Dynamic Attention Model for Vehicle Routing Problems

Eremeev Dmitry, Pustynnikov Alexey

The main idea of this project is an implementation of the article "A Deep Reinforcement Learning Algorithm Using Dynamic Attention Model for Vehicle Routing Problems" (TensorFlow 2).
