Research

Inverse Reinforcement Learning in Contextual MDPs
Stav Belogolovsky*, Philip Korsunsky*, Shie Mannor*, Chen Tessler* and Tom Zahavy*
Published: Springer Machine Learning 2021

Real-world sequential decision problems often share two important properties: the reward function is unknown, yet expert demonstrations can be acquired, and there exists a static parameter, known as the context, which determines certain aspects of the problem. In this work we formalize the Contextual Inverse Reinforcement Learning framework, propose several algorithms, and analyze them both theoretically and empirically.

 
Maximizing the Total Reward via Reward Tweaking
Chen Tessler and Shie Mannor
Preprint

In many practical applications, we train the agent on the $\gamma$-discounted task but evaluate it on the total reward. This discrepancy between training and evaluation may lead to sub-optimal solutions. Reward tweaking learns an alternative surrogate reward that aims to guide the agent towards better behavior on the evaluation metric.
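
The discrepancy between the two objectives can be seen on a toy example. The sketch below is not taken from the paper: the two trajectories, the helper names `discounted_return` and `total_reward`, and the value $\gamma = 0.8$ are all illustrative assumptions. It only shows that the $\gamma$-discounted objective and the total reward can rank the same pair of behaviors differently.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a finite reward sequence."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def total_reward(rewards):
    """Undiscounted sum of rewards (the evaluation metric)."""
    return sum(rewards)


# Two hypothetical 10-step trajectories (illustrative numbers only).
early = [1.0] + [0.0] * 9   # small reward at the first step
late = [0.0] * 9 + [3.0]    # larger reward at the last step

gamma = 0.8  # assumed discount factor for this sketch

# Which trajectory does each objective prefer?
prefers_early_discounted = discounted_return(early, gamma) > discounted_return(late, gamma)
prefers_late_total = total_reward(late) > total_reward(early)

print("discounted objective prefers 'early':", prefers_early_discounted)
print("total reward prefers 'late':", prefers_late_total)
```

With these numbers, an agent trained on the discounted objective favors the early reward, while the evaluation metric favors the late one. This is the kind of mismatch a surrogate reward is meant to reduce.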

 
Distributional Policy Optimization: An Alternative Approach for Continuous Control
Chen Tessler*, Guy Tennenholtz* and Shie Mannor
Published: NeurIPS 2019

We propose a method for learning distributional policies, that is, policies which are not limited to parametric distribution functions (e.g., Gaussian and Delta). This approach overcomes sub-optimal local extrema in continuous control regimes.

 
Action Robust Reinforcement Learning and Applications in Continuous Control
Chen Tessler*, Yonathan Efroni* and Shie Mannor
Published: ICML 2019

Action robustness is a special case of robustness in which the agent is robust to uncertainty in the performed action. We show theoretically that this form of robustness has efficient solutions, and empirically that it results in policies which are robust to common uncertainties in robotic domains.

 
Reward Constrained Policy Optimization
Chen Tessler, Daniel J. Mankowitz and Shie Mannor
Published: ICLR 2019

Learning a policy which adheres to behavioral constraints is an important task. Our algorithm, RCPO, efficiently satisfies not only discounted constraints but also average and probabilistic ones.

 
A Deep Hierarchical Approach to Lifelong Learning in Minecraft
Chen Tessler*, Shahar Givony*, Tom Zahavy*, Daniel J. Mankowitz* and Shie Mannor
Published: AAAI 2017

We propose a lifelong learning system that can reuse and transfer knowledge from one task to another while efficiently retaining the previously learned knowledge base. Knowledge is transferred by learning reusable skills to solve tasks in Minecraft, a popular video game which poses an unsolved, high-dimensional lifelong learning problem.