Mohammad Ghavamzadeh

Senior Researcher

Learning Safe Policies in Sequential Decision-Making

  Halls department, Hall 3
  Wednesday, 27 December 2017
  14:30 - 15:30


In many practical problems, ranging from online advertising to health informatics and computational finance, it is often important to be able to guarantee that the policy/strategy generated by our algorithm performs at least as well as a baseline. Such a guarantee reduces the risk of deploying our policy and helps us convince the product (hospital, investment) manager that it is not going to harm the business.

In this talk, we first discuss four different approaches to this fundamental problem that we have studied over the last four years; we call them model-based, model-free, online, and risk-sensitive. We then focus on the model-based and online approaches, which are related to the important problems of robust learning and control, and safe exploration. In the model-based approach, we first use the batch of data to build a simulator that mimics the behavior of the dynamical system under study (online advertisement, hospital’s ER, financial market), and then use this simulator to generate data and learn a policy. The main challenge here is to have guarantees on the performance of the learned policy, given the error in the simulator. In the online approach, the goal is to control the exploration of the algorithm so that at no point during its execution does the loss from using it instead of the baseline strategy exceed a given margin. We present algorithms based on these approaches and demonstrate their usefulness in real-world applications such as personalized ad recommendation, energy arbitrage, traffic signal control, and American option pricing.
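
As a rough illustration of the margin constraint in the online approach, the following minimal sketch checks whether an exploratory action may be played in the current round. It is not the algorithm presented in the talk: the function name, its parameters, the multiplicative form of the margin (`alpha`), and the assumption that the baseline's mean reward is known are all hypothetical choices made for this example.

```python
def is_safe_to_explore(reward_so_far, candidate_lower_bound,
                       baseline_mean, rounds_played, alpha):
    """Hypothetical safety check for one round of conservative exploration.

    reward_so_far         -- total reward collected in the rounds played so far
    candidate_lower_bound -- pessimistic estimate of the exploratory action's reward
    baseline_mean         -- mean per-round reward of the baseline (assumed known)
    rounds_played         -- number of rounds completed before this one
    alpha                 -- tolerated fraction of baseline reward that may be lost
    """
    # Worst-case total reward if the exploratory action is played now.
    pessimistic_total = reward_so_far + candidate_lower_bound
    # Expected total reward of following the baseline for all rounds, including this one.
    baseline_total = baseline_mean * (rounds_played + 1)
    # Explore only if the worst case stays within the margin of the baseline.
    return pessimistic_total >= (1.0 - alpha) * baseline_total


def choose_action(reward_so_far, candidate_lower_bound,
                  baseline_mean, rounds_played, alpha):
    """Return 'explore' when the check passes, otherwise fall back to 'baseline'."""
    if is_safe_to_explore(reward_so_far, candidate_lower_bound,
                          baseline_mean, rounds_played, alpha):
        return "explore"
    return "baseline"
```

Under these assumptions, the learner falls back to the baseline whenever exploring could push its cumulative reward below a `(1 - alpha)` fraction of what the baseline would have earned, and explores freely once it has built up enough surplus.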


Mohammad Ghavamzadeh received a Ph.D. degree in Computer Science from the University of Massachusetts Amherst in 2005. From 2005 to 2008, he was a postdoctoral fellow at the University of Alberta. He has been a permanent researcher at INRIA in France since November 2008. He was promoted to first-class researcher in 2010, received the "INRIA award for scientific excellence" in 2011, and obtained his Habilitation in 2014. He was a senior analytics researcher at Adobe from October 2013 to May 2017. Since June 2017, he has been a senior researcher at Google DeepMind. Mohammad has been an area chair, a senior program committee member, and a program committee member at NIPS, ICML, IJCAI, AAAI, UAI, COLT, and AISTATS. He has been on the editorial board of the Machine Learning Journal (MLJ). He has published over 70 refereed papers in major machine learning, AI, and control journals and conferences, and has organized several tutorials and workshops at NIPS, ICML, and AAAI. His research is in the areas of machine learning, artificial intelligence, control, and learning theory; in particular, he investigates the principles of scalable decision-making and devises, analyzes, and implements algorithms for sequential decision-making under uncertainty and reinforcement learning.