CSAIL Event Calendar: Previous Series

R-max: A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning

Speaker: Moshe Tennenholtz , Technion and Stanford University
Date: October 4 2001
Time: 4:00PM
Location: NE43-941

In this talk we will present R-max. R-max is an extremely simple model-based reinforcement learning algorithm which can attain near-optimal average reward in polynomial time. In R-max, the agent always maintains a complete, but possibly inaccurate model of its environment and acts based on the optimal policy derived from this model. The model is initialized in an optimistic fashion: all actions in all states return the maximal possible reward (hence the name). During execution, it is updated based on the agent's observations. R-max improves upon several previous algorithms: (1) It is simpler and more general than Kearns and Singh's $E^3$ algorithm, covering many classes of stochastic games. (2) It has a built-in mechanism for resolving the exploration vs. exploitation dilemma. (3) It formally justifies the ``optimism under uncertainty'' bias used in many RL algorithms. (4) It is simpler, more general, and more efficient than Brafman and Tennenholtz's LSG algorithm for learning in single controller stochastic games. (5) It generalizes the algorithm by Monderer and Tennenholtz for learning in repeated games. (6) It is the most efficient algorithm for learning in repeated games, to date, considerably improving and simplifying previous algorithms by Banos and by Megiddo.
The work on R-max is a joint work with Ronen Brafman. If time permits, we will also briefly discuss other parts of my work on exploration vs. exploitation in multi-agent environments.

See other events that are part of AI Colloquium Series Fall 2001

See other events happening in October 2001


About Us Research News Resources Directory