On-line Trading of Exploration and Exploitation

NIPS 2006 Workshop

December 8th, Whistler, BC, Canada

 

Background:

Trading off exploration and exploitation plays a key role in a number of learning tasks. For example, the bandit problem ([1],[2],[3],[4]) provides perhaps the simplest setting in which we must trade off pulling the arm that currently appears most advantageous against experimenting with arms for which we lack accurate information. Similar issues arise in learning problems where the information received depends on the choices made by the learner, such as reinforcement learning and active learning, as well as in other disciplines, for example sequential decision-making in statistics and optimal control in control theory.
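To make the trade-off concrete, here is a minimal sketch of the UCB1 strategy analysed in [1], run on a toy two-armed Bernoulli instance. The arm probabilities, horizon, and seed are illustrative choices, not part of any workshop material.

```python
import math
import random

def ucb1(pull, n_arms, horizon):
    """Illustrative UCB1 [1]: pull each arm once, then always pull the arm
    maximizing (empirical mean + sqrt(2 ln t / n_i)).  The confidence bonus
    shrinks as an arm is sampled, so under-explored arms keep being tried."""
    counts = [0] * n_arms       # number of pulls of each arm
    means = [0.0] * n_arms      # empirical mean reward of each arm
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1         # initial round-robin pass over the arms
        else:
            arm = max(range(n_arms),
                      key=lambda i: means[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        reward = pull(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]   # running mean
    return counts

# Toy instance: Bernoulli arms with success probabilities 0.2 and 0.8.
random.seed(0)
probs = [0.2, 0.8]
counts = ucb1(lambda i: 1.0 if random.random() < probs[i] else 0.0,
              n_arms=2, horizon=1000)
```

After 1000 pulls the better arm dominates the pull counts, while the weaker arm still receives the (logarithmically many) exploratory pulls that the confidence bound enforces.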

Studies of learning have frequently concentrated on the final performance of the learned system rather than on the errors made during the learning process. For example, reinforcement learning has traditionally been concerned with showing convergence to an optimal policy, while analysis of the bandit problem has instead attempted to bound the extra loss incurred during learning when compared with an a priori optimal agent.

The workshop will provide a focus for work on on-line trading of exploration and exploitation, offering a forum for extensions to the bandit problem, invited presentations by researchers from related areas in other disciplines, and discussion of contributed papers.


Call for Participation:

Papers: The organizing committee invites extended abstract submissions (1 - 4 pages, plus an appendix if needed) to the NIPS 2006 workshop on ‘On-line Trading of Exploration and Exploitation’ in the following areas (but not restricted to them):

  • Exploration and Exploitation problems
  • Multi-armed bandit problems
  • Sequential decision-making
  • Empirical/theoretical studies of bandit problems
  • On-line learning algorithms
  • Related work from other disciplines such as control theory, game theory, statistics etc.

The organizers will select a small number of contributed papers for oral presentations of 20 - 40 minutes; all other accepted papers will be presented in a poster session. Proposed talks will be evaluated both on their relevance and significance and on their contribution to the well-roundedness of the workshop as a whole. The organizers will also edit a collected volume of papers from the workshop.

Paper submission deadline: 28th October 2006
Acceptance notification: 10th November 2006
Workshop: 8th December 2006

Papers should be prepared using the NIPS style file (nips06.sty), should not exceed 8 pages in total length (typically 1 - 4 pages, plus up to 4 pages of appendix if needed), and should be sent as PDF files to Zakria Hussain by 23:59 (Western Samoa time) on 28th October 2006.

Challenge: A call for participation in phase 2 of the PASCAL network of excellence Exploration vs Exploitation (EE) challenge (see http://www.pascal-network.org/Challenges/EEC/). The workshop will be used to discuss the results of the challenge, currently running under the auspices of the PASCAL network and Touch Clarity Ltd, which investigates algorithms for multi-armed bandits in which the response rates of the individual arms vary over time according to certain patterns. Touch Clarity have agreed to award a first prize of £1000 to the best entry. See the challenge website for data set downloads and submission instructions.

Code submission: 15th November 2006
Seed distribution: 16th November 2006
Final results: 23rd November 2006
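As a rough illustration of the challenge setting (arms whose response rates drift over time), the sketch below simulates two arms with sinusoidally drifting Bernoulli rates and plays them with a discounted ε-greedy rule. The drift pattern, rates, and all parameters are invented for illustration and are unrelated to the actual challenge data or submission interface.

```python
import math
import random

def run_nonstationary(horizon=2000, eps=0.1, gamma=0.99, seed=1):
    """Toy nonstationary bandit: two arms whose Bernoulli response rates
    drift sinusoidally, played by epsilon-greedy with exponentially
    discounted statistics (discounting lets stale evidence fade, so the
    player can re-explore when the better arm changes)."""
    rng = random.Random(seed)
    est = [0.0, 0.0]          # discounted reward sums per arm
    wgt = [1e-9, 1e-9]        # discounted pull counts per arm
    total = 0.0
    for t in range(horizon):
        # slowly drifting response rates; the best arm changes over time
        drift = 0.4 * math.sin(t / 300.0)
        p = [0.5 + drift, 0.5 - drift]
        if rng.random() < eps:
            arm = rng.randrange(2)                          # explore
        else:
            arm = 0 if est[0] / wgt[0] >= est[1] / wgt[1] else 1
        r = 1.0 if rng.random() < p[arm] else 0.0
        for i in range(2):    # decay both arms' statistics every step
            est[i] *= gamma
            wgt[i] *= gamma
        est[arm] += r
        wgt[arm] += 1.0
        total += r
    return total / horizon

mean_reward = run_nonstationary()
```

Since a player that ignores the drift (or picks arms uniformly at random) averages a reward of 0.5 here, beating that baseline is a simple sanity check that the discounting is doing its job.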

Structure of the workshop:

We plan a 1-day workshop to be held on the 8th December 2006.

The workshop will include a discussion of the results of the PASCAL challenge, which is concerned with extensions of the bandit problem, and will broaden the discussion to other tasks and other approaches to on-line trading of exploration and exploitation.

Our aim is to make connections to approaches (both algorithmic and theoretical) that have been developed in other areas. We plan to fund invited speakers from other disciplines, specifically presentations on Optimal Control and on Sequential Decision-making.

One part of the workshop will be used to discuss the results of the challenge (see above), currently running under the auspices of the PASCAL network, which investigates algorithms for multi-armed bandits in which the response rates of the individual arms vary over time according to certain patterns (see http://www.pascal-network.org/Challenges/EEC/).


Schedule:

Morning session 7:30am - 10:30am

7:30am Welcome, Organizers (slides)
   
  Invited talk
7:35am On Efficient Sequential Decision Making in Structured Problems (talk slides)
  Adam Kalai
   
  Extended abstracts
8:20am Finite horizon exploration for path integral control problems (abstract) (talk slides)
  B. Kappen
8:50am Coffee break
9:05am Use of variance estimation in the multi-armed bandit problem (abstract) (talk slides)
  J.Y. Audibert, R. Munos, C. Szepesvári
9:30am Regret to the Best vs. Regret to the Average (abstract) (talk slides)
  E. Even-Dar, M. Kearns, Y. Mansour, J. Wortman
9:50am Effects of Stress and Genotype on Exploration-Exploitation Dynamics in Reinforcement Learning (abstract) (talk slides)
  G. Lukšys, J. Knüsel, D. Sheynikhovich, C. Sandi, W. Gerstner
10:10am Multi-Armed Bandit, Dynamic Environments and Meta-Bandits (talk slides)
  C. Hartland, S. Gelly, N. Baskiotis, O. Teytaud and M. Sebag

Afternoon session 3:30pm - 6:30pm

  Invited talk
3:30pm Using upper confidence bounds to control exploration and exploitation (talk slides)
  Csaba Szepesvári
4:15pm Exploration exploitation in Go: UCT for Monte-Carlo Go (talk slides)
  Sylvain Gelly and Yizao Wang
   
  Challenge results
4:35pm PASCAL Exploration Vs Exploitation challenge results for phase II
  John Shawe-Taylor and Zakria Hussain (talk slides)
4:45pm Challenge entrant 2 - Peter Auer and Thomas Jaksch
4:55pm Coffee break
   
5:15pm Poster Session
  includes all extended abstract talks from above plus the following:
  - Exploration exploitation in Go: UCT for Monte-Carlo Go
  - EXIST: Exploitation/Exploration Inference for Statistical Software Testing
  - Handling the Exploration-Exploitation Trade-off through Value and Variance Estimation
  - Gambling in a Computationally Expensive Casino: Algorithm Selection as a Bandit Problem
6:00pm Discussion and wrap up
  Program Committee and invited speakers

 

Program Committee:

Peter Auer University of Leoben
Nicolò Cesa-Bianchi University of Milan
Zakria Hussain University College London
Adam Kalai Toyota Technological Institute
Robert Kleinberg UC Berkeley and Cornell University
Yishay Mansour Tel Aviv University
Leonard Newnham Touch Clarity Ltd
John Shawe-Taylor University College London

Organization:

Peter Auer University of Leoben
Nicolò Cesa-Bianchi University of Milan
Zakria Hussain University College London
Leonard Newnham Touch Clarity Ltd
John Shawe-Taylor University College London


References:

[1] Peter Auer, Nicolò Cesa-Bianchi, Paul Fischer. Finite-time Analysis of the Multi-armed Bandit Problem. Machine Learning (2002).

[2] Peter Auer, Nicolò Cesa-Bianchi, Yoav Freund, Robert E. Schapire. Gambling in a Rigged Casino: The adversarial multi-armed bandit problem. Proceedings of the 36th Annual Symposium on Foundations of Computer Science (1995).

[3] Shie Mannor, John N. Tsitsiklis. The Sample Complexity of Exploration in the Multi-Armed Bandit Problem. The Journal of Machine Learning Research (2004).

[4] Joannès Vermorel and Mehryar Mohri. Multi-Armed Bandit Algorithms and Empirical Evaluation. In Proceedings of the 16th European Conference on Machine Learning (ECML 2005).

Sponsors:

 

 
 
