On-line Trading of Exploration and Exploitation
NIPS 2006 Workshop
December 8th, Whistler, BC, Canada
Trading off exploration and exploitation plays a key role in a number of learning tasks. For example, the bandit problem (see references below) provides perhaps the simplest setting in which we must trade off pulling the arm that currently appears most advantageous against experimenting with arms for which we do not yet have accurate information. Similar issues arise in learning problems where the information received depends on the choices made by the learner. Examples include reinforcement learning and active learning, and related questions also arise in other disciplines, for example sequential decision-making in statistics and optimal control in control theory.
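As a purely illustrative sketch of this trade-off (not an algorithm advocated by the workshop), the classic epsilon-greedy strategy explores a uniformly random arm with small probability and otherwise exploits the empirically best arm; the arm means and parameter values below are arbitrary:

```python
import random

def epsilon_greedy(arm_means, n_rounds=10000, epsilon=0.1, seed=0):
    """Play a Bernoulli bandit: with probability epsilon pull a random
    arm (explore), otherwise pull the arm with the highest empirical
    mean reward so far (exploit)."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k      # number of pulls per arm
    values = [0.0] * k    # running mean reward per arm
    total = 0.0
    for _ in range(n_rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                        # explore
        else:
            arm = max(range(k), key=values.__getitem__)   # exploit
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        # incremental update of the arm's mean reward
        values[arm] += (reward - values[arm]) / counts[arm]
        total += reward
    return counts, total

counts, total = epsilon_greedy([0.2, 0.5, 0.8])
```

With these (hypothetical) arm means the strategy quickly concentrates its pulls on the best arm while still spending a fixed fraction of rounds exploring; the regret analyses in the references below make this trade-off precise.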
Learning studies have frequently concentrated on the final performance of the learned system rather than on the errors made during the learning process. For example, reinforcement learning has traditionally been concerned with showing convergence to an optimal policy, while in contrast analyses of the bandit problem have attempted to bound the extra loss incurred during learning relative to an a priori optimal agent.
The workshop will provide a focus for work concerned with on-line trading of exploration and exploitation, in particular providing a forum for extensions to the bandit problem, invited presentations by researchers working in related areas in other disciplines, as well as discussion and contributed papers.
Papers: The organizing committee invites extended abstract submissions (1 - 4 pages, plus an appendix if needed) to the NIPS 2006 workshop ‘On-line Trading of Exploration and Exploitation’ in the following related areas (though submissions are not restricted to these):
The organizers will select a small number of contributed papers for oral presentations of 20 to 40 minutes. All other accepted papers will be presented in a poster session. Proposed talks will be evaluated both on their relevance and significance and on their contribution to the well-roundedness of the workshop as a whole. The organizers will also edit a collected volume of papers from the workshop.
Submission deadline: 28th October 2006
Papers should be written using the NIPS style file (nips06.sty), should not exceed 8 pages in total (typically 1 - 4 pages, plus up to 4 pages of appendix if needed), and should be sent as PDF files to Zakria Hussain by 23.59 (Western Samoa time) on the 28th October 2006.
Challenge: A call for participation in phase 2 of the PASCAL network of excellence Exploration vs. Exploitation (EE) challenge (see http://www.pascal-network.org/Challenges/EEC/). The workshop will be used to discuss the results of the challenge, currently running under the auspices of the PASCAL network and Touch Clarity Ltd, which investigates algorithms for multi-armed bandits in which the response rates of the individual arms vary over time according to certain patterns. Touch Clarity have agreed to award a first prize of £1000 to the best entry. See the challenge website for data set downloads and submission instructions.
Challenge deadline: 15th November 2006
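The challenge concerns bandits whose response rates drift over time. The actual dynamics used in the challenge are specified on its website; the following is only an illustrative toy model of drifting rates, paired with a sliding-window estimator, one simple way an algorithm might track a non-stationary arm (all function names and parameters are hypothetical):

```python
import math

def drifting_rates(t, base=(0.3, 0.5), amp=0.2, period=1000):
    """Toy time-varying response rates: each arm's Bernoulli rate
    oscillates sinusoidally around a base value, with arms out of
    phase (NOT the challenge's actual dynamics)."""
    phase = 2 * math.pi * t / period
    return [min(1.0, max(0.0, b + amp * math.sin(phase + i * math.pi)))
            for i, b in enumerate(base)]

def windowed_mean(history, window=200):
    """Estimate an arm's current response rate from only its most
    recent `window` observed rewards, so old data is forgotten."""
    recent = history[-window:]
    return sum(recent) / len(recent) if recent else 0.0
```

In this toy model the best arm changes every half period, which is precisely why a fixed empirical-mean estimate fails and some form of forgetting (windowing, discounting, or change detection) becomes necessary.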
We plan a 1-day workshop to be held on the 8th December 2006.
The workshop will include a discussion of the results of the PASCAL challenge concerned with extensions of the bandit problem as well as broadening the discussion to other tasks and other approaches to on-line trading of exploration and exploitation.
Our aim is to make connections to approaches (both algorithmic and theoretical) that have been developed in other areas. We plan to fund invited speakers from other disciplines, specifically presentations on Optimal Control and on Sequential Decision-making.
One part of the workshop will be devoted to discussing the results of the PASCAL challenge described above (see http://www.pascal-network.org/Challenges/EEC/).
Morning session 7:30am - 10:30am
Afternoon session 3:30pm - 6:30pm
Peter Auer, Nicolò Cesa-Bianchi, Paul Fischer. Finite-time Analysis of the Multi-armed Bandit Problem. Machine Learning (2002).
Peter Auer, Nicolò Cesa-Bianchi, Yoav Freund, Robert E. Schapire. Gambling in a Rigged Casino: The adversarial multi-armed bandit problem. Proceedings of the 36th Annual Symposium on Foundations of Computer Science (1995).
Shie Mannor, John N. Tsitsiklis. The Sample Complexity of Exploration in the Multi-Armed Bandit Problem. The Journal of Machine Learning Research (2004).
Joannès Vermorel, Mehryar Mohri. Multi-Armed Bandit Algorithms and Empirical Evaluation. Proceedings of the 16th European Conference on Machine Learning (ECML 2005).