AG DANK/BCS Meeting 2013 in London

Participation and location
Datasets for analysis by participants
New! presentation slides are online! Programme and presentations
Accommodation

Department of Statistical Science, University College London, 8/9 November 2013, Galton Lecture Theatre. The meeting will probably start around 1:30 pm on Friday and end at about 1:30 pm on Saturday.

Focus topic: variable selection and dimension reduction in clustering and classification.

Local organisation: Christian Hennig. (c.hennig (at) ucl.ac.uk).

Information on the societies:
British Classification Society,
AG DANK - working group on data analysis and numerical classification of the GfKL (German Classification Society).

The meeting is hosted and funded by the UCL Centre for Computational Statistics and Machine Learning and supported by Chapman and Hall/CRC.

Invited speakers:

Participation

Participation is still possible, as long as the capacity of the lecture theatre is not exhausted. If you want to participate, please write to the local organiser at c.hennig (at) ucl.ac.uk.
Further presentations can unfortunately no longer be accepted.
There will be no fee for participation.

Location

The meeting will take place in the Galton Lecture Theatre, Room 115, 1-19 Torrington Place. Here is a map. More information is here. The closest Underground station is Goodge Street on the Northern Line; the stations Warren Street, Euston and Euston Square are 10-15 minutes away on foot; King's Cross/St. Pancras (Eurostar terminal and connection to Luton Airport) is a bit more than 20 minutes away.

Datasets for analysis by participants

Some time will be reserved for participants to present analyses of the following data sets at the meeting. Please restrict your presentations of analyses to 5 minutes at the very most.

Spike sorting

The dataset was made available by Kenneth Harris, UCL Neuroscience (presentation is here). It comprises 20000 observations on 96 variables and an unknown number of clusters. Only some features are expected to be informative for each cluster, and different feature combinations will be relevant for different clusters.
Dataset (ASCII text; it is recommended to save the link)
Information about the dataset (ASCII text)
Illustration for the information about the dataset (pdf; see the information file for an explanation)

Competition

In the dataset there is a known (artificial) true cluster (along with potentially several other real clusters). You can take part in a competition by sending an email to c.hennig (at) ucl.ac.uk by Tuesday 5 November, 18:00, with an ASCII text file containing 20000 cluster memberships. See the information file for details. For the book prizes you can win, see below.
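
As a rough illustration of preparing such a submission, here is a minimal Python sketch that writes 20000 cluster memberships to an ASCII text file and reads them back as a check. The layout (one integer label per line), the file name `memberships.txt` and the function names are assumptions made for this example only; the information file is the authoritative description of the required format.

```python
from pathlib import Path

N_OBSERVATIONS = 20000  # number of observations stated in the dataset description


def write_memberships(labels, path):
    """Write one integer cluster label per line as plain ASCII text.

    The one-label-per-line layout is an assumption; check the information file.
    """
    labels = [int(label) for label in labels]
    if len(labels) != N_OBSERVATIONS:
        raise ValueError(f"expected {N_OBSERVATIONS} labels, got {len(labels)}")
    text = "\n".join(str(label) for label in labels) + "\n"
    Path(path).write_text(text, encoding="ascii")


def read_memberships(path):
    """Read the labels back from the file, e.g. to check it before sending."""
    return [int(token) for token in Path(path).read_text(encoding="ascii").split()]


if __name__ == "__main__":
    # Placeholder labels only: every observation is assigned to cluster 1.
    write_memberships([1] * N_OBSERVATIONS, "memberships.txt")
    assert len(read_memberships("memberships.txt")) == N_OBSERVATIONS
```

Checking the number of labels before writing guards against sending a file that does not line up with the 20000 observations.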

Bat species

The dataset was made available by Veronica Zamora-Gutierrez, Cambridge University (presentation is here). It comprises 2678 observations on 73 variables. There are eight known classes (species of bats), so this can be interpreted as a supervised classification problem, with focus on which variables discriminate the species in the best possible way. However, it is also of interest to find a clustering of the eight species into fewer clusters, which could be used as a first step for better classification.
Dataset (ASCII text; it is recommended to save the link)
Information about the dataset (docx file)

Book prizes

You can win the following book prizes donated by Chapman and Hall/CRC:

Clustering - A Data Recovery Approach by Boris Mirkin.
Data Clustering - Algorithms and Applications by Charu C. Aggarwal; Chandan K. Reddy.
Data Clustering in C++ - An Object Oriented Approach by Guojun Gan.
Ensemble Methods - Foundations and Algorithms by Zhi-Hua Zhou.

The two winners of the Spike Sorting competition can pick their books first; the other two prize winners will be drawn at random from those who present an analysis of the Bat Species dataset at the meeting.

Programme

Friday 8 November

13:30 Welcome
13:45 Gilles Celeux - Variable selection in clustering and classification: issues, difficulties and solutions (presentation is here)
14:30 Silvia Liverani and Michail Papathomas - Using Profile Regression Mixture Models and Dirichlet Processes to explore the combined effect of risk factors; the R package PReMiuM (presentation is here)
15:00 Francesca Greselin - Data driven constraints for Gaussian mixtures of factor analyzers: an application to market segmentation (presentation is here)
15:30 Break
15:45 Silvia Pandolfi - Item selection by latent class-based methods: an application to nursing homes evaluation (presentation is here)
16:15 Hans-Joachim Mucha - Variable Selection in Cluster Analysis Using Resampling Techniques (presentation is here)
16:45 James Barrett - Dimensionality detection and integration of multiple sources via the Gaussian Process Latent Variable Model (presentation is here)
17:15 Break
17:40 Veronica Zamora-Gutierrez, Kenneth Harris and others - Discussion of datasets (presentations above).
19:00 AGM of the BCS (BCS members only)
20:00 Dinner. Some tables are booked in the Indian restaurant Lal Qila, 117 Tottenham Court Road (5-10 minutes from the department; everyone pays for themselves).

Saturday 9 November

9:00 Andre T. Martins and Mario Figueiredo - Sparsity and Structured Sparsity for Feature Selection in Machine Learning (presentation is here)
9:45 Ulrich Müller-Funk - Non-linear factor selection and copulas of copulas (presentation is here)
10:15 Gunter Ritter - A probabilistic method for gene expression data (presentation is here)
10:45 Break
11:15 Yoshikazu Terada - Achieving near-perfect clustering for high dimension, low sample size data (presentation is here)
11:35 Thomas Weber - Multidimensional questions: Can multivariate statistics help us to classify Older Stone Age artefact inventories? (presentation is here)
11:55 Nema Dean - Variable selection in educational testing clustering (presentation is here)
12:15 Andreas Artemiou - Sufficient dimension reduction using machine learning (presentation is here)
12:35 Roberto Rocci - Models for simultaneous clustering and reduction of three-way data (presentation is here)
13:20 End of meeting

Accommodation

There are many hotels and bed and breakfasts around UCL (as search terms you could use Bloomsbury, Russell Square or Euston).
One that offers rather good value is the Crescent Hotel.
Other options include some of the Grange Hotels, e.g. the Lancaster Hotel or Langham Court.

More information will be added later.