AG DANK/BCS Meeting 2013 in London
Department of Statistical Science, University College London, 8/9 November 2013, Galton Lecture Theatre. The meeting starts at 13:30 on Friday and ends at about 13:20 on Saturday.
Focus topic: variable selection and dimension reduction in clustering and
classification.
Local organisation: Christian Hennig (c.hennig (at) ucl.ac.uk).
Information on the societies:
British Classification Society,
AG DANK - working group on data analysis and numerical classification of the GfKL (German Classification Society).
The meeting is hosted and funded by the
UCL Centre for Computational Statistics and Machine Learning
and supported by
Chapman and Hall/CRC.
Invited speakers:
Participation
Participation is still possible, as long as the capacity of the lecture theatre
is not exhausted. If you want to participate, please
write to the local organiser at c.hennig (at) ucl.ac.uk.
Further presentations can unfortunately no longer be accepted.
There will be no fee for participation.
Location
The meeting will take place in the Galton Lecture Theatre, Room 115,
1-19 Torrington Place. Here is a map. More information is
here.
The closest Underground station is Goodge Street on the Northern
Line. Warren Street, Euston and Euston Square stations are
10-15 minutes away on foot, and
King's Cross/St. Pancras (Eurostar terminal and connection to Luton Airport)
is a bit more than 20 minutes away.
Datasets for analysis by participants
Some time will be reserved for participants to present analyses of the following
data sets at the meeting. Please restrict your presentations of analyses
to 5 minutes at the very most.
Spike sorting
The dataset was made available by Kenneth Harris, UCL
Neuroscience (presentation is here). It comprises 20000 observations on 96 variables, with an unknown
number of clusters. Only some features are expected to be informative for each cluster, and different feature combinations will be relevant for different clusters.
Dataset (ASCII text; it is recommended to save the link)
Information about the dataset (ASCII text)
Illustration for the information
about the dataset (pdf; see the information file for an explanation)
Competition
The dataset contains a known (artificial) true cluster, along with
potentially several other real clusters. To take part in the competition,
email an ASCII text file with 20000 cluster memberships to
c.hennig (at) ucl.ac.uk by Tuesday 5 November, 18:00.
See the information file for details. For the book prizes you can win, see below.
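As a rough illustration of preparing such a submission, the sketch below writes a vector of cluster memberships to an ASCII text file and reads it back as a check. The one-integer-label-per-line layout, the function names and the file name are assumptions made for this example only; the information file remains the authority on the required format.

```python
from pathlib import Path

N_OBSERVATIONS = 20000  # number of observations stated in the dataset description


def write_memberships(labels, path):
    """Write cluster labels to an ASCII file, one integer per line (assumed layout)."""
    labels = [int(label) for label in labels]
    if len(labels) != N_OBSERVATIONS:
        raise ValueError(f"expected {N_OBSERVATIONS} labels, got {len(labels)}")
    text = "\n".join(str(label) for label in labels) + "\n"
    Path(path).write_text(text, encoding="ascii")


def read_memberships(path):
    """Read the labels back, e.g. to confirm the file round-trips before sending."""
    return [int(token) for token in Path(path).read_text(encoding="ascii").split()]
```

A call such as `write_memberships(labels, "memberships.txt")` (hypothetical file name) fails early if the label vector does not have exactly 20000 entries, which guards against sending a truncated file.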
Bat species
The dataset was made available by Veronica Zamora-Gutierrez, Cambridge
University (presentation is here). It comprises 2678 observations on 73 variables. There are eight
known classes (species of bats), so this can be treated as a supervised
classification problem, with a focus on which variables best discriminate
the species. However, it is also of interest to group
the eight species into fewer clusters, which could be used
as a first step towards better classification.
Dataset (ASCII text; it is recommended to save the link)
Information about the dataset (docx file)
Book prizes
You can win the following book prizes donated by
Chapman and Hall/CRC:
The two winners of the Spike Sorting competition can pick their books first;
the other two prize winners will be drawn at random from those who present an
analysis of the Bat Species dataset at the meeting.
Programme
Friday 8 November
- 13:30 Welcome
- 13:45 Gilles Celeux -
Variable selection in clustering and classification:
issues, difficulties and solutions (presentation is here)
- 14:30 Silvia Liverani and Michail Papathomas -
Using Profile Regression Mixture Models and Dirichlet Processes to explore
the combined effect of risk factors; the R package PReMiuM (presentation is here)
- 15:00 Francesca Greselin - Data driven constraints for Gaussian mixtures of factor analyzers: an application to market segmentation (presentation is here)
- 15:30 Break
- 15:45 Silvia Pandolfi -
Item selection by latent class-based methods: an application to nursing
homes evaluation (presentation is here)
- 16:15 Hans-Joachim Mucha - Variable Selection in Cluster Analysis Using
Resampling Techniques (presentation is here)
- 16:45 James Barrett - Dimensionality detection and integration of multiple sources via the Gaussian Process Latent Variable Model (presentation is here)
- 17:15 Break
- 17:40 Veronica Zamora-Gutierrez, Kenneth Harris and others - Discussion of datasets (presentations above).
- 19:00 AGM of the BCS (BCS members only)
- 20:00 Dinner. Some tables are booked at the Indian restaurant
Lal Qila, 117 Tottenham Court Road (5-10 minutes from the department; everybody pays for themselves).
Saturday 9 November
- 9:00 Andre T. Martins and Mario Figueiredo - Sparsity and Structured Sparsity for Feature Selection in Machine Learning (presentation is here)
- 9:45 Ulrich Müller-Funk -
Non-linear factor selection and copulas of copulas (presentation is here)
- 10:15 Gunter Ritter - A probabilistic method for gene
expression data (presentation is here)
- 10:45 Break
- 11:15 Yoshikazu Terada - Achieving near-perfect clustering for high dimension, low sample size data (presentation is here)
- 11:35 Thomas Weber - Multidimensional questions:
Can multivariate statistics help us to classify Older Stone Age
artefact inventories? (presentation is here)
- 11:55 Nema Dean - Variable selection in educational testing clustering (presentation is here)
- 12:15 Andreas Artemiou - Sufficient dimension reduction using machine learning (presentation is here)
- 12:35 Roberto Rocci - Models for simultaneous clustering and reduction of three-way data (presentation is here)
- 13:20 End of meeting
Accommodation
There are many hotels and bed and breakfasts
around UCL (as search terms you could use Bloomsbury, Russell Square or Euston).
One that offers rather good value is the Crescent Hotel.
Other options include some
Grange Hotels, e.g., the Lancaster
Hotel or Langham Court.
More information will be added later.