PhD project

A new method for estimating the number of clusters

The task of cluster analysis is to find groupings in data. This has many applications, for example in pattern recognition, biological species identification, psychology and social stratification. Most classical cluster analysis methods assume that the number of clusters is known, which in practice is almost never the case.

There are several approaches to estimating the number of clusters. However, most of them are tied to specific cluster analysis methods, and for many of them there is no strong evidence about their quality.

This project is about defining a new method based on the distances of observations to the closest cluster centre, and comparing it systematically with existing approaches. Most existing methods are based on squared distances, which is theoretically convenient but inflexible and often not very robust. The new method can then be used together with k-medoids clustering (Kaufman and Rousseeuw, 1990). Kaufman and Rousseeuw (1990) themselves suggest an alternative method, the "average silhouette width", but it lacks a theoretical basis and is not robust against outliers (Hennig, 2008). While methods based on squared distances can be motivated by their connection with maximum likelihood estimators for the normal distribution, other distributions need to be considered for k-medoids.

C. Hennig: Dissolution point and isolation robustness: robustness criteria for general cluster analysis methods. Journal of Multivariate Analysis 99 (2008), 1154-1176.

L. Kaufman and P. J. Rousseeuw: Finding Groups in Data. Wiley (1990).
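To make the contrast between unsquared and squared distances, and the existing average silhouette width criterion, more concrete, here is a minimal sketch in Python. It is not the new method this project proposes (which is not specified here); it only illustrates the k-medoids objective (sum of distances from each observation to its closest medoid, with a squared-distance option for comparison) and choosing the number of clusters by maximising the average silhouette width of Kaufman and Rousseeuw. All function names are hypothetical, the medoid search is a brute-force simplification (not the PAM algorithm) that only works for tiny data sets, and singleton clusters get silhouette 0, following the usual convention.

```python
import itertools
import math


def kmedoids_cost(points, medoids, squared=False):
    # Sum over all points of the distance to the closest medoid.
    # squared=True gives the k-means-style criterion for comparison.
    total = 0.0
    for p in points:
        d = min(math.dist(p, points[m]) for m in medoids)
        total += d * d if squared else d
    return total


def assign(points, medoids):
    # Label each point with the index (into medoids) of its closest medoid.
    return [min(range(len(medoids)), key=lambda j: math.dist(p, points[medoids[j]]))
            for p in points]


def exhaustive_kmedoids(points, k):
    # Brute force over all k-subsets of points as medoids (tiny data only).
    best = min(itertools.combinations(range(len(points)), k),
               key=lambda meds: kmedoids_cost(points, meds))
    return list(best), assign(points, list(best))


def average_silhouette_width(points, labels):
    # Mean over points of s(i) = (b - a) / max(a, b), where a is the mean
    # distance to the own cluster and b the smallest mean distance to another cluster.
    n = len(points)
    clusters = sorted(set(labels))
    total = 0.0
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            continue  # singleton cluster: s(i) = 0
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(math.dist(points[i], points[j]) for j in range(n) if labels[j] == c)
            / sum(1 for j in range(n) if labels[j] == c)
            for c in clusters if c != labels[i]
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / n


def estimate_k(points, k_max):
    # Choose k in 2..k_max with the largest average silhouette width.
    scores = {}
    for k in range(2, k_max + 1):
        _, labels = exhaustive_kmedoids(points, k)
        scores[k] = average_silhouette_width(points, labels)
    return max(scores, key=scores.get), scores


# Two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
best_k, scores = estimate_k(points, k_max=4)
print(best_k)
```

On this toy data the silhouette criterion picks two clusters. Because the average silhouette width averages ratios of mean distances, a single distant outlier can change which k is selected, which is the kind of robustness problem the project description refers to.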