Characterization and evaluation of similarity measures for pairs of clusterings

@article{Pfitzner2009CharacterizationAE,
  title={Characterization and evaluation of similarity measures for pairs of clusterings},
  author={Darius Pfitzner and Richard Leibbrandt and David M. W. Powers},
  journal={Knowledge and Information Systems},
  year={2009},
  volume={19},
  pages={361--394},
  url={https://api.semanticscholar.org/CorpusID:6935380}
}
A paradigm apparatus for the evaluation of clustering comparison techniques is introduced and a novel clustering similarity measure, the Measure of Concordance (MoC), is proposed; only MoC, Powers’s measure, Lopez and Rajski's measure, and various forms of Normalised Mutual Information exhibit the desired behaviour under each of the test scenarios.

The Impact of Random Models on Clustering Similarity

It is demonstrated that the choice of random model can have a drastic impact on the ranking of similar clustering pairs and on the evaluation of a clustering method with respect to a random baseline; thus, the choice of random clustering model should be carefully justified.
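The best-known instance of chance correction under a random model is the Adjusted Rand Index, which assumes the permutation model (random clusterings with the cluster sizes held fixed). A minimal sketch of that correction, with an illustrative function name not taken from the paper:

```python
from collections import Counter
from math import comb

def adjusted_rand_index(a, b):
    """Rand index corrected for chance under the permutation model,
    i.e. random clusterings with the same cluster sizes as a and b."""
    n = len(a)
    contingency = Counter(zip(a, b))        # n_ij: co-occurrence counts
    row = Counter(a)                        # a_i: cluster sizes in a
    col = Counter(b)                        # b_j: cluster sizes in b
    sum_ij = sum(comb(c, 2) for c in contingency.values())
    sum_a = sum(comb(c, 2) for c in row.values())
    sum_b = sum(comb(c, 2) for c in col.values())
    expected = sum_a * sum_b / comb(n, 2)   # E[index] under the random model
    max_index = (sum_a + sum_b) / 2
    return (sum_ij - expected) / (max_index - expected)

# Identical clusterings score 1.0 regardless of label names.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
```

Swapping in a different random model (e.g. all partitions equiprobable) changes only the `expected` term, which is exactly why the choice of model can reorder clustering pairs.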

Element-centric clustering comparison unifies overlaps and hierarchy

This work unifies the comparison of disjoint, overlapping, and hierarchically structured clusterings by proposing a new element-centric framework: elements are compared based on the relationships induced by the cluster structure, as opposed to the traditional cluster-centric philosophy.

A Systematic Evaluation of Clustering Algorithms Against Expert-Derived Clustering

This research utilizes six mechatronic products and a team of domain experts to assess how closely the results of six clustering algorithms align with expert-generated outcomes.

Towards a Classification of Binary Similarity Measures

The paper proposes a method for the comparative analysis of similarity measures based on a set-theoretic representation of these measures and a comparison of the algebraic properties of those representations, and shows the relationship between clustering results and the classification of measures by their properties.
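The set-theoretic view reduces every binary similarity measure to the four cells of a 2×2 contingency table; measures then differ in which cells they use. A minimal sketch contrasting two classic coefficients (helper names are illustrative, not from the paper):

```python
def binary_contingency(x, y):
    """2x2 cells for equal-length binary vectors:
    a = 1/1 matches, b = 1/0, c = 0/1, d = 0/0 matches."""
    a = sum(1 for i, j in zip(x, y) if i == 1 and j == 1)
    b = sum(1 for i, j in zip(x, y) if i == 1 and j == 0)
    c = sum(1 for i, j in zip(x, y) if i == 0 and j == 1)
    d = sum(1 for i, j in zip(x, y) if i == 0 and j == 0)
    return a, b, c, d

def jaccard(x, y):
    # Ignores the negative-match cell d.
    a, b, c, _ = binary_contingency(x, y)
    return a / (a + b + c)

def simple_matching(x, y):
    # Counts negative matches d as agreement.
    a, b, c, d = binary_contingency(x, y)
    return (a + d) / (a + b + c + d)
```

Whether a measure includes the `d` cell is precisely the kind of algebraic property the classification is built on.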

Characterizing and Comparing External Measures for the Assessment of Cluster Analysis and Community Detection

This work proposes a new empirical evaluation framework that is not tied to any specific measure or application and so can be applied to any situation; the framework is illustrated on a selection of standard measures and put into practice through two concrete use cases.

On Using Class-Labels in Evaluation of Clusterings

This work discusses common “defects” that clustering algorithms exhibit with respect to this evaluation and shows them on several real-world data sets from different domains, along with a discussion of why the detected clusters do not indicate bad performance of the algorithm but are valid and useful results.

Understanding information theoretic measures for comparing clusterings

It is shown that a class of normalizations of the mutual information can be decomposed into indices that contain information at the level of individual clusters, revealing that the overall measures can be interpreted as summary statistics of the information reflected in the individual clusters.
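The "class of normalizations" refers to dividing the mutual information by different combinations of the two clusterings' entropies. A minimal sketch of the standard variants, assuming flat label vectors (function names are illustrative):

```python
from collections import Counter
from math import log

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log(c / n) for c in Counter(labels).values())

def mutual_information(a, b):
    n = len(a)
    nij = Counter(zip(a, b))
    ai, bj = Counter(a), Counter(b)
    return sum(c / n * log(n * c / (ai[x] * bj[y]))
               for (x, y), c in nij.items())

def nmi(a, b, norm="max"):
    """Mutual information normalized by a function of the two entropies."""
    i = mutual_information(a, b)
    ha, hb = entropy(a), entropy(b)
    denom = {"max": max(ha, hb), "min": min(ha, hb),
             "arithmetic": (ha + hb) / 2,
             "geometric": (ha * hb) ** 0.5}[norm]
    return i / denom if denom else 1.0  # two trivial one-cluster partitions
```

All four variants agree at the extremes (1 for identical clusterings, 0 for independent ones) but can rank intermediate pairs differently, which is what makes the per-cluster decomposition informative.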

A review of conceptual clustering algorithms

This work presents an overview of the most influential algorithms reported in the field of conceptual clustering, highlighting their limitations or drawbacks, and presents a taxonomy of these methods as well as a qualitative comparison of these algorithms.

An Improved Similarity-Based Clustering Algorithm for Multi-Database Mining

This work uses coordinate descent and back-propagation to search for the optimal clustering of the n databases in a way that minimizes a convex clustering quality measure L(θ) in fewer than (n² − n)/2 iterations.

A Critical Note on the Evaluation of Clustering Algorithms

It is suggested that the applicability of existing benchmark datasets should be carefully revisited and significant efforts need to be devoted to improving the current practice of experimental evaluation of clustering algorithms to ensure an essential match between algorithms and problems.
...

A Method for Comparing Two Hierarchical Clusterings

A measure of similarity between two hierarchical clusterings, Bk, is derived from the matching matrix, [mij], formed by cutting the two hierarchical trees and counting the number of matching entries in the k clusters of each tree.
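At a single cut level k, Bk is computed from the matching matrix and the two partitions' cluster sizes. A minimal sketch for one cut, taking the two flat labelings as input (the function name is illustrative; it assumes neither clustering is all singletons, which would make a denominator zero):

```python
from collections import Counter
from math import sqrt

def fowlkes_mallows_bk(a, b):
    """Bk for one cut level: a and b are the flat labelings obtained
    by cutting each hierarchical tree into k clusters."""
    n = len(a)
    nij = Counter(zip(a, b))   # matching matrix [m_ij] as a sparse counter
    ai, bj = Counter(a), Counter(b)
    tk = sum(c * c for c in nij.values()) - n
    pk = sum(c * c for c in ai.values()) - n
    qk = sum(c * c for c in bj.values()) - n
    return tk / sqrt(pk * qk)
```

Plotting Bk against k as the cut descends both trees is how the original method compares two full hierarchies rather than a single pair of partitions.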

CHAMELEON: A Hierarchical Clustering Algorithm Using Dynamic Modeling

A novel hierarchical clustering algorithm called CHAMELEON is presented that measures the similarity of two clusters based on a dynamic model and can discover natural clusters that many existing state-of-the-art clustering algorithms fail to find.

Chameleon: Hierarchical Clustering Using Dynamic Modeling

Chameleon's key feature is that it accounts for both interconnectivity and closeness in identifying the most similar pair of clusters, which is important for dealing with highly variable clusters.

On Clustering Validation Techniques

This survey introduces the fundamental concepts of clustering, reviews the widely known clustering algorithms in a comparative way, and illustrates the issues that are under-addressed by recent algorithms.

Comparing Clusterings by the Variation of Information

This paper proposes an information theoretic criterion for comparing two partitions, or clusterings, of the same data set, called variation of information (VI), which is positive, symmetric and obeys the triangle inequality.
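VI has a closed form in terms of the two entropies and the mutual information, VI(A, B) = H(A) + H(B) − 2I(A; B). A minimal sketch from flat labelings (the function name is illustrative):

```python
from collections import Counter
from math import log

def variation_of_information(a, b):
    """VI(A, B) = H(A) + H(B) - 2 I(A; B), in nats."""
    n = len(a)
    ai, bj = Counter(a), Counter(b)
    nij = Counter(zip(a, b))
    h = lambda counts: -sum(c / n * log(c / n) for c in counts.values())
    mi = sum(c / n * log(n * c / (ai[x] * bj[y]))
             for (x, y), c in nij.items())
    return h(ai) + h(bj) - 2 * mi
```

Unlike the normalized indices above, VI is a true metric: it is zero only for identical partitions, symmetric, and satisfies the triangle inequality, so it supports geometric reasoning about spaces of clusterings.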

Cluster Ensembles --- A Knowledge Reuse Framework for Combining Multiple Partitions

This paper introduces the problem of combining multiple partitionings of a set of objects into a single consolidated clustering without accessing the features or algorithms that determined these partitionings and proposes three techniques for obtaining high-quality combiners (consensus functions).

Robust data clustering

It is shown that the proposed technique attempts to optimize the mutual information based criteria, although the optimality is not ensured in all situations, and experimental results show the ability of the technique to identify clusters with arbitrary shapes and sizes.
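A common device in this line of work for combining many partitions is a co-association (voting) matrix: the fraction of partitions in which each pair of objects lands in the same cluster, which is then re-clustered. A minimal pure-Python sketch of that idea, not the paper's exact method (the function name is illustrative):

```python
def coassociation_matrix(partitions):
    """Fraction of partitions in which objects i and j share a cluster.
    partitions: list of label sequences over the same n objects."""
    n = len(partitions[0])
    m = len(partitions)
    return [[sum(p[i] == p[j] for p in partitions) / m for j in range(n)]
            for i in range(n)]

# Objects 0 and 1 co-cluster in one of the two partitions.
co = coassociation_matrix([[0, 0, 1], [0, 1, 1]])
```

Because the matrix only records pairwise agreement, it is indifferent to cluster shape, which is why evidence-accumulation approaches can recover clusters of arbitrary shapes and sizes.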

Asymmetric binary similarity measures

A new coefficient, “C”, is introduced which overcomes these problems and leads to homogeneous classifications in the sense described above; further general recommendations are made for the use of these coefficients in various contexts.

A General Approach to Clustering in Large Databases with Noise

A new Kernel Density Estimation-based algorithm for clustering in large multimedia databases called DENCLUE (DENsity-based CLUstEring) is introduced, which has a firm mathematical basis and has good clustering properties in data sets with large amounts of noise.
...