Characterization and evaluation of similarity measures for pairs of clusterings

@article{Pfitzner2009CharacterizationAE,
  title={Characterization and evaluation of similarity measures for pairs of clusterings},
  author={Darius Pfitzner and Richard Leibbrandt and David M. W. Powers},
  journal={Knowledge and Information Systems},
  year={2009},
  volume={19},
  pages={361--394},
  url={https://api.semanticscholar.org/CorpusID:6935380}
}
A paradigm apparatus for the evaluation of clustering comparison techniques is introduced and a novel clustering similarity measure, the Measure of Concordance (MoC), is proposed; only MoC, Powers’s measure, Lopez and Rajski's measure, and various forms of Normalised Mutual Information exhibit the desired behaviour under each of the test scenarios.

The Impact of Random Models on Clustering Similarity

It is demonstrated that the choice of random model can have a drastic impact on the ranking of similar clustering pairs and on the evaluation of a clustering method with respect to a random baseline; thus, the choice of random clustering model should be carefully justified.
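The best-known instance of chance correction under a random model is the Adjusted Rand Index, which assumes the permutation model (random clusterings with the cluster sizes held fixed). A minimal sketch of that correction, with an illustrative function name not taken from the paper:

```python
from collections import Counter
from math import comb

def adjusted_rand_index(a, b):
    """Rand index corrected for chance under the permutation model,
    i.e. random clusterings with the same cluster sizes as a and b."""
    n = len(a)
    contingency = Counter(zip(a, b))        # n_ij: co-occurrence counts
    row = Counter(a)                        # a_i: cluster sizes in a
    col = Counter(b)                        # b_j: cluster sizes in b
    sum_ij = sum(comb(c, 2) for c in contingency.values())
    sum_a = sum(comb(c, 2) for c in row.values())
    sum_b = sum(comb(c, 2) for c in col.values())
    expected = sum_a * sum_b / comb(n, 2)   # E[index] under the random model
    max_index = (sum_a + sum_b) / 2
    return (sum_ij - expected) / (max_index - expected)

# Identical clusterings score 1.0 regardless of label names.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
```

Swapping in a different random model (e.g. all partitions equiprobable) changes only the `expected` term, which is exactly why the choice of model can reorder clustering pairs.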

Element-centric clustering comparison unifies overlaps and hierarchy

This work unifies the comparison of disjoint, overlapping, and hierarchically structured clusterings by proposing a new element-centric framework: elements are compared based on the relationships induced by the cluster structure, as opposed to the traditional cluster-centric philosophy.

A Systematic Evaluation of Clustering Algorithms Against Expert-Derived Clustering

This research utilizes six mechatronic products and a team of domain experts to assess how closely the results of six clustering algorithms align with expert-generated outcomes.

Towards a Classification of Binary Similarity Measures

The paper proposes a method for the comparative analysis of similarity measures based on a set-theoretic representation of these measures and a comparison of the algebraic properties of those representations, and shows the relationship between clustering results and the classification of measures by their properties.
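The set-theoretic view reduces every binary similarity measure to the four cells of a 2×2 contingency table; measures then differ in which cells they use. A minimal sketch contrasting two classic coefficients (helper names are illustrative, not from the paper):

```python
def binary_contingency(x, y):
    """2x2 cells for equal-length binary vectors:
    a = 1/1 matches, b = 1/0, c = 0/1, d = 0/0 matches."""
    a = sum(1 for i, j in zip(x, y) if i == 1 and j == 1)
    b = sum(1 for i, j in zip(x, y) if i == 1 and j == 0)
    c = sum(1 for i, j in zip(x, y) if i == 0 and j == 1)
    d = sum(1 for i, j in zip(x, y) if i == 0 and j == 0)
    return a, b, c, d

def jaccard(x, y):
    # Ignores the negative-match cell d.
    a, b, c, _ = binary_contingency(x, y)
    return a / (a + b + c)

def simple_matching(x, y):
    # Counts negative matches d as agreement.
    a, b, c, d = binary_contingency(x, y)
    return (a + d) / (a + b + c + d)
```

Whether a measure includes the `d` cell is precisely the kind of algebraic property the classification is built on.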

Characterizing and Comparing External Measures for the Assessment of Cluster Analysis and Community Detection

This work proposes a new empirical evaluation framework that is not tied to any specific measure or application and so can be applied to any situation; the framework is illustrated on a selection of standard measures and put into practice through two concrete use cases.

On Using Class-Labels in Evaluation of Clusterings

This work discusses common “defects” that clustering algorithms exhibit with respect to this evaluation and shows them on several real-world data sets from different domains, along with a discussion of why the detected clusters do not indicate bad performance of the algorithm but are valid and useful results.

Understanding information theoretic measures for comparing clusterings

It is shown that a class of normalizations of the mutual information can be decomposed into indices that contain information at the level of individual clusters, revealing that the overall measures can be interpreted as summary statistics of the information reflected in the individual clusters.
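The "class of normalizations" refers to dividing the mutual information by different combinations of the two clusterings' entropies. A minimal sketch of the standard variants, assuming flat label vectors (function names are illustrative):

```python
from collections import Counter
from math import log

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log(c / n) for c in Counter(labels).values())

def mutual_information(a, b):
    n = len(a)
    nij = Counter(zip(a, b))
    ai, bj = Counter(a), Counter(b)
    return sum(c / n * log(n * c / (ai[x] * bj[y]))
               for (x, y), c in nij.items())

def nmi(a, b, norm="max"):
    """Mutual information normalized by a function of the two entropies."""
    i = mutual_information(a, b)
    ha, hb = entropy(a), entropy(b)
    denom = {"max": max(ha, hb), "min": min(ha, hb),
             "arithmetic": (ha + hb) / 2,
             "geometric": (ha * hb) ** 0.5}[norm]
    return i / denom if denom else 1.0  # two trivial one-cluster partitions
```

All four variants agree at the extremes (1 for identical clusterings, 0 for independent ones) but can rank intermediate pairs differently, which is what makes the per-cluster decomposition informative.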

A review of conceptual clustering algorithms

This work presents an overview of the most influential algorithms reported in the field of conceptual clustering, highlighting their limitations or drawbacks, and presents a taxonomy of these methods as well as a qualitative comparison of these algorithms.

An Improved Similarity-Based Clustering Algorithm for Multi-Database Mining

This work uses coordinate descent and back-propagation to search for the optimal clustering of the n databases in a way that minimizes a convex clustering quality measure L(θ) in fewer than (n² − n)/2 iterations.

A Critical Note on the Evaluation of Clustering Algorithms

It is suggested that the applicability of existing benchmark datasets should be carefully revisited and significant efforts need to be devoted to improving the current practice of experimental evaluation of clustering algorithms to ensure an essential match between algorithms and problems.
...

A Method for Comparing Two Hierarchical Clusterings

A measure of similarity between two hierarchical clusterings, Bk, is derived from the matching matrix, [mij], formed by cutting the two hierarchical trees and counting the number of matching entries in the k clusters of each tree.
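At a single cut level k, Bk is computed from the matching matrix and the two partitions' cluster sizes. A minimal sketch for one cut, taking the two flat labelings as input (the function name is illustrative; it assumes neither clustering is all singletons, which would make a denominator zero):

```python
from collections import Counter
from math import sqrt

def fowlkes_mallows_bk(a, b):
    """Bk for one cut level: a and b are the flat labelings obtained
    by cutting each hierarchical tree into k clusters."""
    n = len(a)
    nij = Counter(zip(a, b))   # matching matrix [m_ij] as a sparse counter
    ai, bj = Counter(a), Counter(b)
    tk = sum(c * c for c in nij.values()) - n
    pk = sum(c * c for c in ai.values()) - n
    qk = sum(c * c for c in bj.values()) - n
    return tk / sqrt(pk * qk)
```

Plotting Bk against k as the cut descends both trees is how the original method compares two full hierarchies rather than a single pair of partitions.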

CHAMELEON: A Hierarchical Clustering Algorithm Using Dynamic Modeling

A novel hierarchical clustering algorithm called CHAMELEON is presented that measures the similarity of two clusters based on a dynamic model and can discover natural clusters that many existing state-of-the-art clustering algorithms fail to find.

Chameleon: Hierarchical Clustering Using Dynamic Modeling

Chameleon's key feature is that it accounts for both interconnectivity and closeness in identifying the most similar pair of clusters, which is important for dealing with highly variable clusters.

On Clustering Validation Techniques

This survey introduces the fundamental concepts of clustering, reviews the widely known clustering algorithms in a comparative way, and illustrates the issues that are under-addressed by recent algorithms.

Comparing Clusterings by the Variation of Information

This paper proposes an information theoretic criterion for comparing two partitions, or clusterings, of the same data set, called variation of information (VI), which is positive, symmetric and obeys the triangle inequality.
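VI has a closed form in terms of the two entropies and the mutual information, VI(A, B) = H(A) + H(B) − 2I(A; B). A minimal sketch from flat labelings (the function name is illustrative):

```python
from collections import Counter
from math import log

def variation_of_information(a, b):
    """VI(A, B) = H(A) + H(B) - 2 I(A; B), in nats."""
    n = len(a)
    ai, bj = Counter(a), Counter(b)
    nij = Counter(zip(a, b))
    h = lambda counts: -sum(c / n * log(c / n) for c in counts.values())
    mi = sum(c / n * log(n * c / (ai[x] * bj[y]))
             for (x, y), c in nij.items())
    return h(ai) + h(bj) - 2 * mi
```

Unlike the normalized indices above, VI is a true metric: it is zero only for identical partitions, symmetric, and satisfies the triangle inequality, so it supports geometric reasoning about spaces of clusterings.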

Cluster Ensembles --- A Knowledge Reuse Framework for Combining Multiple Partitions

This paper introduces the problem of combining multiple partitionings of a set of objects into a single consolidated clustering without accessing the features or algorithms that determined these partitionings and proposes three techniques for obtaining high-quality combiners (consensus functions).

Robust data clustering

It is shown that the proposed technique attempts to optimize the mutual information based criteria, although the optimality is not ensured in all situations, and experimental results show the ability of the technique to identify clusters with arbitrary shapes and sizes.
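A common device in this line of work for combining many partitions is a co-association (voting) matrix: the fraction of partitions in which each pair of objects lands in the same cluster, which is then re-clustered. A minimal pure-Python sketch of that idea, not the paper's exact method (the function name is illustrative):

```python
def coassociation_matrix(partitions):
    """Fraction of partitions in which objects i and j share a cluster.
    partitions: list of label sequences over the same n objects."""
    n = len(partitions[0])
    m = len(partitions)
    return [[sum(p[i] == p[j] for p in partitions) / m for j in range(n)]
            for i in range(n)]

# Objects 0 and 1 co-cluster in one of the two partitions.
co = coassociation_matrix([[0, 0, 1], [0, 1, 1]])
```

Because the matrix only records pairwise agreement, it is indifferent to cluster shape, which is why evidence-accumulation approaches can recover clusters of arbitrary shapes and sizes.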

Asymmetric binary similarity measures

A new coefficient, “C”, is introduced which overcomes these problems and leads to homogeneous classifications in the sense described above; further general recommendations are made for the use of these coefficients in various contexts.

A General Approach to Clustering in Large Databases with Noise

A new Kernel Density Estimation-based algorithm for clustering in large multimedia databases called DENCLUE (DENsity-based CLUstEring) is introduced, which has a firm mathematical basis and has good clustering properties in data sets with large amounts of noise.
...