Attraction-repulsion clustering: a way of promoting diversity linked to demographic parity in fair clustering
We consider the problem of diversity-enhancing clustering, i.e., developing clustering methods which produce clusters that favour diversity with respect to a set of protected attributes such as race, sex, age, etc. In the context of fair clustering, diversity plays a major role when fairness is understood as demographic parity. To promote diversity, we introduce perturbations to the distance in the unprotected attributes that account for protected attributes in a way that resembles attraction-repulsion of charged particles in physics. These perturbations are defined through dissimilarities with a tractable interpretation. Cluster analysis based on attraction-repulsion dissimilarities penalizes homogeneity of the clusters with respect to the protected attributes and leads to an improvement in diversity. An advantage of our approach, which falls into a pre-processing set-up, is its compatibility with a wide variety of clustering methods and with non-Euclidean data. We illustrate the use of our procedures with both synthetic and real data and provide discussion about the relation between diversity, fairness, and cluster structure.
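As a rough illustration of the pre-processing idea described above, the sketch below perturbs a Euclidean distance on the unprotected attributes using the protected labels. It is not the dissimilarity defined in the paper: the function name `perturbed_dissimilarity`, the multiplicative form, and the parameters `u` (repulsion between points with the same protected label) and `v` (attraction between points with different labels) are assumptions chosen only to show the mechanism.

```python
import math

def perturbed_dissimilarity(x, s, u=0.5, v=0.5):
    """Illustrative attraction-repulsion style perturbation (hypothetical form).

    x: list of unprotected feature vectors (tuples of floats)
    s: list of protected labels, one per point
    u: repulsion strength applied when two points share a label (u >= 0)
    v: attraction strength applied when labels differ (0 <= v < 1)
    """
    n = len(x)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            base = math.dist(x[i], x[j])  # distance in the unprotected attributes
            # Same label: inflate the distance (repulsion).
            # Different label: shrink the distance (attraction).
            factor = 1.0 + u if s[i] == s[j] else 1.0 - v
            d[i][j] = d[j][i] = base * factor
    return d

# Toy data: two tight pairs, each pair homogeneous in the protected label.
points = [(0.0, 0.0), (0.0, 1.0), (3.0, 0.0), (3.0, 1.0)]
labels = ["a", "a", "b", "b"]
D = perturbed_dissimilarity(points, labels)
```

In this toy example the same-label pair at distance 1 is pushed apart while the cross-label pair at distance 3 is pulled together, so a clustering method run on `D` is less inclined to form label-homogeneous clusters. Because only the dissimilarity matrix changes, `D` could in principle be handed to any method that accepts precomputed dissimilarities.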
Except where otherwise noted, this item's license is described as Attribution-NonCommercial-ShareAlike 3.0 Spain (Reconocimiento-NoComercial-CompartirIgual 3.0 España)
Showing related items by title, author, or subject.
Capó, M.; Pérez, A.; Lozano, J.A. (2023-07-26) K-medoids clustering is one of the most popular techniques in exploratory data analysis. The most commonly used algorithms to deal with this problem are quadratic on the number of instances, n, and usually the quality of ...
Landa-Torres, I.; Del Ser, J.; Manjarres, D.; Gil-Lopez, S.; Salcedo-Sanz, S. (2017) The problem of partitioning a data set into disjoint groups or clusters of related items plays a key role in data analytics, in particular when the information retrieval becomes crucial for further data analysis. In this ...
Capo, M.; Pérez, A.; Lozano, J.A. (2017-02-01) Due to the progressive growth of the amount of data available in a wide variety of scientific fields, it has become more difficult to manipulate and analyze such information. In spite of its dependency on the initial ...