dc.contributor.author: Capo, M.
dc.contributor.author: Perez, A.
dc.contributor.author: Lozano, J.A.
dc.date.accessioned: 2021-06-17T18:59:25Z
dc.date.available: 2021-06-17T18:59:25Z
dc.date.issued: 2021-05
dc.identifier.issn: 2162-2388
dc.identifier.uri: http://hdl.handle.net/20.500.11824/1298
dc.description.abstract: The increase in the number of features that need to be analyzed in a wide variety of areas, such as genome sequencing, computer vision or sensor networks, represents a challenge for the K-means algorithm. In this regard, different dimensionality reduction approaches for the K-means algorithm have been designed recently, leading to algorithms that have been shown to generate competitive clusterings. Unfortunately, most of these techniques tend to have fairly high computational costs and/or might not be easy to parallelize. In this work, we propose a fully parallelizable feature selection technique intended for the K-means algorithm. The proposal is based on a novel feature relevance measure that is closely related to the K-means error of a given clustering. Given a disjoint partition of the features, the technique consists of obtaining a clustering for each subset of features and selecting the m features with the highest relevance measure. The computational cost of this approach is just O(m · max{n · K, log m}) per subset of features. We additionally provide a theoretical analysis of the quality of the solution obtained via our proposal, and empirically analyze its performance with respect to well-known feature selection and feature extraction techniques. Such an analysis shows that our proposal consistently obtains results with lower K-means error than all the considered feature selection techniques (Laplacian scores, maximum variance, multi-cluster feature selection and random selection), while requiring computational times similar to or lower than those approaches. Moreover, when compared to feature extraction techniques, such as Random Projections, the proposed approach also shows a noticeable improvement in both error and computational time.
dc.description.sponsorship: BERC 2014-2017
dc.format: application/pdf
dc.language.iso: eng
dc.rights: Reconocimiento-NoComercial-CompartirIgual 3.0 España
dc.rights.uri: http://creativecommons.org/licenses/by-nc-sa/3.0/es/
dc.subject: Dimensionality reduction
dc.subject: K-means clustering
dc.subject: Feature selection
dc.subject: Parallelization
dc.title: A cheap feature selection approach for the K-means algorithm
dc.type: info:eu-repo/semantics/article
dc.relation.publisherversion: https://ieeexplore.ieee.org/document/9127528
dc.relation.projectID: ES/1PE/TIN2017-82626-R
dc.relation.projectID: EUS/ELKARTEK
dc.rights.accessRights: info:eu-repo/semantics/openAccess
dc.type.hasVersion: info:eu-repo/semantics/acceptedVersion
dc.journal.title: IEEE Transactions on Neural Networks and Learning Systems
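
The technique summarized in the abstract lends itself to a short illustration. Below is a minimal, hypothetical sketch of the overall scheme: partition the features into disjoint blocks, run K-means on each block independently (the step that makes the method trivially parallelizable), score every feature with a relevance measure, and keep the m highest-scoring features. Since the record does not define the relevance measure, the sketch substitutes per-feature between-cluster variance (the part of a feature's sum of squares explained by the clustering), which is one natural quantity "closely related to the K-means error"; the function names (`feature_relevance`, `cheap_feature_selection`) and the random block partitioning are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np
from sklearn.cluster import KMeans

def feature_relevance(X, labels):
    """Per-feature between-cluster sum of squares: the part of each
    feature's total variance explained by the clustering. (A stand-in
    for the paper's relevance measure, which is only described as
    'closely related to the K-means error'.)"""
    total_ss = ((X - X.mean(axis=0)) ** 2).sum(axis=0)   # per-feature total SS
    within_ss = np.zeros(X.shape[1])
    for c in np.unique(labels):
        Xc = X[labels == c]
        # per-feature within-cluster SS, i.e. that feature's share of the K-means error
        within_ss += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    return total_ss - within_ss                           # explained (between-cluster) SS

def cheap_feature_selection(X, K, m, n_blocks, seed=0):
    """Select m features: split the d features into disjoint blocks,
    cluster each block independently, score every feature, keep the m best."""
    d = X.shape[1]
    rng = np.random.default_rng(seed)
    blocks = np.array_split(rng.permutation(d), n_blocks)  # disjoint feature partition
    scores = np.empty(d)
    for block in blocks:  # iterations are independent -> trivially parallelizable
        Xb = X[:, block]
        labels = KMeans(n_clusters=K, n_init=1).fit_predict(Xb)
        scores[block] = feature_relevance(Xb, labels)
    return np.sort(np.argsort(scores)[-m:])  # indices of the m most relevant features
```

In practice the selected columns would then feed a final K-means run on `X[:, selected]`; because the blocks share no state, the per-block loop can be dispatched to separate workers, which is the "fully parallelizable" property the abstract highlights.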

