Show simple item record

dc.contributor.authorBidaurrazaga, A. 
dc.contributor.authorPérez, A. 
dc.contributor.authorCapó, M.
dc.description.abstractNowadays, streaming data analysis has become a relevant area of research in machine learning. Most of the data streams available are unlabeled, and thus it is necessary to develop specific clustering techniques that take into account the particularities of the streaming data. In streaming data scenarios, the data is composed of an increasing sequence of batches of samples where the concept drift phenomenon may occur. In this work, we formally define the streaming K -means (SKM) problem, which implies a restart of the error function when a concept drift occurs. An approximated error function that does not rely on concept drift detection is proposed. We prove that such a surrogate is a good approximation of the SKM error. Then, we introduce an algorithm to deal with SKM problem by minimizing the surrogate error function each time a new batch arrives. Alternative initialization criteria are presented and theoretically analyzed for streaming data scenarios. Among them, we develop and analyze theoretically two initialization methods that search for the best trade-off between the importance that is given to the past and the current batches. The experiments show that the proposed algorithm with, the proposed initialization criteria, obtain the best results when dealing with the SKM problem without requiring to detect when concept drift takes place.en_US
dc.rightsReconocimiento-NoComercial-CompartirIgual 3.0 Españaen_US
dc.subjectStreaming Dataen_US
dc.titleK-means for Evolving Data Streamsen_US
dc.journal.titleProceedings - IEEE International Conference on Data Mining, ICDMen_US

Files in this item


This item appears in the following Collection(s)

Show simple item record

Reconocimiento-NoComercial-CompartirIgual 3.0 España
Except where otherwise noted, this item's license is described as Reconocimiento-NoComercial-CompartirIgual 3.0 España