Machine Learning
http://hdl.handle.net/20.500.11824/12
2020-07-15T12:13:04Z
http://hdl.handle.net/20.500.11824/1115
An adaptive neuroevolution-based hyperheuristic
Etor A.; Ceberio J.; Pérez A.; Irurozki E.
According to the No-Free-Lunch theorem, no algorithm performs efficiently on every type of problem. In this sense, algorithms that exploit problem-specific knowledge usually outperform more generic approaches, at the cost of a more complex design and parameter-tuning process. Trying to combine the best of both worlds, the field of hyperheuristics investigates the automated generation and hybridization of heuristic algorithms.
In this paper, we propose a neuroevolution-based hyperheuristic approach. Particularly, we develop a population-based hyperheuristic algorithm that first trains a neural network on an instance of a problem and then uses the trained neural network to control how and which low-level operators are applied to each of the solutions when optimizing different problem instances. The trained neural network maps the state of the optimization process to the operations to be applied to the solutions in the population at each generation.
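The idea of a trained network choosing which low-level operator to apply to each solution can be sketched as follows. This is a minimal illustration, not the authors' implementation: the state features, the permutation operators (`swap_op`, `shift_op`), and the one-hidden-layer controller are all assumptions for the sake of the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def swap_op(perm):
    # Low-level operator: swap two random positions of a permutation.
    p = perm.copy()
    i, j = rng.choice(len(p), size=2, replace=False)
    p[i], p[j] = p[j], p[i]
    return p

def shift_op(perm):
    # Low-level operator: remove one random element and reinsert it elsewhere.
    p = list(perm)
    i, j = rng.choice(len(p), size=2, replace=False)
    p.insert(j, p.pop(i))
    return np.array(p)

OPERATORS = [swap_op, shift_op]

def controller(state, W1, W2):
    # One-hidden-layer network: state features -> operator probabilities.
    h = np.tanh(state @ W1)
    logits = h @ W2
    e = np.exp(logits - logits.max())
    return e / e.sum()

def generation(pop, fitness, W1, W2):
    # Apply to each solution the operator selected by the controller.
    fits = np.array([fitness(s) for s in pop])
    spread = np.ptp(fits) + 1e-9
    new_pop = []
    for s, f in zip(pop, fits):
        # Assumed state features: normalized fitness and population dispersion.
        state = np.array([(f - fits.min()) / spread,
                          fits.std() / (abs(fits.mean()) + 1e-9)])
        probs = controller(state, W1, W2)
        op = OPERATORS[rng.choice(len(OPERATORS), p=probs)]
        new_pop.append(op(s))
    return new_pop
```

In the paper's setting, the weights `W1` and `W2` would be obtained by neuroevolution on a training instance and then reused on different instances of the problem.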
2020-07-08T00:00:00Z
http://hdl.handle.net/20.500.11824/1106
Optimization of deep learning precipitation models using categorical binary metrics
Larraondo P.R.; Renzullo L.J.; Van Dijk A.I.J.M.; Inza I.; Lozano J.A.
This work introduces a methodology for optimizing neural network models using a combination of continuous and categorical binary indices in the context of precipitation forecasting. The probability of detection and the false alarm rate are popular metrics used in the verification of precipitation models. However, machine learning models trained using gradient descent cannot be optimized with respect to these metrics, as they are not differentiable. We propose an alternative, differentiable formulation of these categorical indices and demonstrate how they can be used to optimize the skill of precipitation neural network models, framed as a multi-objective optimization problem. To our knowledge, this is the first methodology for optimizing weather neural network models based on categorical indices.
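One common way to make such categorical indices differentiable, sketched below, is to replace the hard rain / no-rain threshold on the predictions with a sigmoid, so the hit, miss, and false-alarm counts become smooth functions of the model output. The threshold, steepness `k`, and function names are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def soft_confusion(pred, obs, thresh=0.1, k=50.0):
    # Soft indicator that a predicted value exceeds the rain threshold.
    p = 1.0 / (1.0 + np.exp(-k * (pred - thresh)))
    o = (obs > thresh).astype(float)  # observations remain hard labels
    hits = np.sum(p * o)
    misses = np.sum((1 - p) * o)
    false_alarms = np.sum(p * (1 - o))
    return hits, misses, false_alarms

def soft_pod(pred, obs, **kw):
    # Differentiable surrogate of the probability of detection.
    h, m, _ = soft_confusion(pred, obs, **kw)
    return h / (h + m + 1e-9)

def soft_far(pred, obs, **kw):
    # Differentiable surrogate of the false alarm rate.
    h, _, fa = soft_confusion(pred, obs, **kw)
    return fa / (h + fa + 1e-9)
```

Because these surrogates are smooth in `pred`, they (or a weighted combination of them) can be plugged into a gradient-descent training loop as loss terms.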
2020-01-01T00:00:00Z
http://hdl.handle.net/20.500.11824/1099
An efficient K-means clustering algorithm for tall data
Capo M.; Pérez A.; Lozano J.A.
The analysis of continuously larger datasets is a task of major importance in a wide variety of scientific fields. Therefore, the development of efficient and parallel algorithms to perform such analyses is a crucial topic in unsupervised learning. Cluster analysis algorithms are a key element of exploratory data analysis and, among them, the K-means algorithm stands out as the most popular approach due to its ease of implementation, straightforward parallelizability and relatively low computational cost. Unfortunately, the K-means algorithm also has some well-studied drawbacks, such as its high dependency on the initial conditions and the fact that it might not scale well on massive datasets. In this article, we propose a recursive and parallel approximation to the K-means algorithm that scales well with the number of instances of the problem, without affecting the quality of the approximation. To achieve this, instead of analyzing the entire dataset, we work on small weighted sets of representative points, distributed so that more importance is given to those regions where it is harder to determine the correct cluster assignment of the original instances. In addition to different theoretical properties, which explain the reasoning behind the algorithm, experimental results indicate that our method outperforms the state of the art in terms of the trade-off between the number of distance computations and the quality of the solution obtained.
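The core ingredient described above — running K-means updates on a small weighted set of representative points rather than on the full dataset — can be sketched as a weighted variant of Lloyd's algorithm. This is only an illustration of that ingredient: the adaptive, recursive construction of the representative sets that the paper proposes is not shown here.

```python
import numpy as np

def weighted_kmeans(points, weights, k, iters=50, seed=0):
    # Lloyd-style K-means where each representative point carries a weight,
    # standing in for that many original instances.
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    labels = np.zeros(len(points), dtype=int)
    for _ in range(iters):
        # Assignment step: nearest center for each representative.
        d = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        # Update step: weighted mean of the assigned representatives.
        for j in range(k):
            mask = labels == j
            if mask.any():
                w = weights[mask][:, None]
                centers[j] = (w * points[mask]).sum(0) / w.sum()
    return centers, labels
```

Each distance computation here touches only the representatives, which is where the savings over running K-means on all original instances come from.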
2020-01-01T00:00:00Z
http://hdl.handle.net/20.500.11824/1091
Supervised non-parametric discretization based on Kernel density estimation
Flores J. L.; Calvo B.; Pérez A.
Nowadays, machine learning algorithms can be found in many applications where classifiers play a key role. In this context, discretizing continuous attributes is a common preprocessing step prior to classification, the main goal being to retain as much discriminative information as possible. In this paper, we propose a supervised univariate non-parametric discretization algorithm that allows the use of a given supervised score criterion for selecting the best cut points. The candidate cut points are evaluated by computing the selected score using kernel density estimation. The computational complexity of the proposed procedure is O(N log N), where N is the length of the data. The algorithm generates low-complexity discretization policies while retaining the discriminative information of the original continuous variables. To assess the validity of the proposed method, a set of real and artificial datasets has been used; the results show that the algorithm achieves competitive classification performance with low-complexity discretization policies.
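The idea of scoring candidate cut points with class-conditional kernel density estimates can be sketched as follows. The Gaussian kernel, the fixed bandwidth, and the particular discriminative score used here are assumed choices for illustration — the paper allows an arbitrary supervised score criterion, and this naive grid-based version does not match its O(N log N) procedure.

```python
import numpy as np

def gaussian_kde(x, samples, bw):
    # Average of Gaussian kernels centred at the samples.
    z = (x[:, None] - samples[None, :]) / bw
    return np.exp(-0.5 * z**2).mean(1) / (bw * np.sqrt(2 * np.pi))

def cut_point_score(cut, values, labels, bw=0.3):
    # Illustrative score: probability mass each class places on its
    # majority side of the cut, weighted by the class prior.
    grid = np.linspace(values.min() - 1, values.max() + 1, 400)
    score = 0.0
    for c in np.unique(labels):
        dens = gaussian_kde(grid, values[labels == c], bw)
        dens /= dens.sum()
        left = dens[grid <= cut].sum()
        score += max(left, 1 - left) * np.mean(labels == c)
    return score

def best_cut(values, labels, bw=0.3):
    # Candidate cuts: midpoints between sorted adjacent values.
    v = np.sort(values)
    cands = (v[:-1] + v[1:]) / 2
    scores = [cut_point_score(c, values, labels, bw) for c in cands]
    return cands[int(np.argmax(scores))]
```

A well-placed cut separates the class-conditional densities, so the score is maximized at a boundary between the classes.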
2019-12-19T00:00:00Z