Data Science (DS)
http://hdl.handle.net/20.500.11824/9
Updated: 2023-04-23T05:26:42Z

Learning the progression patterns of treatments using a probabilistic generative model
http://hdl.handle.net/20.500.11824/1571
Published: 2022-12-15T00:00:00Z; updated: 2023-04-21T08:41:00Z
Zaballa, O.; Pérez, A.; Gómez-Inhiesto, E.; Acaiturri-Ayesta, T.; Lozano, J.A.
Modeling a disease or the treatment of a patient has drawn much attention in recent years due to the vast amount of information that Electronic Health Records contain. This paper presents a probabilistic generative model of treatments that are described in terms of variable-length sequences of medical activities. The main objective is to identify distinct subtypes of treatments for a given disease and to discover their development and progression. To this end, the model assumes that a sequence of actions has an associated hierarchical structure of latent variables that both classifies the sequences based on their evolution over time and segments them into different progression stages. The model is learned with the Expectation–Maximization algorithm, which considers the exponential number of configurations of the latent variables and is solved efficiently with a method based on dynamic programming. The evaluation of the model is twofold: first, we use synthetic data to demonstrate that the learning procedure allows the generative model underlying the data to be recovered; we then assess the potential of our model to provide treatment classification and staging information in real-world data. Our model can be seen as a tool for classification, simulation, data augmentation and missing data imputation.

Uncertainty-wise software anti-patterns detection: A possibilistic evolutionary machine learning approach
http://hdl.handle.net/20.500.11824/1566
Published: 2022-11-01T00:00:00Z; updated: 2023-04-21T08:40:29Z
Boutaib, S.; Elarbi, M.; Bechikh, S.; Coello, C.A.; Said, L. B.
Context: Code smells (a.k.a. anti-patterns) are manifestations of poor design solutions that can deteriorate software maintainability and evolution. Research gap: Existing works did not take into account the issue of uncertain class labels, which is an important inherent characteristic of the smell detection problem. More precisely, two human experts may have different degrees of uncertainty about the smelliness of a particular software class, not only for the smell detection task but also for the smell type identification task. Unfortunately, existing approaches usually reject and/or ignore uncertain data that correspond to software classes (i.e. dataset instances) with uncertain labels. Throwing away and/or disregarding the uncertainty factor could considerably degrade the effectiveness of the detection/identification process. From a solution approach viewpoint, no work in the literature has proposed a method that is able to detect and/or identify code smells while preserving the uncertainty aspect. Objective: The main goal of our research work is to handle the uncertainty factor, issued from human experts, in detecting and/or identifying code smells by proposing an evolutionary approach that is able to deal with anti-pattern classification with uncertain labels. Method: We propose Bi-ADIPOK, an effective search-based tool that is capable of tackling the previously mentioned challenge for both the detection and identification cases. The proposed method corresponds to an EA (Evolutionary Algorithm) that optimizes a set of detectors encoded as PK-NNs (Possibilistic K-nearest neighbors) based on a bi-level hierarchy, in which the role of the upper level consists of finding the optimal PK-NN parameters, while that of the lower level is to generate the PK-NNs. A new fitness function, PomAURPC-OVA_dist (Possibilistic modified Area Under Recall Precision Curve One-Versus-All_distance, abbreviated PAURPC_d in this paper), has been proposed.
Bi-ADIPOK is able to deal with label uncertainty using some concepts stemming from Possibility Theory. Furthermore, the PomAURPC-OVA_dist is capable of handling the uncertainty issue even with imbalanced data. Note that Bi-ADIPOK is first built and then validated using a possibilistic base of smell examples that simulates the subjectivity of software engineers' opinions. Results: The statistical analysis of the results obtained on a set of comparative experiments with respect to four relevant state-of-the-art methods shows the merits of our proposal. The obtained detection results demonstrate that, for the uncertain environment, the PomAURPC-OVA_dist of Bi-ADIPOK ranges between 0.902 and 0.932 and its IAC lies between 0.9108 and 0.9407, while for the certain environment, the PomAURPC-OVA_dist lies between 0.928 and 0.955 and the IAC ranges between 0.9477 and 0.9622. Similarly, the identification results for the uncertain environment indicate that the PomAURPC-OVA_dist of Bi-ADIPOK varies between 0.8576 and 0.9273 and its IAC is between 0.8693 and 0.9318. For the certain environment, the PomAURPC-OVA_dist lies between 0.8613 and 0.9351 and the IAC values are between 0.8672 and 0.9476. With uncertain data, Bi-ADIPOK can find 35% more code smells than the second-best approach (i.e., BLOP). Furthermore, Bi-ADIPOK succeeded in reducing the number of false alarms (i.e., misclassified smelly instances) by 12%. In addition, our proposed approach can identify 43% more smell types than BLOP and reduces the number of false alarms by 32%. The same results have been obtained for the certain environment, demonstrating Bi-ADIPOK's ability to deal with such an environment.

On the Construction of Pareto-Compliant Combined Indicators
http://hdl.handle.net/20.500.11824/1565
Published: 2022-08-12T00:00:00Z; updated: 2023-04-21T08:40:31Z
Falcón-Cardona, J.G.; Emmerich, M.; Coello, C.A.
The most relevant property that a quality indicator (QI) is expected to have is Pareto compliance, which means that every time an approximation set strictly dominates another in a Pareto sense, the indicator must reflect this. The hypervolume indicator and its variants are the only unary QIs known to be Pareto-compliant, but there are many commonly used weakly Pareto-compliant indicators such as R2, IGD+, and ε+. Currently, an open research area is related to finding new Pareto-compliant indicators whose preferences are different from those of the hypervolume indicator. In this article, we propose a theoretical basis to combine existing weakly Pareto-compliant indicators with at least one being Pareto-compliant, such that the resulting combined indicator is Pareto-compliant as well. Most importantly, we show that the combination of Pareto-compliant QIs with weakly Pareto-compliant indicators leads to indicators that inherit properties of the weakly compliant indicators in terms of optimal point distributions. The consequences of these new combined indicators are threefold: (1) to increase the variety of available Pareto-compliant QIs by correcting weakly Pareto-compliant indicators, (2) to introduce a general framework for the combination of QIs, and (3) to generate new selection mechanisms for multiobjective evolutionary algorithms where it is possible to achieve/adjust desired distributions on the Pareto front.
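As a rough illustration of the idea in the abstract above, the sketch below combines a Pareto-compliant indicator (a two-objective hypervolume) with a weakly Pareto-compliant one (IGD+) through a weighted sum. The weighted-sum form, the function names, the default weight, and the restriction to two minimization objectives are all assumptions made for this example; the abstract does not state which combination function the article uses. The sum stays Pareto-compliant because the hypervolume term strictly improves whenever one set strictly dominates another, while the IGD+ term can only stay equal or improve.

```python
from math import sqrt


def hypervolume_2d(points, ref):
    """Area dominated by `points` and bounded by `ref` (two objectives, minimization)."""
    pts = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    area, best_y = 0.0, ref[1]
    for x, y in pts:
        if y < best_y:  # point is not dominated by those already swept
            area += (ref[0] - x) * (best_y - y)
            best_y = y
    return area


def igd_plus(points, reference_set):
    """IGD+: mean over reference points of the smallest dominance-aware distance."""
    def d_plus(a, z):
        return sqrt(sum(max(ai - zi, 0.0) ** 2 for ai, zi in zip(a, z)))

    total = sum(min(d_plus(a, z) for a in points) for z in reference_set)
    return total / len(reference_set)


def combined_indicator(points, ref, reference_set, weight=0.5):
    """Hypothetical combined indicator (larger is better).

    Assumption: a weighted sum with 0 < weight <= 1, so the hypervolume
    term always contributes with a strictly positive coefficient.
    """
    return (weight * hypervolume_2d(points, ref)
            - (1.0 - weight) * igd_plus(points, reference_set))
```

For example, with `A = [(1, 3), (2, 2), (3, 1)]` and `B = [(2, 3), (3, 2)]`, every point of `B` is dominated by a point of `A`, and `combined_indicator(A, (4, 4), A)` is larger than `combined_indicator(B, (4, 4), A)`.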

A convergence and diversity guided leader selection strategy for many-objective particle swarm optimization
http://hdl.handle.net/20.500.11824/1564
Published: 2022-10-01T00:00:00Z; updated: 2023-04-21T08:40:29Z
Li, L.; Li, Y.; Lin, Q.; Ming, Z.; Coello, C.A.
Recently, the particle swarm optimizer (PSO) has been extended to solve many-objective optimization problems (MaOPs) and has become a hot research topic in the field of evolutionary computation. In particular, the leader particle selection (LPS) and the search direction used in the velocity update strategy are two crucial factors in PSOs. However, the LPS strategies of most existing PSOs are not efficient in high-dimensional objective spaces, mainly due to a lack of convergence pressure or a loss of diversity. To address these two issues and improve the performance of PSO in high-dimensional objective spaces, this paper proposes a convergence and diversity guided leader selection strategy for PSO, denoted as CDLS, in which different leader particles are adaptively selected for each particle based on its situation in terms of convergence and diversity. In this way, CDLS can achieve a good tradeoff between convergence and diversity. To verify the effectiveness of CDLS, it is embedded into the PSO search process of three well-known PSOs. Furthermore, a new variant of PSO combined with the CDLS strategy, namely PSO/CDLS, is also presented. The experimental results validate the superiority of our proposed CDLS strategy and the effectiveness of PSO/CDLS when solving numerous MaOPs with regular and irregular Pareto fronts (PFs).
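To make the role of the leader in the velocity update concrete, here is a minimal sketch of the canonical PSO update with the leader passed in as an argument. It is not the CDLS strategy: the abstract does not describe how CDLS picks a leader, so that choice is left to the caller. The function name and the default coefficients (`inertia`, `c1`, `c2`) are assumptions made for this illustration.

```python
import random


def update_particle(position, velocity, personal_best, leader, rng,
                    inertia=0.7, c1=1.5, c2=1.5):
    """One canonical PSO step for a single particle.

    `leader` is whatever particle the leader selection strategy returned
    for this particle; how it is chosen is outside this sketch.
    """
    new_position, new_velocity = [], []
    for x, v, p, g in zip(position, velocity, personal_best, leader):
        r1, r2 = rng.random(), rng.random()
        # Inertia term + pull toward personal best + pull toward the leader.
        v_next = inertia * v + c1 * r1 * (p - x) + c2 * r2 * (g - x)
        new_velocity.append(v_next)
        new_position.append(x + v_next)
    return new_position, new_velocity


rng = random.Random(0)
pos, vel = update_particle([0.0, 0.0], [0.0, 0.0], [1.0, 1.0], [2.0, -1.0], rng)
```

Because the leader enters the update only through the last term, swapping in a different selection strategy changes the search direction without touching the rest of the step.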