Data Science (DS)
http://hdl.handle.net/20.500.11824/9
2019-03-18T05:35:40Z

Mallows and generalized Mallows model for matchings
http://hdl.handle.net/20.500.11824/942
2019-03-08T02:00:10Z
2019-02-25T00:00:00Z
Irurozki E.; Calvo B.; Lozano J.A.
The Mallows (MM) and Generalized Mallows (GMM) models are two of the most popular probability models for distributions on permutations. In this paper, we consider both models under the Hamming distance. These models can be seen as models for matchings rather than for rankings. They cannot be factorized, in contrast with the popular MM and GMM under the Kendall’s-τ and Cayley distances. To overcome the computational issues that the models involve, we introduce a novel method for computing the partition function. By adapting this method, we can compute the expectation and the joint and conditional probabilities. These methods are the basis for three sampling algorithms, which we propose and analyze. We also propose a learning algorithm. All the algorithms are analyzed both theoretically and empirically, using synthetic and real data from the context of e-learning and Massive Open Online Courses (MOOCs).
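As a rough illustration of what "computing the partition function" means here, the sketch below handles only the single-parameter Mallows model under the Hamming distance. It uses the standard counting identity that the number of permutations at Hamming distance d from a reference permutation is C(n, d) times the d-th derangement number. This is not the method introduced in the paper, which also covers the generalized model; the function names are illustrative.

```python
import math
from itertools import permutations


def derangements(n):
    """Return [D(0), ..., D(n)], where D(k) counts permutations of k items
    with no fixed point."""
    d = [1, 0]
    for k in range(2, n + 1):
        d.append((k - 1) * (d[k - 1] + d[k - 2]))
    return d[: n + 1]


def hamming_partition(n, theta):
    """Z(theta) = sum_d C(n, d) * D(d) * exp(-theta * d)."""
    der = derangements(n)
    return sum(
        math.comb(n, d) * der[d] * math.exp(-theta * d) for d in range(n + 1)
    )


def hamming_partition_brute(n, theta):
    """Same quantity by enumerating all n! permutations (small n only)."""
    identity = tuple(range(n))
    total = 0.0
    for perm in permutations(identity):
        dist = sum(1 for a, b in zip(perm, identity) if a != b)
        total += math.exp(-theta * dist)
    return total
```

For small n the two functions should agree, which gives a quick sanity check on the closed form before using it for larger n.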

perm_mateda: A matlab toolbox of estimation of distribution algorithms for permutation-based combinatorial optimization problems
http://hdl.handle.net/20.500.11824/923
2019-02-13T02:00:19Z
2018-01-01T00:00:00Z
Irurozki E.; Ceberio J.; Santamaria J.; Santana R.; Mendiburu A.
Permutation problems are combinatorial optimization problems whose solutions are naturally codified as permutations. Due to their complexity, which stems principally from the factorial cardinality of the search space, they have been a recurrent topic for the artificial intelligence and operations research communities. Recently, among the vast number of metaheuristic algorithms, new advances in estimation of distribution algorithms (EDAs) have shown outstanding performance on some permutation problems. These novel EDAs implement distance-based exponential probability models such as the Mallows and Generalized Mallows models. In this paper, we present a Matlab package, perm_mateda, for estimation of distribution algorithms on permutation problems, which has been implemented as an extension to the Mateda-2.0 toolbox of EDAs. In particular, we provide implementations of the Mallows and Generalized Mallows EDAs under the Kendall’s-τ, Cayley, and Ulam distances. In addition, four classical permutation problems have also been implemented: the Traveling Salesman Problem, the Permutation Flowshop Scheduling Problem, the Linear Ordering Problem, and the Quadratic Assignment Problem.
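To make "distance-based exponential probability model" concrete, here is a minimal Python sketch of the Mallows model under the Kendall’s-τ distance, using the well-known closed form of its normalising constant. It is not code from the Matlab package and does not mirror its API; all names are illustrative, and theta is assumed to be strictly positive.

```python
import math
from itertools import combinations, permutations


def kendall_tau(a, b):
    """Count the item pairs that orderings a and b rank in opposite order."""
    pos_a = {item: i for i, item in enumerate(a)}
    pos_b = {item: i for i, item in enumerate(b)}
    return sum(
        1
        for x, y in combinations(a, 2)
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )


def kendall_partition(n, theta):
    """Closed-form normalising constant, valid for theta > 0."""
    z = 1.0
    for j in range(1, n + 1):
        z *= (1 - math.exp(-j * theta)) / (1 - math.exp(-theta))
    return z


def mallows_probability(perm, center, theta):
    """P(perm) = exp(-theta * d(perm, center)) / Z(theta)."""
    dist = kendall_tau(perm, center)
    return math.exp(-theta * dist) / kendall_partition(len(perm), theta)


# Sanity check: probabilities over all orderings of 4 items should sum to 1.
center = (0, 1, 2, 3)
total = sum(mallows_probability(p, center, 0.5) for p in permutations(center))
```

An EDA built on this model would repeatedly estimate `center` and `theta` from the best solutions found so far and sample new candidates from the fitted distribution; that loop is omitted here.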

Aggregated outputs by linear models: An application on marine litter beaching prediction
http://hdl.handle.net/20.500.11824/906
2019-01-06T02:00:09Z
2019-01-01T00:00:00Z
Hernández-González J.; Inza I.; Granado I.; Basurko O.C.; Fernández J.A.; Lozano J.A.
In regression, a predictive model able to anticipate the output of a new case is learnt from a set of previous examples whose output, or response value, is known. When learning with aggregated outputs, the examples available for model training are individually unlabeled; instead, the aggregated outputs of different subsets of training examples are provided. In this paper, we propose an iterative methodology to learn linear models from this type of data. Despite its simplicity, it performs competitively in comparison with a straightforward solution and state-of-the-art techniques. We also present a real-world problem that naturally fits the aggregated outputs framework: the estimation of marine litter beaching along the south-east coastline of the Bay of Biscay.
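The aggregated-outputs setting can be illustrated with a small Python sketch. It assumes that each aggregated output is the sum of the unseen individual outputs and that there is a single feature, so a bag total equals w * sum(x) + b * len(bag). Fitting on these summed features is one plausible baseline; it is not the iterative methodology proposed in the paper, and the function name is hypothetical.

```python
def fit_from_aggregates(bags, totals):
    """Least-squares fit of y = w * x + b from per-bag sums of y.

    Assumption: totals[i] is the SUM of the individual outputs in bags[i],
    so that totals[i] = w * sum(bags[i]) + b * len(bags[i]).
    """
    sx = [sum(bag) for bag in bags]  # aggregated feature per bag
    n = [len(bag) for bag in bags]   # bag sizes (the intercept column)
    # Normal equations of the two-parameter least-squares problem.
    a11 = sum(s * s for s in sx)
    a12 = sum(s * k for s, k in zip(sx, n))
    a22 = sum(k * k for k in n)
    r1 = sum(s * t for s, t in zip(sx, totals))
    r2 = sum(k * t for k, t in zip(n, totals))
    det = a11 * a22 - a12 * a12
    if det == 0:
        raise ValueError("bags do not identify both w and b")
    # Cramer's rule on the 2x2 system.
    w = (r1 * a22 - a12 * r2) / det
    b = (a11 * r2 - a12 * r1) / det
    return w, b


# Noise-free synthetic data generated with w = 2 and b = 1.
bags = [[1.0, 2.0], [3.0], [0.5, 1.5, 4.0], [2.5, 2.5]]
totals = [sum(2.0 * x + 1.0 for x in bag) for bag in bags]
w, b = fit_from_aggregates(bags, totals)
```

With noise-free totals the fit should recover the generating parameters, provided the bags differ enough in size and summed feature to identify both.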

A beta-binomial mixed-effects model approach for analysing longitudinal discrete and bounded outcomes
http://hdl.handle.net/20.500.11824/903
2018-12-18T02:00:11Z
2018-05-01T00:00:00Z
Najera-Zuloaga J.; Lee D.-J.; Arostegui I.
Patient-reported outcomes (PROs) are increasingly used as primary outcome measures in observational and experimental studies, since they inform clinicians and researchers about the health status of patients and generate data to facilitate improved care. PROs are usually discrete and bounded, with U, J or inverse-J shapes, so members of the exponential family offer inadequate distributional fits. The beta-binomial distribution has been proposed in the literature to fit PROs. However, the beta-binomial distribution does not belong to the exponential family, which limits its applicability in the regression context, and classical estimation approaches are not straightforward. Moreover, PROs are usually measured in a longitudinal framework in which individuals are followed up for a certain period. Each individual therefore obtains several PRO scores over time, which leads to repeated measures and defines the correlation structure of the data. In this work, we develop and propose an estimation procedure for the analysis of correlated discrete and bounded outcomes, particularly PROs, based on a beta-binomial mixed-effects model. Additionally, we have implemented the methodology in the PROreg package in R. Because there are similar approaches in the literature that address the same issue, this work also includes a comparison study between our proposal and alternative methodologies commonly implemented in R, and it shows the superior performance of our estimation procedure. This paper was motivated by the analysis of the health status of patients with chronic obstructive pulmonary disease, where the main objective is the assessment of risk factors that may affect the evolution of the disease. The application of the proposed approach in the study leads to clinically relevant results.
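The U and J shapes mentioned above can be reproduced with a short Python sketch of the beta-binomial probability mass function. It uses the standard (a, b) shape parametrisation, which is an assumption made for illustration and may differ from the parametrisation used in the paper or in PROreg. The sketch covers only the marginal distribution, not the mixed-effects model.

```python
import math


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_pmf(k, n, a, b):
    """P(Y = k) for Y ~ BetaBinomial(n, a, b), with a, b > 0 and 0 <= k <= n."""
    log_p = (
        math.log(math.comb(n, k))
        + log_beta(k + a, n - k + b)
        - log_beta(a, b)
    )
    return math.exp(log_p)


n = 10
# a, b < 1 pushes mass towards both ends of the scale (U shape).
u_shaped = [beta_binomial_pmf(k, n, 0.5, 0.5) for k in range(n + 1)]
# a > 1 with b < 1 piles mass on the top score (J shape).
j_shaped = [beta_binomial_pmf(k, n, 3.0, 0.5) for k in range(n + 1)]
```

Swapping the two shape parameters in the second call mirrors the distribution and gives the inverse-J shape.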