Data Science (DS)
http://hdl.handle.net/20.500.11824/9
Updated: 2019-01-07T03:04:30Z

Aggregated outputs by linear models: An application on marine litter beaching prediction
http://hdl.handle.net/20.500.11824/906
Updated: 2019-01-06T02:00:09Z
Published: 2019-01-01T00:00:00Z
Hernández-González J.; Inza I.; Granado I.; Basurko O.C.; Fernández J.A.; Lozano J.A.
In regression, a predictive model which is able to anticipate the output of a new case is learnt from a set of previous examples. The output or response value of these examples used for model training is known. When learning with aggregated outputs, the examples available for model training are individually unlabeled. Collectively, the aggregated outputs of different subsets of training examples are provided. In this paper, we propose an iterative methodology to learn linear models from this type of data. In spite of being simple, its competitive performance is shown in comparison with a straightforward solution and state-of-the-art techniques. A real world problem is also illustrated which naturally fits the aggregated outputs framework: the estimation of marine litter beaching along the south-east coastline of the Bay of Biscay.
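The aggregated-outputs setting can be illustrated with a minimal sketch. It is not the iterative methodology proposed in the paper, and it is not claimed to be the "straightforward solution" the abstract compares against. It only shows the identity that makes a direct fit possible for a linear model: the sum of outputs in a subset equals w times the sum of the features plus b times the subset size. The function name `fit_from_aggregates`, the single-feature setup, and the toy data are assumptions made for illustration.

```python
def fit_from_aggregates(bags, sums):
    """Least-squares fit of y = w*x + b when only per-bag output sums are known.

    For a linear model, the sum of outputs in a bag equals
    w * (sum of x in the bag) + b * (bag size), so the aggregated problem
    is itself a linear regression on two bag-level features.
    """
    # Bag-level design: (sum of features, number of examples).
    sx = [sum(bag) for bag in bags]
    n = [len(bag) for bag in bags]

    # Normal equations of the 2x2 system, solved with Cramer's rule.
    a11 = sum(s * s for s in sx)
    a12 = sum(s * m for s, m in zip(sx, n))
    a22 = sum(m * m for m in n)
    b1 = sum(s * t for s, t in zip(sx, sums))
    b2 = sum(m * t for m, t in zip(n, sums))
    det = a11 * a22 - a12 * a12
    if det == 0:
        raise ValueError("bags do not identify both w and b")
    w = (b1 * a22 - a12 * b2) / det
    b = (a11 * b2 - a12 * b1) / det
    return w, b


# Toy data (made up for illustration): noiseless outputs from y = 2x + 1.
# Individual outputs are never passed to the fit, only their per-bag sums.
bags = [[0.0, 1.0, 2.0], [3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
sums = [sum(2.0 * x + 1.0 for x in bag) for bag in bags]
w, b = fit_from_aggregates(bags, sums)
```

With noiseless sums and bags whose feature means differ, the fit recovers the generating slope and intercept. With noisy data or bags that all share the same mean, the bag-level system becomes poorly conditioned, which is one reason more elaborate procedures are of interest.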

A beta-binomial mixed-effects model approach for analysing longitudinal discrete and bounded outcomes
http://hdl.handle.net/20.500.11824/903
Updated: 2018-12-18T02:00:11Z
Published: 2018-05-01T00:00:00Z
Najera-Zuloaga J.; Lee D.-J.; Arostegui I.
Patient-reported outcomes (PROs) are increasingly used as primary outcome measures in observational and experimental studies, since they inform clinicians and researchers about the health status of patients and generate data to facilitate improved care. PROs usually appear as discrete and bounded, with U, J or inverse J-shapes, and hence exponential family members offer inadequate distributional fits. The beta-binomial distribution has been proposed in the literature to fit PROs. However, the fact that the beta-binomial distribution does not belong to the exponential family limits its applicability in the regression model context, and classical estimation approaches are not straightforward. Moreover, PROs are usually measured in a longitudinal framework in which individuals are followed up for a certain period. Hence, each individual obtains several scores of the PRO over time, which leads to repeated measures and defines the correlation structure in the data. In this work, we have developed and proposed an estimation procedure for the analysis of correlated discrete and bounded outcomes, particularly PROs, by a beta-binomial mixed-effects model. Additionally, we have implemented the methodology in the PROreg package in R. Because there are similar approaches in the literature to address the same issue, this work also incorporates a comparison study between our proposal and alternative methodologies commonly implemented in R and shows the superior performance of our estimation procedure. This paper was motivated by the analysis of the health status of patients with chronic obstructive pulmonary disease, where the main objective is the assessment of risk factors that may affect the evolution of the disease. The application of the proposed approach in the study leads to clinically relevant results.
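The U- and J-shapes mentioned in the abstract can be made concrete with a short sketch of the beta-binomial probability mass function. This is only the textbook distribution, evaluated in Python with the standard library; it is not the mixed-effects estimation procedure of the paper and does not use the PROreg package. The function names, the score range `n = 10`, and the shape parameters are assumptions chosen for illustration.

```python
from math import comb, exp, lgamma


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_pmf(k, n, a, b):
    """P(Y = k) for a beta-binomial with n trials and shape parameters a, b."""
    return comb(n, k) * exp(log_beta(k + a, n - k + b) - log_beta(a, b))


n = 10  # hypothetical maximum score of a bounded outcome

# a < 1 and b < 1 put most of the mass on the two extremes (U-shape).
u_shape = [beta_binomial_pmf(k, n, 0.5, 0.5) for k in range(n + 1)]

# a > 1 and b < 1 give probabilities that rise towards the top score (J-shape).
j_shape = [beta_binomial_pmf(k, n, 3.0, 0.5) for k in range(n + 1)]
```

In `u_shape` the extreme scores are more probable than the middle ones, and in `j_shape` the probabilities increase with the score. A binomial distribution with the same number of trials is always unimodal, so it cannot take the U-shape.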

Hybridizing Cartesian Genetic Programming and Harmony Search for Adaptive Feature Construction in Supervised Learning Problems
http://hdl.handle.net/20.500.11824/898
Updated: 2018-12-05T16:06:37Z
Published: 2017-02-28T00:00:00Z
Elola A.; Del Ser J.; Bilbao M.; Perfecto C.; Alexandre E.; Salcedo-Sanz S.
The advent of the so-called Big Data paradigm has motivated a flurry of research aimed at enhancing machine learning models by following very diverse approaches. In this context this work focuses on the automatic construction of features in supervised learning problems, which differs from the conventional selection of features in that new characteristics with enhanced predictive power are inferred from the original dataset. In particular this manuscript proposes a new iterative feature construction approach based on a self-learning meta-heuristic algorithm (Harmony Search) and a solution encoding strategy (correspondingly, Cartesian Genetic Programming) suited to represent combinations of features by means of constant-length solution vectors. The proposed feature construction algorithm, coined as Adaptive Cartesian Harmony Search (ACHS), incorporates modifications that allow exploiting the estimated predictive importance of intermediate solutions and, ultimately, attaining better convergence rate in its iterative learning procedure. The performance of the proposed ACHS scheme is assessed and compared to that rendered by the state of the art in a toy example and three practical use cases from the literature. The excellent performance figures obtained in these problems shed light on the widespread applicability of the proposed scheme to supervised learning with legacy datasets composed by already refined characteristics.

A statistical framework for radiation dose estimation with uncertainty quantification from the γ-H2AX assay
http://hdl.handle.net/20.500.11824/896
Updated: 2018-12-05T16:06:27Z
Published: 2018-11-28T00:00:00Z
Einbeck J; Ainsbury EA; Sales R; Barnard S; Kaestle F; Higueras M
Over the last decade, the γ-H2AX focus assay, which exploits the phosphorylation of the H2AX histone following DNA double-strand breaks, has made considerable progress towards acceptance as a reliable biomarker for exposure to ionizing radiation. While the existing literature has convincingly demonstrated a dose-response effect, and also presented approaches to dose estimation based on appropriately defined calibration curves, a more widespread practical use is still hampered by a certain lack of discussion and agreement on the specific dose-response modelling and uncertainty quantification strategies, as well as by the unavailability of implementations. This manuscript intends to fill these gaps, by stating explicitly the statistical models and techniques required for calibration curve estimation and subsequent dose estimation. Accompanying this article, a web applet has been produced which implements the discussed methods.