Data Science (DS)
Recent Submissions
-
Learning a logistic regression with the help of unknown features at prediction stage
(2023)The use of features available at training time, but not at prediction time, as additional information for training models is known as learning using privileged information paradigm. In this paper, the handling of ... -
Spatio‑temporal modelling of high‑throughput phenotyping data
(2023-10-13)High throughput phenotyping (HTP) platforms and devices are increasingly used to characterise growth and developmental processes for large sets of plant genotypes. This dissertation is motivated by the need to accurately ... -
Derivative curve estimation in longitudinal studies using P-splines
(2023-09-18)The estimation of curve derivatives is of interest in many disciplines. It allows the extraction of important characteristics to gain insight about the underlying process. In the context of longitudinal data, the derivative ... -
A revisited branch-and-cut algorithm for large-scale orienteering problems
(2024-02-16)The orienteering problem is a route optimization problem which consists of finding a simple cycle that maximizes the total collected profit subject to a maximum distance limitation. In the last few decades, the occurrence ... -
A kernel-enriched order-dependent nonparametric spatio-temporal process
(2023)Spatio-temporal processes are necessary modeling tools for various environmental, biological, and geographical problems. The underlying model is commonly considered to be parametric and to be a Gaussian process. Additionally, ... -
Female Models in AI and the Fight Against COVID-19
(2022-11-01)Gender imbalance has persisted over time and is well documented in science, technology, engineering and mathematics (STEM) and singularly in artificial intelligence (AI). In this article we emphasize the importance of ... -
Efficient Learning of Minimax Risk Classifiers in High Dimensions
(2023-08-01)High-dimensional data is common in multiple areas, such as health care and genomics, where the number of features can be tens of thousands. In such scenarios, the large number of features often leads to inefficient ... -
Selecting the number of categories of the lymph node ratio in cancer research: A bootstrap-based hypothesis test
(2021)The high impact of the lymph node ratio as a prognostic factor is widely established in colorectal cancer, and is being used as a categorized predictor variable in several studies. However, the cut-off points as well as ... -
Clinical prediction rules for adverse evolution in patients with COVID-19 by the Omicron variant
(2023)Objective: We identify factors related to SARS-CoV-2 infection linked to hospitalization, ICU admission, and mortality and develop clinical prediction rules. Methods: Retrospective cohort study of 380,081 patients with ... -
Five-year follow-up mortality prognostic index for colorectal patients
(2023)Purpose: To identify 5-year survival prognostic variables in patients with colorectal cancer (CRC) and to propose a survival prognostic score that also takes into account changes over time in the patient's health-related ... -
Age or lifestyle-induced accumulation of genotoxicity is associated with a length-dependent decrease in gene expression
(2023)DNA damage has long been advocated as a molecular driver of aging. DNA damage occurs in a stochastic manner, and is therefore more likely to accumulate in longer genes. The length-dependent accumulation of transcripti ... -
Variable selection with LASSO regression for complex survey data
(2023)Variable selection is an important step to end up with good prediction models. LASSO regression models are one of the most commonly used methods for this purpose, for which cross-validation is the most widely applied ... -
Estimation of cut-off points under complex-sampling design data
(2022)In the context of logistic regression models, a cut-off point is usually selected to dichotomize the estimated predicted probabilities based on the model. The techniques proposed to estimate optimal cut-off points in the ... -
VSD-MOEA: A Dominance-Based Multiobjective Evolutionary Algorithm with Explicit Variable Space Diversity Management
(2022-06-01)Most state-of-the-art Multiobjective Evolutionary Algorithms (MOEAs) promote the preservation of diversity of objective function space but neglect the diversity of decision variable space. The aim of this article is to ... -
On the utilization of pair-potential energy functions in multi-objective optimization
(2023-06-01)In evolutionary multi-objective optimization (EMO), the pair-potential energy functions (PPFs) have been used to construct diversity-preserving mechanisms to improve Pareto front approximations. Despite PPFs have shown ... -
An ACO-based Hyper-heuristic for Sequencing Many-objective Evolutionary Algorithms that Consider Different Ways to Incorporate the DM's Preferences
(2023-02-01)Many-objective optimization is an area of interest common to researchers, professionals, and practitioners because of its real-world implications. Preference incorporation into Multi-Objective Evolutionary Algorithms (MOEAs) ... -
LASSO for streaming data with adaptative filtering
(2022)Streaming data is ubiquitous in modern machine learning, and so the development of scalable algorithms to analyze this sort of information is a topic of current interest. On the other hand, the problem of l1-penalized ... -
Are the statistical tests the best way to deal with the biomarker selection problem?
(2022)Statistical tests are a powerful set of tools when applied correctly, but unfortunately the extended misuse of them has caused great concern. Among many other applications, they are used in the detection of biomarkers so ... -
On the use of the descriptive variable for enhancing the aggregation of crowdsourced labels
(2022)The use of crowdsourcing for annotating data has become a popular and cheap alternative to expert labelling. As a consequence, an aggregation task is required to combine the different labels provided and agree on a single ... -
On the relative value of weak information of supervision for learning generative models: An empirical study
(2022)Weakly supervised learning is aimed to learn predictive models from partially supervised data, an easy-to-collect alternative to the costly standard full supervision. During the last decade, the research community has ...