Data Science (DS)
Recent Submissions

Learning a logistic regression with the help of unknown features at prediction stage
(2023) The use of features available at training time, but not at prediction time, as additional information for training models is known as the learning using privileged information paradigm. In this paper, the handling of ... 
Spatio‑temporal modelling of high‑throughput phenotyping data
(2023-10-13) High-throughput phenotyping (HTP) platforms and devices are increasingly used to characterise growth and developmental processes for large sets of plant genotypes. This dissertation is motivated by the need to accurately ... 
Derivative curve estimation in longitudinal studies using P-splines
(2023-09-18) The estimation of curve derivatives is of interest in many disciplines. It allows the extraction of important characteristics to gain insight about the underlying process. In the context of longitudinal data, the derivative ... 
A revisited branch-and-cut algorithm for large-scale orienteering problems
(2024-02-16) The orienteering problem is a route optimization problem that consists of finding a simple cycle that maximizes the total collected profit subject to a maximum distance limitation. In the last few decades, the occurrence ... 
A kernel-enriched order-dependent nonparametric spatiotemporal process
(2023) Spatiotemporal processes are necessary modeling tools for various environmental, biological, and geographical problems. The underlying model is commonly assumed to be a parametric Gaussian process. Additionally, ... 
Female Models in AI and the Fight Against COVID-19
(2022-11-01) Gender imbalance has persisted over time and is well documented in science, technology, engineering and mathematics (STEM), and particularly in artificial intelligence (AI). In this article we emphasize the importance of ... 
Efficient Learning of Minimax Risk Classifiers in High Dimensions
(2023-08-01) High-dimensional data is common in multiple areas, such as health care and genomics, where the number of features can be in the tens of thousands. In such scenarios, the large number of features often leads to inefficient ... 
Selecting the number of categories of the lymph node ratio in cancer research: A bootstrap-based hypothesis test
(2021) The high impact of the lymph node ratio as a prognostic factor is widely established in colorectal cancer, and it is being used as a categorized predictor variable in several studies. However, the cutoff points as well as ... 
Clinical prediction rules for adverse evolution in patients with COVID-19 by the Omicron variant
(2023) Objective: We identify factors related to SARS-CoV-2 infection linked to hospitalization, ICU admission, and mortality, and develop clinical prediction rules. Methods: Retrospective cohort study of 380,081 patients with ... 
Five-year follow-up mortality prognostic index for colorectal patients
(2023) Purpose: To identify 5-year survival prognostic variables in patients with colorectal cancer (CRC) and to propose a survival prognostic score that also takes into account changes over time in the patient's health-related ... 
Age- or lifestyle-induced accumulation of genotoxicity is associated with a length-dependent decrease in gene expression
(2023) DNA damage has long been advocated as a molecular driver of aging. DNA damage occurs in a stochastic manner, and is therefore more likely to accumulate in longer genes. The length-dependent accumulation of transcripti ... 
Variable selection with LASSO regression for complex survey data
(2023) Variable selection is an important step in obtaining good prediction models. LASSO regression models are among the most commonly used methods for this purpose, for which cross-validation is the most widely applied ... 
Estimation of cutoff points under complex-sampling design data
(2022) In the context of logistic regression models, a cutoff point is usually selected to dichotomize the predicted probabilities estimated by the model. The techniques proposed to estimate optimal cutoff points in the ... 
VSD-MOEA: A Dominance-Based Multiobjective Evolutionary Algorithm with Explicit Variable Space Diversity Management
(2022-06-01) Most state-of-the-art Multiobjective Evolutionary Algorithms (MOEAs) promote the preservation of diversity in the objective function space but neglect the diversity of the decision variable space. The aim of this article is to ... 
On the utilization of pair-potential energy functions in multiobjective optimization
(2023-06-01) In evolutionary multiobjective optimization (EMO), pair-potential energy functions (PPFs) have been used to construct diversity-preserving mechanisms to improve Pareto front approximations. Although PPFs have shown ... 
An ACO-based Hyper-heuristic for Sequencing Many-objective Evolutionary Algorithms that Consider Different Ways to Incorporate the DM's Preferences
(2023-02-01) Many-objective optimization is an area of interest common to researchers, professionals, and practitioners because of its real-world implications. Preference incorporation into Multi-Objective Evolutionary Algorithms (MOEAs) ... 
LASSO for streaming data with adaptative filtering
(2022) Streaming data is ubiquitous in modern machine learning, and so the development of scalable algorithms to analyze this sort of information is a topic of current interest. On the other hand, the problem of l1-penalized ... 
Are the statistical tests the best way to deal with the biomarker selection problem?
(2022) Statistical tests are a powerful set of tools when applied correctly, but unfortunately their widespread misuse has caused great concern. Among many other applications, they are used in the detection of biomarkers so ... 
On the use of the descriptive variable for enhancing the aggregation of crowdsourced labels
(2022) The use of crowdsourcing for annotating data has become a popular and cheap alternative to expert labelling. As a consequence, an aggregation task is required to combine the different labels provided and agree on a single ... 
On the relative value of weak information of supervision for learning generative models: An empirical study
(2022) Weakly supervised learning aims to learn predictive models from partially supervised data, an easy-to-collect alternative to costly standard full supervision. During the last decade, the research community has ...