Derivation of a Cost-Sensitive COVID-19 Mortality Risk Indicator Using a Multistart Framework
Abstract
The overall global death rate for COVID-19 patients
has escalated to 2.13% after more than a year of worldwide
spread. Despite strong research on the infection pathogenesis, the
molecular mechanisms involved in a fatal course are still poorly
understood. Machine learning is a well-suited tool to develop
algorithms for predicting a patient’s hospitalization outcome at
triage. This paper presents a probabilistic model, referred to
as a mortality risk indicator, able to assess the risk of a fatal
outcome for new patients. The derivation of the model was done
over a database of 2,547 patients from the first COVID-19 wave
in Spain. Model learning was tackled through a five-start multistart
configuration that provided good generalization power and
low-variance error estimates. The training algorithm made use
of a class weighting correction to account for the mortality
class imbalance and two regularized learners, logistic and
lasso regressors. Outcome probabilities were adjusted to obtain
cost-sensitive predictions by minimizing the type II error. Our
mortality indicator returns both a binary outcome and a three-stage
mortality risk level. The estimated AUC across multistarts
reaches an average of 0.907. At the optimal cutoff for the binary
outcome, the model attains an average sensitivity of 0.898, with
a specificity of 0.745. On an independent set of 121 patients later
released by the same consortium, the model attained perfect sensitivity
(1.0) and a specificity of 0.759. The best
performance for the indicator is achieved when the prediction’s
time horizon is within two weeks of hospital admission. In
addition to a strong predictive performance, the set of selected
features highlights the relevance of several underrated molecules
in COVID-19 research, such as blood eosinophils, bilirubin, and
urea levels.
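The following is a minimal, illustrative sketch (using scikit-learn, not the authors' code) of the two ingredients described above: a class-weighted, lasso-penalized logistic regression and a cost-sensitive cutoff chosen to keep the type II error low. The synthetic data, the sensitivity target, and the threshold grid are assumptions made purely for illustration.

```python
# Illustrative sketch only: class-weighted, L1 (lasso-penalized) logistic regression
# with a cost-sensitive cutoff that bounds the type II error (missed fatal outcomes).
# The data, feature meanings, and sensitivity target below are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
# Synthetic stand-in for triage blood-test features (e.g., eosinophils, bilirubin, urea)
X = rng.normal(size=(2000, 10))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=2.0, size=2000) > 2.0).astype(int)  # imbalanced labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# class_weight="balanced" reweights the minority (mortality) class;
# penalty="l1" with the saga solver gives lasso-style feature selection.
clf = LogisticRegression(penalty="l1", solver="saga", class_weight="balanced",
                         C=0.5, max_iter=5000)
clf.fit(X_tr, y_tr)

proba = clf.predict_proba(X_te)[:, 1]
print("AUC:", roc_auc_score(y_te, proba))

# Cost-sensitive cutoff: keep the largest threshold whose sensitivity stays above a target,
# trading specificity for a low type II error.
target_sensitivity = 0.90
best_threshold = 0.5
for t in np.linspace(0.05, 0.95, 91):
    pred = (proba >= t).astype(int)
    tp = np.sum((pred == 1) & (y_te == 1))
    fn = np.sum((pred == 0) & (y_te == 1))
    sensitivity = tp / (tp + fn)
    if sensitivity >= target_sensitivity:
        best_threshold = t  # later (larger) thresholds overwrite while sensitivity holds
print("Chosen cutoff:", best_threshold)
```

In practice such a sketch would be repeated once per multistart and the resulting probability cutoffs and error estimates averaged, but the resampling and class-weighting details here are placeholders rather than the paper's exact configuration.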