Derivation of a Cost-Sensitive COVID-19 Mortality Risk Indicator Using a Multistart Framework
The overall global death rate for COVID-19 patients has risen to 2.13% after more than a year of worldwide spread. Despite intensive research on the pathogenesis of the infection, the molecular mechanisms involved in a fatal course are still poorly understood. Machine learning is well suited to developing algorithms that predict a patient's hospitalization outcome at triage. This paper presents a probabilistic model, referred to as a mortality risk indicator, that assesses the risk of a fatal outcome for new patients. The model was derived from a database of 2,547 patients from the first COVID-19 wave in Spain. Model learning was tackled through a five-multistart configuration that ensured good generalization power and low-variance error estimators. The training algorithm used a class weighting correction to account for the mortality class imbalance, together with two regularized learners, logistic and lasso regressors. Outcome probabilities were adjusted to obtain cost-sensitive predictions by minimizing the type II error. Our mortality indicator returns both a binary outcome and a three-stage mortality risk level. The estimated AUC across multistarts averages 0.907. At the optimal cutoff for the binary outcome, the model attains an average sensitivity of 0.898 with a specificity of 0.745. On an independent set of 121 patients later released by the same consortium, the model attained perfect sensitivity (1) with a specificity of 0.759. The indicator performs best when the prediction's time horizon is within two weeks of hospital admission. In addition to a strong predictive performance, the set of selected features highlights the relevance of several underrated molecules in COVID-19 research, such as blood eosinophils, bilirubin, and urea levels.
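The cost-sensitive adjustment of outcome probabilities can be illustrated with a minimal sketch. It is not the authors' procedure, which the abstract does not specify. It assumes a simple rule: among the observed probabilities, pick the cutoff that minimizes a weighted error count in which a false negative (type II error) costs more than a false positive. The function names, the 5:1 cost ratio, the toy data, and the risk-level boundaries (0.2 and 0.6) are all hypothetical.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], probs: Sequence[float],
                     cutoff: float) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) when predicting death (1) if prob >= cutoff."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, probs):
        pred = 1 if p >= cutoff else 0
        if y == 1 and pred == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif pred == 1:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def cost_sensitive_cutoff(y_true: Sequence[int], probs: Sequence[float],
                          fn_cost: float = 5.0,
                          fp_cost: float = 1.0) -> Tuple[float, float]:
    """Pick the observed probability that minimizes fn_cost*FN + fp_cost*FP.

    Ties are resolved in favor of the lowest cutoff, i.e. higher sensitivity.
    The cost values are illustrative assumptions, not taken from the paper.
    """
    best_cutoff, best_cost = None, float("inf")
    for cutoff in sorted(set(probs)):
        _, fp, _, fn = confusion_counts(y_true, probs, cutoff)
        cost = fn_cost * fn + fp_cost * fp
        if cost < best_cost:  # strict '<' keeps the lowest cutoff on ties
            best_cutoff, best_cost = cutoff, cost
    return best_cutoff, best_cost


def sensitivity_specificity(y_true: Sequence[int], probs: Sequence[float],
                            cutoff: float) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp, fp, tn, fn = confusion_counts(y_true, probs, cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def risk_level(prob: float, low: float = 0.2, high: float = 0.6) -> str:
    """Map a probability to a three-stage level (boundaries are hypothetical)."""
    if prob < low:
        return "low"
    if prob < high:
        return "moderate"
    return "high"


# Toy data (invented for illustration): 1 = death, 0 = survival.
y_true = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
probs = [0.05, 0.10, 0.15, 0.30, 0.35, 0.45, 0.40, 0.60, 0.70, 0.90]

cutoff, cost = cost_sensitive_cutoff(y_true, probs, fn_cost=5.0, fp_cost=1.0)
sens, spec = sensitivity_specificity(y_true, probs, cutoff)
print(cutoff, sens, round(spec, 3))  # 0.4 1.0 0.714
```

With equal costs the same toy data yields a cutoff of 0.70 and misses one death. Weighting false negatives more heavily lowers the cutoff to 0.40, trading specificity for sensitivity, which is the qualitative behavior reported in the abstract.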