Variable selection in high-dimensional data: application in a SARS-CoV-2 pneumonia clinical data-set
The COVID-19 pandemic overwhelmed hospitals in some countries, prompting numerous studies of how the disease develops and how it affects patients with different characteristics, with the aim of making optimal use of the available resources. This project is part of a multicentre study that aims to predict disease severity in patients with SARS-CoV-2 pneumonia. To that end, variables describing patients' health, demographic and socio-economic factors, and exposure to pollutants have been collected. Given the large number of variables in the data-set, this number must be reduced, both to obtain a model that is practical to interpret and to limit the amount of information that doctors have to collect on each patient. In this project, an exhaustive analysis of variable (feature) selection techniques has been carried out to assess their performance and relevance in terms of stability, similarity and computation time. Based on the techniques that showed the best characteristics, the factors most relevant to the severity of pneumonia have been identified, in agreement with those proposed by other studies.
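As an illustration of the stability and similarity criteria mentioned above, the following minimal sketch scores how consistently a selection technique picks the same variables across resampled versions of the data. It assumes the Jaccard index as the similarity measure, which the abstract does not specify; the function names and the variable names (`age`, `crp`, `spo2`, `ldh`) are hypothetical and are not taken from the study's data-set.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two selected-variable sets (1.0 = identical)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty selections are treated as identical
    return len(a & b) / len(a | b)


def mean_pairwise_jaccard(subsets):
    """Average Jaccard similarity over all pairs of selected subsets.

    Applied to the subsets one technique selects on different resamples,
    this is one possible stability score. Applied to the subsets that
    different techniques select on the same data, it measures how similar
    the techniques are to each other.
    """
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two subsets to compare")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical selections returned by one technique on three resamples.
runs = [
    {"age", "crp", "spo2"},
    {"age", "crp", "ldh"},
    {"age", "crp", "spo2"},
]
stability = mean_pairwise_jaccard(runs)
print(f"stability = {stability:.3f}")
```

A score close to 1 means the technique returns nearly the same variables regardless of the resample, while a score close to 0 means its selections are largely driven by sampling noise.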