A new approach to categorize continuous variables in prediction models: Proposal and validation
Date: 2017-12

Abstract
When developing prediction models for application in clinical practice, health practitioners
usually categorise clinical variables that are continuous in nature. Although categorisation is not
regarded as advisable from a statistical point of view, due to loss of information and power, it
is a common practice in medical research. Consequently, providing researchers with a useful and
valid categorisation method is relevant when developing prediction models. Without
recommending categorisation of continuous predictors, our aim is to propose a valid way to do it
whenever clinical researchers consider it necessary. This paper focuses on categorising a
continuous predictor within a logistic regression model so that the resulting model achieves the
best discriminative ability, measured as the highest area under the receiver operating characteristic
curve (AUC). The proposed methodology is validated in settings where the location of the optimal cut
points is known, either theoretically or in practice. In addition, the proposed method is applied to
a real data set of patients with an exacerbation of chronic obstructive pulmonary disease, in the
context of the IRYSS-COPD study, in which a clinical prediction rule for severe evolution was being
developed. The clinical variable PCO2 was categorised in both univariable and multivariable settings.
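To make the AUC criterion concrete, the following is a minimal sketch under stated assumptions, not the procedure proposed in the paper. It assumes a univariable setting, a binary outcome coded 0/1, and a user-supplied grid of candidate cut points; the function names (`auc`, `categorised_auc`, `forward_cut_points`) are hypothetical. It relies on the fact that, for a logistic model whose only predictor is a dummy-coded categorical variable, the maximum-likelihood fitted probability in each category equals the observed event proportion, so no iterative fitting is required. Cut points are added greedily, one at a time, each time keeping the candidate that most increases the AUC.

```python
from bisect import bisect_right
from collections import defaultdict


def auc(scores, labels):
    """Empirical AUC (Mann-Whitney statistic); tied scores count as 1/2."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def categorised_auc(x, y, cuts):
    """AUC of a univariable logistic model on x categorised at `cuts`.

    With a single dummy-coded categorical predictor, the ML fitted
    probability in each category is that category's event proportion.
    Values equal to a cut point fall into the upper category.
    """
    cuts = sorted(cuts)
    cat = [bisect_right(cuts, xi) for xi in x]  # category index 0..len(cuts)
    events, totals = defaultdict(int), defaultdict(int)
    for c, yi in zip(cat, y):
        events[c] += yi
        totals[c] += 1
    fitted = [events[c] / totals[c] for c in cat]
    return auc(fitted, y)


def forward_cut_points(x, y, candidates, k):
    """Hypothetical greedy search: add k cut points, one at a time,
    each time keeping the candidate that gives the highest AUC."""
    chosen = []
    for _ in range(k):
        remaining = [c for c in candidates if c not in chosen]
        best = max(remaining, key=lambda c: categorised_auc(x, y, chosen + [c]))
        chosen.append(best)
    return sorted(chosen), categorised_auc(x, y, chosen)


if __name__ == "__main__":
    # Toy data (invented for illustration): events occur only for x >= 4.
    x = [1, 2, 3, 4, 5, 6]
    y = [0, 0, 0, 1, 1, 1]
    print(forward_cut_points(x, y, [1.5, 2.5, 3.5, 4.5, 5.5], k=1))
```

On the toy data, the single cut at 3.5 separates the outcome perfectly and yields an AUC of 1.0. The AUC here is computed on the same data used to choose the cut points, so it is optimistic. A real analysis would correct for this, for example with cross-validation or the bootstrap, and would need a fitted multivariable model to handle the multivariable setting.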