Selecting the number of categories of the lymph node ratio in cancer research: A bootstrap-based hypothesis test
Abstract
The lymph node ratio is widely established as a strong prognostic factor in colorectal cancer, and it is used as a
categorized predictor variable in several studies. However, both the cut-off points and the number of
categories vary considerably across the literature. Motivated by the need to obtain the best categorization
of the lymph node ratio as a predictor of mortality in colorectal cancer patients, we propose a method to select the best
number of categories for a continuous variable in a logistic regression framework. To this end, we introduce a
bootstrap-based hypothesis test, together with a new estimation algorithm for the optimal location of the cut-off points
called BackAddFor, which is an updated version of the previously proposed AddFor algorithm. The performance of the
hypothesis test was evaluated by means of a simulation study, under different scenarios, yielding type I error rates close to
the nominal levels and good power whenever a meaningful difference in predictive ability existed.
Finally, the proposed methodology was applied to the CCR-CARESS study, in which the lymph node ratio was included as a
predictor of five-year mortality, resulting in the selection of three categories.
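
The general idea of a bootstrap test for "k categories versus k + 1 categories" can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the abstract: the test statistic (the gain in apparent AUC), the parametric resampling of outcomes under the k-category model, and the exhaustive grid search for cut-off points, which stands in for the BackAddFor algorithm and does not reproduce it. The function names are hypothetical. The sketch relies on the fact that, for a logistic model whose only predictor is the categorized variable, the fitted probabilities equal the observed event proportions per category whenever the maximum likelihood estimate exists.

```python
import bisect
import itertools
import random


def category_auc(cats, y, k):
    """Apparent AUC when each subject is scored by the event rate of its category.

    With a single categorical predictor, these rates coincide with the fitted
    probabilities of the logistic model, so no iterative fitting is needed here.
    """
    pos, neg = [0] * k, [0] * k
    for c, yi in zip(cats, y):
        if yi == 1:
            pos[c] += 1
        else:
            neg[c] += 1
    n_pos, n_neg = sum(pos), sum(neg)
    if n_pos == 0 or n_neg == 0:
        return 0.5  # AUC undefined with one outcome class; treat as uninformative
    rate = [pos[c] / (pos[c] + neg[c]) if pos[c] + neg[c] else 0.0 for c in range(k)]
    wins = 0.0
    for a in range(k):
        for b in range(k):
            if rate[a] > rate[b]:
                wins += pos[a] * neg[b]        # event ranked above non-event
            elif rate[a] == rate[b]:
                wins += 0.5 * pos[a] * neg[b]  # ties count one half
    return wins / (n_pos * n_neg)


def best_auc(x, y, k, grid):
    """Exhaustive search over k - 1 cut-off points taken from a candidate grid.

    Assumption: this brute-force search is only a stand-in for BackAddFor.
    """
    best = (-1.0, ())
    for cuts in itertools.combinations(grid, k - 1):
        cats = [bisect.bisect_right(cuts, xi) for xi in x]
        a = category_auc(cats, y, k)
        if a > best[0]:
            best = (a, cuts)
    return best


def bootstrap_test(x, y, k, grid, n_boot=200, seed=0):
    """Bootstrap p-value for H0: k categories suffice vs H1: k + 1 are needed.

    Assumed statistic: AUC(k + 1) - AUC(k). Assumed null scheme: outcomes are
    resampled from the fitted k-category model (parametric bootstrap).
    """
    rng = random.Random(seed)
    auc_k, cuts_k = best_auc(x, y, k, grid)
    auc_k1, _ = best_auc(x, y, k + 1, grid)
    observed = auc_k1 - auc_k

    # Fitted probabilities under the null model: event proportion per category.
    cats = [bisect.bisect_right(cuts_k, xi) for xi in x]
    tot, pos = [0] * k, [0] * k
    for c, yi in zip(cats, y):
        tot[c] += 1
        pos[c] += yi
    p_null = [pos[c] / tot[c] for c in cats]

    exceed = 0
    for _ in range(n_boot):
        y_star = [1 if rng.random() < p else 0 for p in p_null]
        a_k, _ = best_auc(x, y_star, k, grid)
        a_k1, _ = best_auc(x, y_star, k + 1, grid)
        if a_k1 - a_k >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_boot + 1)


if __name__ == "__main__":
    # Synthetic data with one true cut-off at 0.5 (illustrative only).
    rng = random.Random(1)
    x = [rng.random() for _ in range(150)]
    y = [1 if rng.random() < (0.2 if xi < 0.5 else 0.7) else 0 for xi in x]
    grid = [i / 10 for i in range(1, 10)]
    print(bootstrap_test(x, y, k=2, grid=grid, n_boot=100))
```

Under these assumptions, one plausible way to select the number of categories is to apply the test sequentially for k = 2, 3, and so on, stopping at the first k for which the null hypothesis is not rejected. Re-running the cut-off search inside every bootstrap replicate matters, because the statistic is optimized over cut-off locations and its null distribution must reflect that optimization.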