A comparative study of
survival models for breast cancer prognostication based on
microarray data: does a single gene beat them all?
Haibe-Kains B, Desmedt C, Sotiriou C and Bontempi G
Motivation:
Survival prediction of breast cancer (BC) patients independently of
treatment, also known as prognostication, is a complex task since
clinically similar breast tumors, in addition to being molecularly
heterogeneous, may exhibit different clinical outcomes. In recent
years, the analysis of gene expression profiles by means of
sophisticated data mining tools has emerged as a promising technology to
bring additional insights into BC biology and to improve the quality of
prognostication. The aim of this work is to assess quantitatively the
accuracy of prediction obtained with state-of-the-art data analysis
techniques for BC microarray data through an independent and thorough
framework.
Results: Due to the large
number of variables, the small number of samples and the high degree
of noise, complex prediction methods are highly exposed to
performance degradation despite the use of cross-validation
techniques. Our analysis shows that the most complex methods are not
significantly better than the simplest one, a univariate model relying
on a single proliferation gene. This result suggests that proliferation
might be the most relevant biological process for BC prognostication
and that the loss of interpretability deriving from the use of
overly complex methods may not be sufficiently counterbalanced by an
improvement in the quality of prediction.
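As an illustration of how a univariate model of this kind could be scored, the sketch below computes a concordance index (Harrell's C) using the expression of a single gene directly as the risk score. The abstract does not name the performance measure used, so the choice of the concordance index is an assumption here, and the function name, gene values, follow-up times and event indicators are hypothetical toy data, not results from the study.

```python
from typing import Sequence


def concordance_index(
    risk: Sequence[float],
    time: Sequence[float],
    event: Sequence[bool],
) -> float:
    """Harrell's C: share of usable patient pairs ranked correctly by risk."""
    concordant = 0.0
    usable = 0
    n = len(risk)
    for i in range(n):
        for j in range(n):
            # A pair is usable when patient i has an observed event strictly
            # before patient j's event or censoring time.
            if event[i] and time[i] < time[j]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # ties in risk count as half
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Toy, made-up data (assumption, not from the study): expression of one
# hypothetical proliferation gene, follow-up in years, event indicator
# (False means the patient was censored).
gene_expression = [2.1, 0.4, 1.7, 1.4, 1.2]
followup_years = [1.5, 9.0, 3.0, 7.5, 6.0]
had_event = [True, False, True, False, True]

c_index = concordance_index(gene_expression, followup_years, had_event)
print(f"C-index of the single-gene score: {c_index:.2f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the same function could be applied to the risk scores of a more complex model to compare it against the single-gene baseline on identical patients.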
Availability: The comparison study is implemented in an R package called survcomp and is available from http://www.ulb.ac.be/di/map/bhaibeka/software/survcomp/.
Contact: bhaibeka@ulb.ac.be
Supplementary information: Supplementary Data are available at Bioinformatics online.