Identification of breast cancer molecular subtypes
Haibe-Kains B
Introduction : Since the advent
of array-based technology and the sequencing of the human genome,
scientists have attempted to bring new insights into breast cancer biology
and prognosis. From gene expression data, Perou et al. highlighted the
key molecular differences between breast tumors by identifying sets of
co-expressed genes and tumors sharing similar "genetic portraits"
(Perou et al., Nature 2000). Using a hierarchical clustering method in
combination with a large set of genes (called the "intrinsic gene list"
in the literature), several subtypes were identified based mainly on
ER and HER2 phenotypes and proliferation. Although these early results
were promising, the clustering model developed in the original
publications suffers from serious drawbacks, namely its instability and
the difficulty of applying it to new data (Pusztai et al., The Oncologist
2006).
Methods : In order to address
these concerns, we recently introduced a new clustering model to
robustly identify the breast cancer molecular subtypes. This model
consists of two steps: (i) identifying gene modules, i.e. sets of genes that are
specifically co-expressed with genes of interest; and (ii) identifying
molecular subtypes using a simple model-based clustering in a
low-dimensional space defined by these gene modules (Wirapati et al., BCR
2008; Desmedt et al., CCR 2008).
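The first step can be illustrated with a minimal Python sketch. It assumes that a module score is the signed average of the expression of the genes in a module, with a sign of +1 or -1 giving the direction of co-expression with the gene of interest; the abstract does not state the exact formula, so this is an assumption. The gene lists, the function names `module_score` and `to_module_space`, and the expression values are hypothetical and serve only as illustration; they are not the published modules.

```python
from statistics import mean


def module_score(expression, module):
    """Signed average of the expression of the genes in one module.

    expression: dict mapping gene symbol -> expression value for one tumor
    module: dict mapping gene symbol -> +1 or -1 (assumed direction of
            co-expression with the gene of interest)
    Module genes absent from the expression profile are skipped.
    """
    signed = [sign * expression[gene]
              for gene, sign in module.items()
              if gene in expression]
    if not signed:
        raise ValueError("no module gene found in the expression profile")
    return mean(signed)


# Hypothetical toy modules, for illustration only (not the published gene lists).
ESR1_MODULE = {"ESR1": 1, "GATA3": 1, "FOXA1": 1, "KRT5": -1}
ERBB2_MODULE = {"ERBB2": 1, "GRB7": 1, "STARD3": 1}


def to_module_space(expression):
    """Project one tumor onto the two-dimensional (ESR1, ERBB2) module space."""
    return (module_score(expression, ESR1_MODULE),
            module_score(expression, ERBB2_MODULE))


# Made-up expression values for a single tumor.
tumor = {"ESR1": 2.0, "GATA3": 1.5, "FOXA1": 1.0, "KRT5": -0.5,
         "ERBB2": 0.2, "GRB7": 0.1, "STARD3": 0.3}
point = to_module_space(tumor)
```

Each tumor is thus reduced to a pair of coordinates, which is the input space in which the model-based clustering of step (ii) operates.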
Results : From two large
microarray datasets (> 600 patients), seven gene modules were built
in order to represent key biological processes in breast cancer: ER
phenotype (ESR1), HER2 phenotype (ERBB2), proliferation (AURKA),
immune response (STAT1), angiogenesis (VEGF), tumor invasion (PLAU) and
apoptosis (CASP3). Since previous publications highlighted the
relevance of the ER and HER2 phenotypes for breast cancer subtype
identification, as recently confirmed by Kapp et al. (BMC Genomics 2006),
we used the ESR1 and ERBB2 module scores to fit our model-based
clustering. The model was built on a series of 344 breast cancer
patients. The resulting classification was shown to be robust in a set
of 14 independent microarray datasets totaling more than 2700 patients.
Conclusion : This method has
several advantages over the previously published hierarchical
clustering: (i) the low dimensionality of the input space (two
dimensions) increases the stability of the clustering and facilitates
the visualization of its results; (ii) the computational cost is low;
(iii) the model is easily applicable to new data; and (iv) the model
returns the probability that a patient belongs to each subtype,
facilitating the interpretation of the results. Moreover, this novel
clustering model yields robust classifications in numerous microarray
datasets. Given its ease of application and good performance, this
new model could be used by clinicians to study prognosis and
treatment effects with respect to the molecular subtypes of
breast cancer.
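To make advantages (iii) and (iv) concrete, the sketch below shows how a fitted model could be applied to a new patient. It assumes that the model-based clustering is a mixture of bivariate Gaussians in the (ESR1, ERBB2) module space, which the abstract does not specify. The subtype labels, mixing weights, means and covariances are hypothetical placeholders rather than fitted values, and the function names are illustrative.

```python
import math


def gaussian_pdf_2d(x, mean, cov):
    """Density of a bivariate normal at x; cov is ((a, b), (b, c))."""
    (a, b), (_, c) = cov
    det = a * c - b * b
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form (x - mean)^T cov^-1 (x - mean), using the closed-form
    # inverse of a symmetric 2x2 matrix.
    quad = (c * dx * dx - 2 * b * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * quad) / (2 * math.pi * math.sqrt(det))


def subtype_probabilities(x, components):
    """Posterior probability of each subtype for one point in module space.

    components: dict mapping subtype name -> (weight, mean, covariance)
    """
    weighted = {name: w * gaussian_pdf_2d(x, m, cov)
                for name, (w, m, cov) in components.items()}
    total = sum(weighted.values())
    return {name: value / total for name, value in weighted.items()}


# Hypothetical mixture parameters, for illustration only (not fitted values).
COMPONENTS = {
    "ER-/HER2-": (0.20, (-1.0, -0.3), ((0.3, 0.0), (0.0, 0.1))),
    "HER2+":     (0.15, (-0.3, 1.2), ((0.4, 0.0), (0.0, 0.3))),
    "ER+/HER2-": (0.65, (0.8, -0.2), ((0.2, 0.0), (0.0, 0.1))),
}

# Made-up (ESR1, ERBB2) module scores for a new patient.
new_patient = (0.9, -0.1)
probs = subtype_probabilities(new_patient, COMPONENTS)
best = max(probs, key=probs.get)
```

Under these assumptions, classifying a new patient requires only the two module scores and the stored mixture parameters, and the returned probabilities show how confidently the patient is assigned to a subtype.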