An accurate breast cancer prognosis, or breast cancer survivability prediction, is important because it often guides the course of treatment, the ability to claim additional financial support from the government, the actions of the patient and family, and more. Breast cancer survivability is commonly predicted from clinical features. TNM staging, the globally accepted standard used to describe cancer, was devised more than 60 years ago and looks at only three features: the size of the tumor, the number of regional lymph nodes with cancer, and the spread of cancer to other parts of the body. With the advent of affordable genomic sequencing and the acceleration of findings in molecular biology over the past decade, molecular features may now be practical for improving breast cancer prognosis.

Molecular diagnostics for cancer therapy decision-making have shown promising initial clinical results. This has led to a flood of published reports of signatures predictive of breast cancer phenotypes, and several molecular diagnostic tests for cancer therapy decision-making have gained regulatory approval in recent years [2, 3]. However, there is no consensus on the most accurate computational methods and models for predicting breast cancer survivability. In addition, it is unclear whether incorporating molecular data as a complement to, or replacement for, traditional clinical diagnostic tools adds any value. Therefore, it is necessary to objectively assess whether genomic data currently provides value beyond traditional clinical features.

To aid in efforts to solve this problem, we predicted breast cancer survivability with machine learning techniques as part of the DREAM Breast Cancer Prognosis Challenge. The ultimate goal of the challenge is to objectively compare many computational algorithms by providing a common training dataset, in an effort to find the best features for breast cancer prognosis. The dataset contains standard clinical measurements in addition to genomic information, thus allowing genomic information to be compared with standard clinical features.

Our goal was to predict survival for each individual. We took two approaches: predicting a discrete survival status based on time since diagnosis and other features, and predicting a continuous survival time based on all features.

Breast cancer sample data is made available through the DREAM challenge from the METABRIC data of 1,000 breast tumor samples used in a previous study, where data origin and preprocessing are explained in detail. We further processed the data by discarding samples with missing values, which left 931 samples.

There are two indicators of survival: the time from breast cancer diagnosis to last follow-up, and the status of the patient (alive or dead) at last follow-up. Survival data is right-censored, since patients may still be alive at the end of the study.

Gene expression is generated using molecular profiling platforms, described in full detail in another study. The genes used as training features are narrowed to a list of 9 suggested by the DREAM challenge and previous literature. We used two estrogen pathway genes (ER and PR), two human epidermal growth factor 2 receptor amplicon genes (HER2 SNP6 and GII), and five immune response genes (CXCL10, STAT1, GBP1, GZMA, and ...).

In addition, we have the following clinical annotations, the classic features used for breast cancer prognosis:
CT: chemotherapy

Number of lymph nodes found with cancer
  0: no nodes
  1: 1-3 nodes
  2: 4-9 nodes
  3: over 9 nodes

Tumor size
  0: 0-20 mm
  1: 21-50 mm
  2: over 50 mm
  3: Direct extension to chest wall or skin

Tumor grade
  0: Nottingham score 3-5
  1: Nottingham score 6-7
  2: Nottingham score 8-9
  The score is a semi-quantitative measure of three histopathological characteristics seen under a microscope by a pathologist.

Estrogen Receptor Immunohistochemistry (ER IHC)

*Used in standard TNM classification of breast cancer
We initially built machine learning models that predict the patient's status (dead or alive) based on all other features. We measured performance using 3-fold cross-validation accuracy, in addition to the accuracy obtained by training and predicting on the same, entire data set.

Next, we predicted the survival time of the patient. However, we do not have a survival time for all patients; the data is highly skewed and right-censored. Patients may drop out of the study at any point or still be alive by the end of the study.

With a data set of only 931 samples, it is extremely important to still use all of the training data. Two patients' survival times can be ranked not only if both have uncensored survival times, but also if the uncensored time of one is smaller than the censored survival time of the other. One of the most commonly used performance measures for survival models is the concordance index (CI). The CI is the fraction of all comparable pairs of subjects whose predicted survival times are ordered correctly. A CI of 1 indicates perfect prediction accuracy, while a CI of 0.5 is as good as a random predictor.

Hence, we measured performance using 3-fold cross-validation (3-fold CV) for the CI, in addition to the CI for training and predicting on the same, entire data set.

We used patient status as the target variable and all other features as the input features. We used the R caret package, which provides a library of machine learning models, to write and run the different algorithms.

First, we used the K-Nearest Neighbor algorithm to classify our data based on the closest training samples in feature space. We used a k-value of 2 to see whether there were any underlying relationships among features for patients based on status. However, our 3-fold CV accuracy was low.

We then tried 5 supervised learning models. None of them performed better than 0.556 for 3-fold CV, though training and predicting on the entire data set gave values ranging from 0.693 to 0.716. The models were overfitting the data rather than representing the underlying relationships.

In particular, the Gradient Boosting Model (GBM), an ensemble learning method that combines multiple weak prediction models into a single model in a stage-wise fashion, resulted in the most overfitted model.
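The pairwise ranking rule behind the concordance index described earlier can be sketched in a few lines. This is a minimal illustration in plain Python, not the evaluation code used for the reported results; the function name `concordance_index`, its argument names, and the convention that tied predictions count as half-correct are assumptions made for this sketch. A pair is treated as comparable only when the patient with the shorter observed time has an observed death.

```python
from itertools import combinations


def concordance_index(times, events, predicted):
    """Concordance index for right-censored data (illustrative sketch).

    times     -- observed follow-up time per patient
    events    -- 1 if death was observed, 0 if the patient was censored
    predicted -- predicted survival time (larger means longer survival)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let a be the patient with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if times[a] == times[b]:
            continue  # tied observed times are skipped for simplicity
        if not events[a]:
            continue  # shorter time is censored: true order is unknown
        comparable += 1
        if predicted[a] < predicted[b]:
            concordant += 1.0
        elif predicted[a] == predicted[b]:
            concordant += 0.5  # assumption: tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: the second patient is censored at time 4.
toy_times = [2, 4, 6, 8]
toy_events = [1, 0, 1, 1]
ci_perfect = concordance_index(toy_times, toy_events, [1, 2, 3, 4])
ci_reversed = concordance_index(toy_times, toy_events, [4, 3, 2, 1])
```

In the toy data, pairs whose shorter time belongs to the censored patient are excluded, so only four of the six pairs are scored.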
Out of the standard machine learning approaches, linear SVM performed slightly better than the rest, possibly because it did not overfit the data as much as the other models.

It is interesting to note that Linear Discriminant Analysis (LDA) performed approximately the same as Generalized Linear Models (GLM), even though LDA is a simpler model than GLM. LDA finds a linear combination of our clinical features that characterizes the patient survival status. We also used GLM, a generalization of ordinary linear regression that allows for response variables that do not follow a normal distribution. Our response variable does not necessarily follow a normal distribution; because we predict status as a Bernoulli variable, it could instead follow a distribution closer to a log-odds model.

We then predicted survival time using all features as input and the CI as the measure of model performance. The resulting survival models compute the time it takes for death to occur according to the features.

Proportional hazard (PH) models are the standard for studying the effects of features on survival time distributions. A hazard function λ(t) measures the instantaneous rate of death at time t. The PH model assumes there is a multiplicative effect of the features on the hazard function:

$$\lambda(t \mid x) = \lambda_0(t)\, e^{w^T x}$$

where λ(t|x) is the hazard function with features x, λ0(t) is the baseline hazard function when x = 0, w is the vector of unknown parameters, and e^{w^T x} is the relative risk associated with the features.

The Cox Proportional-Hazard approach estimates the weight vector w while leaving the baseline hazard function unspecified, by maximizing the partial likelihood

$$L(w) = \prod_{i:\ \text{death observed}} \frac{e^{w^T x_i}}{\sum_{j:\ T_j \ge T_i} e^{w^T x_j}}$$

where T_i is the survival time of patient i and x_i is that patient's feature vector.

After this estimation, we trained using weighted linear regression. In order to avoid overfitting, we applied the Akaike Information Criterion (AIC) to the features passed to the Cox model. The AIC is a measure of the relative goodness of fit of a statistical model, often described as a tradeoff between bias and variance, or between model accuracy and complexity. We first computed the AIC values of the candidate models, and then selected the model that minimizes information loss.

We obtained a 3-fold CV CI of 0.702, comparable to the CI of 0.812 for training and predicting over the entire data set.

The Random Survival Forest (RSF) algorithm is an ensemble tree method for the analysis of right-censored survival data. More specifically, the algorithm performs the following steps:

1. Draw B bootstrap samples from the original data, where each bootstrap sample excludes on average 37% of the data, called out-of-bag data (OOB data).
2. Grow a survival tree for each bootstrap sample. At each node, randomly select p variables. Then, split the node with the candidate variable which maximizes survival difference between daughter nodes.
3. Grow each tree to full size, under the constraint that a terminal node should contain at least a minimum number of unique deaths.
4. Calculate a hazard function (HF) for each tree, and average over all trees to obtain the ensemble hazard function.
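The 37% out-of-bag figure in step 1 follows from sampling with replacement: a given sample is left out of a bootstrap draw of size n with probability (1 - 1/n)^n, which approaches 1/e (about 0.368). The sketch below checks this by simulation for n = 931, the number of samples in this data set. It uses only the Python standard library; the function name `oob_fraction`, the choice of 200 bootstrap draws, and the fixed seed are assumptions made for illustration and are not part of the RSF implementation used here.

```python
import random


def oob_fraction(n_samples, n_bootstraps, seed=0):
    """Average fraction of samples left out of a bootstrap draw."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_bootstraps):
        # Draw n_samples indices with replacement; the set keeps the
        # distinct indices that made it into the bag.
        in_bag = {rng.randrange(n_samples) for _ in range(n_samples)}
        total += (n_samples - len(in_bag)) / n_samples
    return total / n_bootstraps


n = 931                                # samples after discarding missing values
expected = (1 - 1 / n) ** n            # exact probability of being out of bag
estimate = oob_fraction(n, n_bootstraps=200)
print(f"expected OOB fraction: {expected:.3f}, simulated: {estimate:.3f}")
```

Because roughly a third of the data is unseen by each tree, the OOB samples can serve as a built-in validation set for that tree.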
Based on the size of our data, we ran the RSF algorithm with the number of trees set to 1000. We used the log-rank splitting rule, which splits tree nodes by maximization of the log-rank test statistic.

We obtained a 3-fold CV CI of 0.813, which is also comparable to the CI of 0.812 for training and predicting over the entire data set.

The following figure shows the ensemble survival function for each patient. The thick red line is the overall ensemble survival, and the thick green line is the Nelson-Aalen estimator. The Nelson-Aalen estimator, often used to give an idea of the shape of the survival rate, is given by the equation:

$$\hat{H}(t) = \sum_{t_i \le t} \frac{d_i}{n_i}$$

where d_i is the number of deaths at t_i and n_i is the total number of patients at risk at t_i.

Note that the overall ensemble survival begins to deviate from the Nelson-Aalen estimator at later times.

The second figure below shows the same relationship: RSF tends to predict higher survival probabilities when survival proportions in the data set are low.

We chose the RSF model, the best performing model, to gain insights into relationships among features. We determined which features contributed most to the learning using backward search feature selection.

The best 3-fold CV CI was achieved by taking all features except for ER IHC status and ER expression. ER IHC status appears to lower the CI and ER expression ...
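The Nelson-Aalen estimator used as the reference curve above can be computed directly from follow-up times and death indicators. The following is a minimal sketch in plain Python, assuming the standard form of the estimator (the running sum of deaths divided by patients at risk at each death time); the function name `nelson_aalen` and the toy data are hypothetical and are not taken from the study.

```python
def nelson_aalen(times, events):
    """Nelson-Aalen cumulative hazard at each distinct death time.

    times  -- observed follow-up time per patient
    events -- 1 if death was observed, 0 if the patient was censored
    Returns a list of (t_i, H(t_i)) pairs.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    cumulative = 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            cumulative += deaths / n_at_risk  # d_i / n_i
            curve.append((t, cumulative))
        n_at_risk -= leaving  # deaths and censored patients leave the risk set
    return curve


# Toy data: five patients, two of them censored (event = 0).
toy_curve = nelson_aalen([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
for t, h in toy_curve:
    print(t, round(h, 3))
```

A survival curve comparable to the ensemble survival function can be obtained from the cumulative hazard as exp(-H(t)).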
Breast cancer prognosis presents an important challenge. We have described our use of various machine learning approaches to the complex problem of predicting breast cancer survivability, with the data provided through the DREAM Breast Cancer Prognosis Challenge.

Our results indicate that it is difficult to create accurate standard machine learning models for predicting patient survival status. Survival data has many unique properties. The standard machine learning models did not have any notion of a hazard function for determining patient survival status. Instead, they found spurious relationships that existed only in this particular data set, as seen from the large difference between the 3-fold CV accuracy and the accuracy from training and predicting on the entire data set.

On the other hand, the two models that predicted hazard functions seemed to do quite well, though the comparison is difficult because the two groups of models use different performance measures. It appears that both the Cox and RSF models capture the relationship among features and survival outcome, as seen in the almost identical values of the 3-fold CV CI and the CI from training and predicting on the entire data set.

From feature analysis, we learned that, at least for the RSF model, age at diagnosis was the best predictive feature. In addition, eliminating two features (estrogen receptor copy number and estrogen receptor gene expression) from the model led to a slightly higher 3-fold cross-validation score than using all features.

From RSF ensemble analysis, we saw that RSF seemed to perform better at predicting either patients with less time since diagnosis or when there is higher probability of survival, or both. Therefore, RSF combined with another algorithm that performs well in these conditions may produce even better results.

This work has limitations and could be improved in three major ways. First, we should examine all genes available in the data set and, using feature selection, find the most predictive genes. Second, we should modify our regular machine learning models to predict the Cox hazard function, so that each model can account for the right-censored structure of the data; RSF is not necessarily the best predictor of survival among the algorithms we have used. Third, we should run our algorithms on more data. To do so, we should modify our algorithms to impute or skip missing features without discarding the entire training example, and use publicly available data sets.

[1] R. Henderson, M. Jones, J. Stare, "Accuracy of Point Predictions in Survival Analysis," Statistics in ...

[2] L. J. van 't Veer, H. Dai, M. J. van de Vijver, Y. D. He, A. A. M. Hart, M. Mao, H. L. Peterse, K. van der Kooy, M. J. Marton, A. T. Witteveen, G. J. Schreiber, R. M. Kerkhoven, C. Roberts, P. S. Linsley, R. Bernards, and S. H. Friend, "Gene expression profiling predicts clinical outcome of breast cancer," Nature, vol. 415, no. 6871, pp. 530-536, Jan. 2002.

[3] S. Paik, S. Shak, G. Tang, C. Kim, J. Baker, M. Cronin, F. L. Baehner, M. G. Walker, D. Watson, T. Park, W. Hiller, E. R. Fisher, D. L. Wickerham, J. Bryant, and N. Wolmark, "A multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancer," N. Engl. J. Med., vol. 351, no. 27, pp. ...

[4] C. Curtis, S. P. Shah, S.-F. Chin, G. Turashvili, O. M. Rueda, M. J. Dunning, D. Speed, A. G. Lynch, S. Samarajiwa, Y. Yuan, S. Gräf, G. Ha, G. Haffari, A. Bashashati, R. Russell, S. McKinney, METABRIC Group, A. Langerød, A. Green, E. Provenzano, G. Wishart, S. Pinder, P. Watson, F. Markowetz, L. Murphy, I. Ellis, A. Purushotham, A.-L. Børresen-Dale, J. D. Brenton, S. Tavaré, C. Caldas, and S. Aparicio, "The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups," Nature, 2012.

[5] V. C. Raykar, H. Steck, and B. Krishnapuram, "On Ranking in Survival Analysis: Bounds on the Con..."

[6] J. Fox, "Cox Proportional-Hazards Regression for Survival Data," Appendix to "An R and S-PLUS Companion to Applied Regression," 2002.

[7] H. Ishwaran, U. B. Kogalur, E. H. Blackstone, and M. S. Lauer, "Random Survival Forests," The Annals ...