An accurate breast cancer prognosis, or breast cancer survivability prediction, is important because it often guides the course of treatment, the ability to claim additional financial support from the government, the actions of the patient and family, and more [1]. Breast cancer survivability is commonly predicted using clinical features. TNM staging, the globally accepted standard used to describe cancer, was devised more than 60 years ago and looks at only three features: the size of the tumor, the number of regional lymph nodes with cancer, and the spread of cancer to other parts of the body. With the advent of affordable genomic sequencing and the acceleration of findings in molecular biology in the past decade, molecular features may be practical to improve breast cancer prognosis.

Molecular diagnostics for cancer therapy decision-making have shown promising initial clinical results. This has led to a flood of published reports of signatures predictive of breast cancer phenotypes, and several molecular diagnostic tests for cancer therapy decision-making have gained regulatory approval in recent years [2, 3]. However, there is no consensus on the most accurate computational methods and models to predict breast cancer survivability. In addition, it is unclear whether incorporating molecular data as a complement or replacement for traditional clinical diagnostic tools adds any value [4]. Therefore, it is necessary to objectively assess whether genomic data currently provides value beyond traditional [...].

To aid in efforts to solve this problem, we predicted breast cancer survivability with machine learning techniques as part of the DREAM Breast Cancer Prognosis Challenge. The ultimate goal of the challenge is to objectively compare many computational algorithms by providing a common training data set, in an effort to find the best features for breast cancer prognosis. The data set provided contains standard clinical measurements in addition to genomic information, thus allowing genomic information to be compared with standard clinical features.

Our goal was to predict survival for each individual. We took two approaches: predicting a discrete survival status based on time since diagnosis and other features, and predicting a continuous survival time based on all features.

Breast cancer sample data is made available through the DREAM challenge from the METABRIC data of 1,000 breast tumor samples used in a previous study [2], where the data origin and preprocessing are explained in detail. We further processed the data by discarding samples with missing values, which left 931 samples.

There are two indicators of survival: the time from breast cancer diagnosis to last follow-up and the status of the patient (alive or dead) at the last follow-up time. Survival data is right-censored, since patients may be alive at the end of the study.

Gene expression is generated using molecular profiling platforms, described in full detail in another study [2]. The genes used as training features are narrowed to a list of 9 suggested by the DREAM challenge and previous literature. We used two estrogen pathway genes (ER and PR), two human epidermal growth factor 2 receptor amplicon genes (HER2 SNP6 and GII), and five immune response genes (CXCL10, STAT1, GBP1, GZMA, and [...]).

In addition, we have the following clinical annotations, the classic features used for breast cancer prognosis:

CT: chemotherapy [...]
Number of lymph nodes found with cancer*
    0: no nodes
    1: 1-3 nodes
    2: 4-9 nodes
    3: over 9 nodes
Tumor size*
    0: 0-20 mm
    1: 21-50 mm
    2: over 50 mm
    3: direct extension to chest wall or skin
Histological grade
    0: Nottingham score 3-5
    1: Nottingham score 6-7
    2: Nottingham score 8-9
    The score is a semi-quantitative measure of three histopathological characteristics seen under a microscope by a pathologist.
Estrogen Receptor Immunohistochemistry (ER IHC) [...]
* Used in standard TNM classification of breast cancer

We initially build machine learning models that predict the patient's status (dead or alive) based on all other features. We measure performance using 3-fold cross-validation accuracy in addition to the accuracy obtained by training and predicting on the same, entire data set.

Next, we predict the survival time of the patient. However, we do not have survival time for all patients; the data is highly skewed and right-censored. Patients may drop out of the study at any point or still be alive by the end of the study. With a data set of only 931 samples, it is extremely important to still use all of the training data. The survival times of two patients can be ranked not only if both have uncensored survival times but also if the uncensored time of one is smaller than the censored survival time of the other. One of the most commonly used performance measures for survival models is the concordance index (CI) [5]. The CI is the fraction of all pairs of subjects, among the pairs that can be ranked, whose predicted survival times are ordered correctly. A CI of 1 indicates perfect prediction accuracy, while a CI of 0.5 is as good as a random predictor. Hence, we measure performance using 3-fold cross-validation (3-fold CV) for CI in addition to the CI for training and predicting on the same, entire data set.

We used patient status as the target variable and all other features as the input features. We used the R caret package, which provides a library for a number of machine learning models, to write and run different algorithms.

First, we used the K-Nearest Neighbor algorithm to classify our data based on the closest training samples in feature space. We used a k-value of 2 to see if there were any underlying relationships among features for patients based on status. However, our 3-fold CV accuracy was low [...].

We then tried 5 supervised learning models. None of them performed better than 0.556 for 3-fold CV, though training and predicting on the entire data set gave values ranging from 0.693 to 0.716. The models were overfitting the data and were not representing the relationships [...].

In particular, the Gradient Boosting Model (GBM), an ensemble learning method which uses multiple weak prediction models to form a single model in a stage-wise fashion, resulted in the most overfitted model.
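The gap between 3-fold CV accuracy and accuracy on the entire data set is the signature of overfitting described above. The following is a minimal sketch of that effect, not the authors' R pipeline: it assumes a hypothetical synthetic data set whose binary labels are pure noise, and a hand-written nearest-neighbor classifier with k = 1 (chosen for illustration because it memorizes the training set exactly). All function and variable names are illustrative.

```python
import random


def knn_predict(train, query, k):
    """Majority vote among the k training samples closest to `query`."""
    nearest = sorted(
        train,
        key=lambda sample: sum((a - b) ** 2 for a, b in zip(sample[0], query)),
    )[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0  # ties go to class 0


def accuracy(train, test, k):
    """Fraction of `test` samples classified correctly by a model fit on `train`."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)


def cv_accuracy(data, k, folds=3):
    """Mean held-out accuracy over `folds` interleaved folds."""
    scores = []
    for f in range(folds):
        test = data[f::folds]
        train = [s for i, s in enumerate(data) if i % folds != f]
        scores.append(accuracy(train, test, k))
    return sum(scores) / folds


# Hypothetical data: 300 samples, 5 random features, labels unrelated to features.
rng = random.Random(0)
data = [([rng.random() for _ in range(5)], rng.randint(0, 1)) for _ in range(300)]

resub_acc = accuracy(data, data, k=1)  # train and predict on the same data
cv_acc = cv_accuracy(data, k=1)        # 3-fold cross-validation
print(f"whole-data accuracy: {resub_acc:.3f}, 3-fold CV accuracy: {cv_acc:.3f}")
```

Because the labels carry no signal, the whole-data accuracy only reflects memorization, while the cross-validated accuracy stays near chance. A large difference between the two numbers on real data points the same way.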
Out of the standard machine learning approaches, linear SVM performed slightly better than the rest, possibly because it did not overfit the data as much as the other models.

It is interesting to note that Linear Discriminant Analysis (LDA) performed approximately the same as Generalized Linear Models (GLM), even though LDA is a simpler model than GLM. LDA finds a linear combination of our clinical features which characterizes the patient survival status. We also used GLM, a generalization of ordinary linear regression that allows for response variables that do not follow a normal distribution, because our response variable does not necessarily follow a normal distribution; since we predict status as a Bernoulli variable, it could instead follow a distribution closer to a log-odds model.

We then predicted survival time using all features as input and the CI as the measurement of model performance. The resulting survival models compute the time it takes for death to occur according to the features.

Proportional hazard (PH) models are the standard for studying the effects of features on survival time distributions. A hazard function $\lambda(t)$ measures the instantaneous rate of death at time $t$. The PH model assumes there is a multiplicative effect of the features on the hazard function:

$\lambda(t \mid x) = \lambda_0(t)\, e^{w^T x}$

where $\lambda(t \mid x)$ is the hazard function with features $x$, $\lambda_0(t)$ is the baseline hazard function when $x = 0$, $w$ is the vector of unknown parameters, and $e^{w^T x}$ is the relative hazard.

The Cox Proportional-Hazard [6] approach estimates the weight vector $w$ by leaving the baseline hazard function unspecified [...], where $T_i$ is the survival time of patient $i$.

After this estimation, we trained using weighted linear regression. To avoid overfitting, we used the Akaike Information Criterion (AIC) to select the features passed to the Cox model. The AIC is a measure of the relative goodness of fit of a statistical model, often described as a tradeoff between bias and variance or between model accuracy and complexity. We first computed the AIC value of each candidate model and then selected the model that minimizes information loss.

We obtained a 3-fold CV CI of 0.702, comparable to the CI of 0.812 for training and predicting over the entire data set.

The Random Survival Forest (RSF) algorithm [7] is an ensemble tree method for the analysis of right-censored survival data. More specifically, the algorithm performs the following steps:

1. Draw B bootstrap samples from the original data, where each bootstrap sample excludes on average 37% of the data, called out-of-bag data (OOB data).
2. Grow a survival tree for each bootstrap sample. At each node, randomly select p variables. Then, split the node with the candidate variable which maximizes the survival difference between daughter nodes.
3. [...]
4. Calculate a hazard function (HF) for each tree, and [...]

Based on the size of our data, we ran the RSF algorithm with the number of trees to grow set to 1000. We used the log-rank splitting rule, which splits tree nodes by maximization of the log-rank test statistic.

We obtained a 3-fold CV CI of 0.813, which is also comparable to the CI of 0.812 for training and predicting over the entire data set.

We chose the RSF model, the best performing model, to gain insights into relationships among features. We determined which features contributed most to the learning using backward search feature selection. The best 3-fold CV CI was achieved by taking all features except for ER IHC status and ER expression. ER IHC status appears to lower the CI and ER expression [...].

The following figure shows the ensemble survival function for each patient. The thick red line is the overall ensemble survival, and the thick green line is the Nelson-Aalen estimator. The Nelson-Aalen estimator, often used to give an idea of the survival rate shape, is given by the equation:

$\hat{H}(t) = \sum_{t_i \le t} \frac{d_i}{n_i}$

where $d_i$ is the number of deaths at $t_i$ and $n_i$ is the total number of patients at risk at $t_i$. Note that the overall ensemble survival begins to deviate from the Nelson-Aalen estimator at later times.

The second figure below shows the same relationship, where it is shown that RSF tends to predict higher survival probabilities when survival proportions in the data set are low.
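As a small illustration of the Nelson-Aalen estimator used as the reference curve above, the sketch below accumulates the ratio of deaths to patients at risk at each observed death time, which is the standard definition of the estimator. It is a from-scratch sketch on a hypothetical seven-patient cohort, not the code used to produce the figures; converting the cumulative hazard to a survival curve with exp(-H(t)) is an assumption about how the green line was drawn.

```python
import math


def nelson_aalen(times, events):
    """Return [(t_i, H(t_i))] at each distinct observed death time.

    times  -- follow-up time of each patient
    events -- 1 if death was observed at that time, 0 if right-censored
    """
    death_times = sorted({t for t, e in zip(times, events) if e == 1})
    cumulative_hazard = 0.0
    curve = []
    for t in death_times:
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        at_risk = sum(1 for u in times if u >= t)  # still under observation at t
        cumulative_hazard += deaths / at_risk
        curve.append((t, cumulative_hazard))
    return curve


# Hypothetical toy cohort: follow-up times and death indicators.
times = [2, 3, 3, 5, 8, 8, 9]
events = [1, 1, 0, 1, 0, 1, 0]

curve = nelson_aalen(times, events)
# Assumed conversion from cumulative hazard to a survival estimate.
survival = [(t, math.exp(-h)) for t, h in curve]

for (t, h), (_, s) in zip(curve, survival):
    print(f"t={t}: H={h:.3f}, S={s:.3f}")
```

Censored patients never add to the numerator but remain in the at-risk count until their last follow-up, which is how the estimator uses right-censored records without discarding them.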
Breast cancer prognosis presents an important challenge [...]. We have described our use of various machine learning approaches to the complex problem of predicting breast cancer survivability rate, with the data provided through the DREAM Breast Cancer Prognosis Challenge.

Our results indicate that it is difficult to create accurate standard machine learning models for predicting patient survival status. Survival data has many unique properties. The standard machine learning models did not have any notion of a hazard function for determining patient survival status. Instead, they found spurious relationships that existed solely in this particular data set, as seen from the large difference between the 3-fold CV accuracy and the accuracy from training and predicting on the entire data set.

On the other hand, the two models that predicted hazard functions seemed to do quite well, though it is difficult to compare due to the different model performance measurements. It appears that both the Cox and RSF models capture the relationship among features and survival outcome, as seen in the almost identical values between the 3-fold CV CI and the CI from training and predicting on the entire data set.

From feature analysis, we learned that, at least for the RSF model, age at diagnosis was the best feature predictor. In addition, eliminating two features (estrogen receptor copy number and estrogen receptor gene expression) from the model led to a slightly higher 3-fold cross-validation score than with all features.

From RSF ensemble analysis, we saw that RSF seemed to perform better at predicting either patients with less time since diagnosis or when there is a higher probability of survival, or both. Therefore, RSF combined with another algorithm that performs well in these conditions may produce even better results.

This work has limitations and could be improved in three major ways. First, we should examine all genes available in the data set and, using feature selection, find the most predictive genes. Second, we should modify our regular machine learning models to predict the Cox hazard function, to give each model the right-censored data relationship that exists. RSF is not necessarily the best predictor of survival out of the algorithms we have used. Third, we should run our algorithms on more data. To do so, we should modify our algorithms to impute or skip missing features without discarding the entire training example, and use publicly available data sets.

[1] R. Henderson, M. Jones, J. Stare, "Accuracy of Point Predictions in Survival Analysis," Statistics in [...]

[2] L. J. van 't Veer, H. Dai, M. J. van de Vijver, Y. D. He, A. A. M. Hart, M. Mao, H. L. Peterse, K. van der Kooy, M. J. Marton, A. T. Witteveen, G. J. Schreiber, R. M. Kerkhoven, C. Roberts, P. S. Linsley, R. Bernards, and S. H. Friend, "Gene expression profiling predicts clinical outcome of breast cancer," Nature, vol. 415, no. 6871, pp. 530-536, Jan. 2002.

[3] S. Paik, S. Shak, G. Tang, C. Kim, J. Baker, M. Cronin, F. L. Baehner, M. G. Walker, D. Watson, T. Park, W. Hiller, E. R. Fisher, D. L. Wickerham, J. Bryant, and N. Wolmark, "A multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancer," N. Engl. J. Med., vol. 351, no. 27, pp. [...]

[4] C. Curtis, S. P. Shah, S.-F. Chin, G. Turashvili, O. M. Rueda, M. J. Dunning, D. Speed, A. G. Lynch, S. Samarajiwa, Y. Yuan, S. Gräf, G. Ha, G. Haffari, A. Bashashati, R. Russell, S. McKinney, M. Group, A. Langerød, A. Green, E. Provenzano, G. Wishart, S. Pinder, P. Watson, F. Markowetz, L. Murphy, I. Ellis, A. Purushotham, A.-L. Børresen-Dale, J. D. Brenton, S. Tavaré, C. Caldas, and S. Aparicio, "The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups," Nature, 2012.

[5] V. C. Raykar, H. Steck, and B. Krishnapuram, "On Ranking in Survival Analysis: Bounds on the Con[...]

[6] J. Fox, "Cox Proportional-Hazards Regression for Survival Data," Appendix to "An R and S-PLUS Companion to Applied Regression," 2002.

[7] H. Ishwaran, U. B. Kogalur, E. H. Blackstone, and M. S. Lauer, "Random Survival Forests," The Annals [...]

