A data set to predict a binary outcome using imaging and patient data.
Source
Kuhn, Max, and Kjell Johnson. Feature Engineering and Selection: A Practical Approach for Predictive Models. Chapman and Hall/CRC, 2019.
Details
These data were gathered to predict patient risk for ischemic stroke. A historical set of patients with a range of carotid artery blockages was selected. The data set consists of 126 patients, 44 of whom had blockages greater than 70%. All patients had undergone Computed Tomography Angiography (CTA) to generate a detailed three-dimensional visualization and characterization of the blockage. These images were then analyzed to compute several features related to the disease, including percent stenosis, arterial wall thickness, and tissue characteristics such as lipid-rich necrotic core and calcification.
The patients in this study also had follow-up information on whether or not a stroke occurred at a subsequent point in time. The data for each patient also included commonly collected clinical risk factors for stroke, such as whether or not the patient had atrial fibrillation, coronary artery disease, or a history of smoking. Demographics of gender and age were included as well. These readily available risk factors form a second, potentially useful predictor set. In fact, this set should be evaluated first to assess its ability to predict stroke, since these predictors are easy to collect, are acquired at patient presentation, and do not require an expensive imaging technique.
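As a rough illustration of evaluating the risk-factor set first, the sketch below fits a plain logistic regression to the clinical and demographic columns only. It assumes the data set is loaded from the modeldata package (the package is not named on this page), and the choice of `glm()` is ours for illustration, not necessarily the modeling approach used in the source book. The names `risk_factors`, `stroke_df`, and `risk_fit` are hypothetical.

```r
# Assumption: ischemic_stroke is provided by the modeldata package.
library(modeldata)
data(ischemic_stroke)

# Clinical and demographic predictors (no imaging required)
risk_factors <- c(
  "age", "male", "smoking_history", "atrial_fibrillation",
  "coronary_artery_disease", "diabetes_history",
  "hypercholesterolemia_history", "hypertension_history"
)

# glm() treats the first factor level as "failure". The levels are
# c("yes", "no"), so make "no" the reference to model P(stroke == "yes").
stroke_df <- ischemic_stroke
stroke_df$stroke <- relevel(stroke_df$stroke, ref = "no")

# Logistic regression on the risk-factor set only
risk_fit <- glm(
  reformulate(risk_factors, response = "stroke"),
  data = stroke_df,
  family = binomial
)
summary(risk_fit)
```

A model like this only gives a baseline; with 126 patients, any comparison against the imaging predictors would need resampling rather than in-sample fit statistics.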
Columns:
stroke: factor (levels: 'yes' and 'no')
nascet_scale: numeric
calc_vol: numeric
calc_vol_prop: numeric
matx_vol: numeric
matx_vol_prop: numeric
lrnc_vol: numeric
lrnc_vol_prop: numeric
max_calc_area: numeric
max_calc_area_prop: numeric
max_dilation_by_area: numeric
max_matx_area: numeric
max_matx_area_prop: numeric
max_lrnc_area: numeric
max_lrnc_area_prop: numeric
max_max_wall_thickness: numeric
max_remodeling_ratio: numeric
max_stenosis_by_area: numeric
max_wall_area: numeric
wall_vol: numeric
max_stenosis_by_diameter: numeric
age: integer
male: integer
smoking_history: integer
atrial_fibrillation: integer
coronary_artery_disease: integer
diabetes_history: integer
hypercholesterolemia_history: integer
hypertension_history: integer
Examples
data(ischemic_stroke)
str(ischemic_stroke)
#> tibble [126 × 29] (S3: tbl_df/tbl/data.frame)
#> $ stroke : Factor w/ 2 levels "yes","no": 2 2 2 1 2 2 2 2 2 2 ...
#> $ nascet_scale : num [1:126] 0 0 0 0 0 0 0 0 0 0 ...
#> $ calc_vol : num [1:126] 235.3 31.4 113.4 780.8 84.1 ...
#> $ calc_vol_prop : num [1:126] 0.0704 0.0162 0.0381 0.2134 0.0414 ...
#> $ matx_vol : num [1:126] 3157 3033 3835 3519 2990 ...
#> $ matx_vol_prop : num [1:126] 0.76 0.813 0.783 0.761 0.75 ...
#> $ lrnc_vol : num [1:126] 225 369 321 141 293 ...
#> $ lrnc_vol_prop : num [1:126] 0.0911 0.134 0.083 0.0321 0.0754 ...
#> $ max_calc_area : num [1:126] 12.35 7.13 16.29 63.35 17.58 ...
#> $ max_calc_area_prop : num [1:126] 0.366 0.211 0.409 0.576 0.322 ...
#> $ max_dilation_by_area : num [1:126] 521 91.7 271 2270.5 95.2 ...
#> $ max_matx_area : num [1:126] 71.2 27.2 38.1 341.1 56.6 ...
#> $ max_matx_area_prop : num [1:126] 0.952 0.946 0.946 0.969 0.921 ...
#> $ max_lrnc_area : num [1:126] 21.69 6.43 5.71 6.05 7.21 ...
#> $ max_lrnc_area_prop : num [1:126] 0.43 0.282 0.155 0.187 0.217 ...
#> $ max_max_wall_thickness : num [1:126] 2.41 2.54 3.71 6.12 3.98 ...
#> $ max_remodeling_ratio : num [1:126] 5.7 1.74 2.83 15.65 1.91 ...
#> $ max_stenosis_by_area : num [1:126] 19 30.2 33.9 34.3 36.6 ...
#> $ max_wall_area : num [1:126] 106.2 33.4 55.3 426.5 59.8 ...
#> $ wall_vol : num [1:126] 4192 3917 4935 4910 4045 ...
#> $ max_stenosis_by_diameter : num [1:126] 10.5 18.6 19.7 20.3 49.3 ...
#> $ age : int [1:126] 72 76 72 61 65 64 82 83 85 56 ...
#> $ male : int [1:126] 1 1 0 1 1 1 0 0 0 0 ...
#> $ smoking_history : int [1:126] 1 1 0 1 0 1 1 0 1 1 ...
#> $ atrial_fibrillation : int [1:126] 0 0 0 0 0 0 0 1 0 0 ...
#> $ coronary_artery_disease : int [1:126] 0 0 0 0 0 1 1 1 1 0 ...
#> $ diabetes_history : int [1:126] 0 1 0 1 0 0 0 0 0 0 ...
#> $ hypercholesterolemia_history: int [1:126] 0 1 0 1 0 1 0 0 1 0 ...
#> $ hypertension_history : int [1:126] 1 1 0 1 1 1 1 1 1 0 ...