Hill, LaPan, Li and Haney (2007) develop models to predict which cells in a high content screen were well segmented. The data consist of 119 imaging measurements on 2019 cells. The original analysis used 1009 cells for training and 1010 as a test set (see the column called case).
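As a minimal sketch of how the original partition might be recovered from the case column, the following splits the data into training and test sets. It assumes that cells is provided by the modeldata package (the package name is not stated on this page); the object names cells_train and cells_test are illustrative only.

```r
# Assumption: `cells` ships with the modeldata package.
library(modeldata)
data(cells)

# Count the rows in each partition recorded in the `case` column.
table(cells$case)

# Split on `case` and drop that bookkeeping column from each piece.
keep_cols   <- names(cells) != "case"
cells_train <- cells[cells$case == "Train", keep_cols]
cells_test  <- cells[cells$case == "Test",  keep_cols]

# Every row should land in exactly one of the two partitions.
nrow(cells_train) + nrow(cells_test) == nrow(cells)
```

Dropping case after the split keeps it from being used as a predictor by accident.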

Source

Hill, LaPan, Li and Haney (2007). Impact of image segmentation on high-content screening data quality for SK-BR-3 cells, BMC Bioinformatics, Vol. 8, pg. 340, https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-8-340.

Value

cells

a tibble

Details

The outcome class is contained in a factor variable called class with levels "PS" for poorly segmented and "WS" for well segmented.
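A short sketch of inspecting the outcome follows. It again assumes cells comes from the modeldata package; is_well_segmented is a hypothetical helper name, not part of the data.

```r
# Assumption: `cells` ships with the modeldata package.
library(modeldata)
data(cells)

# Confirm the factor levels: "PS" (poorly segmented), "WS" (well segmented).
levels(cells$class)

# Counts and proportions of each outcome class.
class_counts <- table(cells$class)
class_counts
prop.table(class_counts)

# A logical indicator can be convenient for quick summaries.
is_well_segmented <- cells$class == "WS"
mean(is_well_segmented)
```

Because "PS" is the first factor level, modeling functions that treat the first level as the event of interest would model poor segmentation unless told otherwise.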

The raw data used in the paper can be found at the BioMed Central website. The version contained in cells is modified in two ways. First, several discrete versions of some of the predictors (with the suffix "Status") were removed. Second, several skewed predictors have minimum values of zero and would benefit from a transformation such as the log. A constant value of 1 was added to these fields so that such a transformation is defined for every row: avg_inten_ch_2, fiber_align_2_ch_3, fiber_align_2_ch_4, spot_fiber_count_ch_4 and total_inten_ch_2.
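The shift by 1 means the five listed fields can be log-transformed directly. The sketch below illustrates this; it is one possible preprocessing step, not the method used in the paper. It assumes cells comes from the modeldata package, and the names shifted and cells_log are illustrative.

```r
# Assumption: `cells` ships with the modeldata package.
library(modeldata)
data(cells)

# The five fields that had a constant of 1 added, as listed above.
shifted <- c(
  "avg_inten_ch_2", "fiber_align_2_ch_3", "fiber_align_2_ch_4",
  "spot_fiber_count_ch_4", "total_inten_ch_2"
)

# Check the minimum of each field before transforming.
sapply(cells[shifted], min)

# Apply the natural log to those columns only, leaving the rest untouched.
cells_log <- cells
cells_log[shifted] <- lapply(cells_log[shifted], log)

# Compare the spread of one field before and after the transformation.
summary(cells$total_inten_ch_2)
summary(cells_log$total_inten_ch_2)
```

If the minimums are 1 as described, the transformed columns have a minimum of 0 and contain no infinite values.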

Examples

data(cells)
str(cells)
#> tibble [2,019 × 58] (S3: tbl_df/tbl/data.frame)
#>  $ case                        : Factor w/ 2 levels "Test","Train": 1 2 2 2 1 1 1 1 1 1 ...
#>  $ class                       : Factor w/ 2 levels "PS","WS": 1 1 2 1 1 2 2 1 2 2 ...
#>  $ angle_ch_1                  : num [1:2019] 143.25 133.75 106.65 69.15 2.89 ...
#>  $ area_ch_1                   : int [1:2019] 185 819 431 298 285 172 177 251 495 384 ...
#>  $ avg_inten_ch_1              : num [1:2019] 15.7 31.9 28 19.5 24.3 ...
#>  $ avg_inten_ch_2              : num [1:2019] 4.95 206.88 116.32 102.29 112.42 ...
#>  $ avg_inten_ch_3              : num [1:2019] 9.55 69.92 63.94 28.22 20.47 ...
#>  $ avg_inten_ch_4              : num [1:2019] 2.21 164.15 106.7 31.03 40.58 ...
#>  $ convex_hull_area_ratio_ch_1 : num [1:2019] 1.12 1.26 1.05 1.2 1.11 ...
#>  $ convex_hull_perim_ratio_ch_1: num [1:2019] 0.92 0.797 0.935 0.866 0.957 ...
#>  $ diff_inten_density_ch_1     : num [1:2019] 29.5 31.9 32.5 26.7 31.6 ...
#>  $ diff_inten_density_ch_3     : num [1:2019] 13.8 43.1 36 22.9 21.7 ...
#>  $ diff_inten_density_ch_4     : num [1:2019] 6.83 79.31 51.36 26.39 25.03 ...
#>  $ entropy_inten_ch_1          : num [1:2019] 4.97 6.09 5.88 5.42 5.66 ...
#>  $ entropy_inten_ch_3          : num [1:2019] 4.37 6.64 6.68 5.44 5.29 ...
#>  $ entropy_inten_ch_4          : num [1:2019] 2.72 7.88 7.14 5.78 5.24 ...
#>  $ eq_circ_diam_ch_1           : num [1:2019] 15.4 32.3 23.4 19.5 19.1 ...
#>  $ eq_ellipse_lwr_ch_1         : num [1:2019] 3.06 1.56 1.38 3.39 2.74 ...
#>  $ eq_ellipse_oblate_vol_ch_1  : num [1:2019] 337 2233 802 725 608 ...
#>  $ eq_ellipse_prolate_vol_ch_1 : num [1:2019] 110 1433 583 214 222 ...
#>  $ eq_sphere_area_ch_1         : num [1:2019] 742 3279 1727 1195 1140 ...
#>  $ eq_sphere_vol_ch_1          : num [1:2019] 1901 17654 6751 3884 3621 ...
#>  $ fiber_align_2_ch_3          : num [1:2019] 1 1.49 1.3 1.22 1.49 ...
#>  $ fiber_align_2_ch_4          : num [1:2019] 1 1.35 1.52 1.73 1.38 ...
#>  $ fiber_length_ch_1           : num [1:2019] 27 64.3 21.1 43.1 34.7 ...
#>  $ fiber_width_ch_1            : num [1:2019] 7.41 13.17 21.14 7.4 8.48 ...
#>  $ inten_cooc_asm_ch_3         : num [1:2019] 0.01118 0.02805 0.00686 0.03096 0.02277 ...
#>  $ inten_cooc_asm_ch_4         : num [1:2019] 0.05045 0.01259 0.00614 0.01103 0.07969 ...
#>  $ inten_cooc_contrast_ch_3    : num [1:2019] 40.75 8.23 14.45 7.3 15.85 ...
#>  $ inten_cooc_contrast_ch_4    : num [1:2019] 13.9 6.98 16.7 13.39 3.54 ...
#>  $ inten_cooc_entropy_ch_3     : num [1:2019] 7.2 6.82 7.58 6.31 6.78 ...
#>  $ inten_cooc_entropy_ch_4     : num [1:2019] 5.25 7.1 7.67 7.2 5.5 ...
#>  $ inten_cooc_max_ch_3         : num [1:2019] 0.0774 0.1532 0.0284 0.1628 0.1274 ...
#>  $ inten_cooc_max_ch_4         : num [1:2019] 0.172 0.0739 0.0232 0.0775 0.2785 ...
#>  $ kurt_inten_ch_1             : num [1:2019] -0.6567 -0.2488 -0.2935 0.6259 0.0421 ...
#>  $ kurt_inten_ch_3             : num [1:2019] -0.608 -0.331 1.051 0.128 0.952 ...
#>  $ kurt_inten_ch_4             : num [1:2019] 0.726 -0.265 0.151 -0.347 -0.195 ...
#>  $ length_ch_1                 : num [1:2019] 26.2 47.2 28.1 37.9 36 ...
#>  $ neighbor_avg_dist_ch_1      : num [1:2019] 370 174 158 206 205 ...
#>  $ neighbor_min_dist_ch_1      : num [1:2019] 99.1 30.1 34.9 33.1 27 ...
#>  $ neighbor_var_dist_ch_1      : num [1:2019] 128 81.4 90.4 116.9 111 ...
#>  $ perim_ch_1                  : num [1:2019] 68.8 154.9 84.6 101.1 86.5 ...
#>  $ shape_bfr_ch_1              : num [1:2019] 0.665 0.54 0.724 0.589 0.6 ...
#>  $ shape_lwr_ch_1              : num [1:2019] 2.46 1.47 1.33 2.83 2.73 ...
#>  $ shape_p_2_a_ch_1            : num [1:2019] 1.88 2.26 1.27 2.55 2.02 ...
#>  $ skew_inten_ch_1             : num [1:2019] 0.455 0.399 0.472 0.882 0.517 ...
#>  $ skew_inten_ch_3             : num [1:2019] 0.46 0.62 0.971 1 1.177 ...
#>  $ skew_inten_ch_4             : num [1:2019] 1.233 0.527 0.325 0.604 0.926 ...
#>  $ spot_fiber_count_ch_3       : int [1:2019] 1 4 2 4 1 1 0 2 1 1 ...
#>  $ spot_fiber_count_ch_4       : num [1:2019] 5 12 7 8 8 5 5 8 12 8 ...
#>  $ total_inten_ch_1            : int [1:2019] 2781 24964 11552 5545 6603 53779 43950 4401 7593 6512 ...
#>  $ total_inten_ch_2            : num [1:2019] 701 160998 47511 28870 30306 ...
#>  $ total_inten_ch_3            : int [1:2019] 1690 54675 26344 8042 5569 21234 20929 4136 6488 7503 ...
#>  $ total_inten_ch_4            : int [1:2019] 392 128368 43959 8843 11037 57231 46187 373 24325 23162 ...
#>  $ var_inten_ch_1              : num [1:2019] 12.5 18.8 17.3 13.8 15.4 ...
#>  $ var_inten_ch_3              : num [1:2019] 7.61 56.72 37.67 30.01 20.5 ...
#>  $ var_inten_ch_4              : num [1:2019] 2.71 118.39 49.47 24.75 45.45 ...
#>  $ width_ch_1                  : num [1:2019] 10.6 32.2 21.2 13.4 13.2 ...