A data set for modeling yield as a function of biological material predictors and manufacturing process predictors.

Source

Kuhn, Max, and Kjell Johnson. Applied predictive modeling. New York: Springer, 2013.

Value

chem_proc_yield

a tibble

Details

This data set contains information about a chemical manufacturing process, where the goal is to understand the relationship between the process and the resulting final product yield. Raw material is put through a sequence of 27 steps to generate the final pharmaceutical product. The starting material is generated from a biological unit and varies in quality and characteristics. The objective of the project was to develop a model to predict the percent yield of the manufacturing process.

The source describes 177 samples of biological material for which 57 characteristics were measured; the tibble printed in the Examples has 176 rows and 58 columns (the 57 predictors plus yield). Of the 57 characteristics, 12 are measurements of the biological starting material and 45 are measurements of the manufacturing process. The process variables include measurements such as temperature, drying time, washing time, and concentrations of by-products at various steps. Some of the process measurements can be controlled, while others are only observed. The predictors are continuous, count, or categorical (all are stored as numeric columns); some are correlated, and some contain missing values. Samples are not independent because sets of samples come from the same batch of biological starting material.
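The missing values and correlated predictors mentioned above can be inspected with a few lines of base R. The following is a minimal sketch, not part of the original documentation: the helper names count_missing and high_cor_pairs and the 0.9 cutoff are hypothetical choices made for illustration, and loading the data from the modeldata package is an assumption, since this page does not name the package.

```r
# Hypothetical helper: count missing values per column.
# Returns a named integer vector with only the columns that contain
# at least one NA, sorted from most to fewest missing values.
count_missing <- function(df) {
  n_na <- vapply(df, function(x) sum(is.na(x)), integer(1))
  sort(n_na[n_na > 0], decreasing = TRUE)
}

# Hypothetical helper: list pairs of numeric columns whose absolute
# pairwise correlation exceeds `cutoff` (0.9 is an arbitrary choice).
high_cor_pairs <- function(df, cutoff = 0.9) {
  # Pairwise-complete correlations, so NAs do not discard whole rows.
  cors <- cor(df, use = "pairwise.complete.obs")
  # Keep each pair once (upper triangle); NA correlations are skipped by which().
  idx <- which(abs(cors) > cutoff & upper.tri(cors), arr.ind = TRUE)
  data.frame(
    var1 = rownames(cors)[idx[, 1]],
    var2 = colnames(cors)[idx[, 2]],
    cor  = cors[idx],
    stringsAsFactors = FALSE
  )
}

# Usage, assuming the data set is provided by the modeldata package.
if (requireNamespace("modeldata", quietly = TRUE)) {
  data(chem_proc_yield, package = "modeldata")
  predictors <- chem_proc_yield[, setdiff(names(chem_proc_yield), "yield")]
  print(count_missing(predictors))
  print(head(high_cor_pairs(predictors)))
}
```

Pairwise-complete correlations are used here only so that the missing values do not remove entire rows; other strategies, such as imputing first, are equally reasonable.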

Columns:

  • yield: numeric

  • bio_material_01 - bio_material_12: numeric

  • man_proc_01 - man_proc_45: numeric

Examples

data(chem_proc_yield)
str(chem_proc_yield)
#> tibble [176 × 58] (S3: tbl_df/tbl/data.frame)
#>  $ yield          : num [1:176] 38 42.4 42 41.4 42.5 ...
#>  $ bio_material_01: num [1:176] 6.25 8.01 8.01 8.01 7.47 6.12 7.48 6.94 6.94 6.94 ...
#>  $ bio_material_02: num [1:176] 49.6 61 61 61 63.3 ...
#>  $ bio_material_03: num [1:176] 57 67.5 67.5 67.5 72.2 ...
#>  $ bio_material_04: num [1:176] 12.7 14.7 14.7 14.7 14 ...
#>  $ bio_material_05: num [1:176] 19.5 19.4 19.4 19.4 17.9 ...
#>  $ bio_material_06: num [1:176] 43.7 53.1 53.1 53.1 54.7 ...
#>  $ bio_material_07: num [1:176] 100 100 100 100 100 100 100 100 100 100 ...
#>  $ bio_material_08: num [1:176] 16.7 19 19 19 18.2 ...
#>  $ bio_material_09: num [1:176] 11.4 12.6 12.6 12.6 12.8 ...
#>  $ bio_material_10: num [1:176] 3.46 3.46 3.46 3.46 3.05 3.78 3.04 3.85 3.85 3.85 ...
#>  $ bio_material_11: num [1:176] 138 154 154 154 148 ...
#>  $ bio_material_12: num [1:176] 18.8 21.1 21.1 21.1 21.1 ...
#>  $ man_proc_01    : num [1:176] NA 0 0 0 10.7 12 11.5 12 12 12 ...
#>  $ man_proc_02    : num [1:176] NA 0 0 0 0 0 0 0 0 0 ...
#>  $ man_proc_03    : num [1:176] NA NA NA NA NA NA 1.56 1.55 1.56 1.55 ...
#>  $ man_proc_04    : num [1:176] NA 917 912 911 918 924 933 929 928 938 ...
#>  $ man_proc_05    : num [1:176] NA 1032 1004 1015 1028 ...
#>  $ man_proc_06    : num [1:176] NA 210 207 213 206 ...
#>  $ man_proc_07    : num [1:176] NA 177 178 177 178 178 177 178 177 177 ...
#>  $ man_proc_08    : num [1:176] NA 178 178 177 178 178 178 178 177 177 ...
#>  $ man_proc_09    : num [1:176] 43 46.6 45.1 44.9 45 ...
#>  $ man_proc_10    : num [1:176] NA NA NA NA NA NA 11.6 10.2 9.7 10.1 ...
#>  $ man_proc_11    : num [1:176] NA NA NA NA NA NA 11.5 11.3 11.1 10.2 ...
#>  $ man_proc_12    : num [1:176] NA 0 0 0 0 0 0 0 0 0 ...
#>  $ man_proc_13    : num [1:176] 35.5 34 34.8 34.8 34.6 34 32.4 33.6 33.9 34.3 ...
#>  $ man_proc_14    : num [1:176] 4898 4869 4878 4897 4992 ...
#>  $ man_proc_15    : num [1:176] 6108 6095 6087 6102 6233 ...
#>  $ man_proc_16    : num [1:176] 4682 4617 4617 4635 4733 ...
#>  $ man_proc_17    : num [1:176] 35.5 34 34.8 34.8 33.9 33.4 33.8 33.6 33.9 35.3 ...
#>  $ man_proc_18    : num [1:176] 4865 4867 4877 4872 4886 ...
#>  $ man_proc_19    : num [1:176] 6049 6097 6078 6073 6102 ...
#>  $ man_proc_20    : num [1:176] 4665 4621 4621 4611 4659 ...
#>  $ man_proc_21    : num [1:176] 0 0 0 0 -0.7 -0.6 1.4 0 0 1 ...
#>  $ man_proc_22    : num [1:176] NA 3 4 5 8 9 1 2 3 4 ...
#>  $ man_proc_23    : num [1:176] NA 0 1 2 4 1 1 2 3 1 ...
#>  $ man_proc_24    : num [1:176] NA 3 4 5 18 1 1 2 3 4 ...
#>  $ man_proc_25    : num [1:176] 4873 4869 4897 4892 4930 ...
#>  $ man_proc_26    : num [1:176] 6074 6107 6116 6111 6151 ...
#>  $ man_proc_27    : num [1:176] 4685 4630 4637 4630 4684 ...
#>  $ man_proc_28    : num [1:176] 10.7 11.2 11.1 11.1 11.3 11.4 11.2 11.1 11.3 11.4 ...
#>  $ man_proc_29    : num [1:176] 21 21.4 21.3 21.3 21.6 21.7 21.2 21.2 21.5 21.7 ...
#>  $ man_proc_30    : num [1:176] 9.9 9.9 9.4 9.4 9 10.1 11.2 10.9 10.5 9.8 ...
#>  $ man_proc_31    : num [1:176] 69.1 68.7 69.3 69.3 69.4 68.2 67.6 67.9 68 68.5 ...
#>  $ man_proc_32    : num [1:176] 156 169 173 171 171 173 159 161 160 164 ...
#>  $ man_proc_33    : num [1:176] 66 66 66 68 70 70 65 65 65 66 ...
#>  $ man_proc_34    : num [1:176] 2.4 2.6 2.6 2.5 2.5 2.5 2.5 2.5 2.5 2.5 ...
#>  $ man_proc_35    : num [1:176] 486 508 509 496 468 490 475 478 491 488 ...
#>  $ man_proc_36    : num [1:176] 0.019 0.019 0.018 0.018 0.017 0.018 0.019 0.019 0.019 0.019 ...
#>  $ man_proc_37    : num [1:176] 0.5 2 0.7 1.2 0.2 0.4 0.8 1 1.2 1.8 ...
#>  $ man_proc_38    : num [1:176] 3 2 2 2 2 2 2 2 3 3 ...
#>  $ man_proc_39    : num [1:176] 7.2 7.2 7.2 7.2 7.3 7.2 7.3 7.3 7.4 7.1 ...
#>  $ man_proc_40    : num [1:176] NA 0.1 0 0 0 0 0 0 0 0 ...
#>  $ man_proc_41    : num [1:176] NA 0.15 0 0 0 0 0 0 0 0 ...
#>  $ man_proc_42    : num [1:176] 11.6 11.1 12 10.6 11 11.5 11.7 11.4 11.4 11.3 ...
#>  $ man_proc_43    : num [1:176] 3 0.9 1 1.1 1.1 2.2 0.7 0.8 0.9 0.8 ...
#>  $ man_proc_44    : num [1:176] 1.8 1.9 1.8 1.8 1.7 1.8 2 2 1.9 1.9 ...
#>  $ man_proc_45    : num [1:176] 2.4 2.2 2.3 2.1 2.1 2 2.2 2.2 2.1 2.4 ...
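
The Details note that sets of samples share a batch of biological starting material, but the data set has no batch column. In the output above, consecutive rows repeat the same bio_material values (for example 8.01, 8.01, 8.01 in bio_material_01), which suggests one way to approximate batches. The sketch below is a heuristic under a stated assumption, not a documented feature of the data: it assumes that consecutive rows with identical biological material measurements belong to the same batch. The helper name infer_batch is hypothetical, and loading the data from the modeldata package is also an assumption.

```r
# Hypothetical helper: assign a batch id to each row.
# ASSUMPTION: consecutive rows with identical values in all columns whose
# names start with `prefix` come from the same batch of starting material.
infer_batch <- function(df, prefix = "bio_material_") {
  if (nrow(df) == 0) return(integer(0))
  bio <- df[, grep(paste0("^", prefix), names(df)), drop = FALSE]
  # Build one string key per row from the biological material columns.
  key <- do.call(paste, c(unname(as.list(bio)), sep = "|"))
  # Start a new batch whenever the key differs from the previous row.
  cumsum(c(TRUE, key[-1] != key[-length(key)]))
}

# Usage, assuming the data set is provided by the modeldata package.
if (requireNamespace("modeldata", quietly = TRUE)) {
  data(chem_proc_yield, package = "modeldata")
  batch <- infer_batch(chem_proc_yield)
  # Distribution of inferred batch sizes.
  print(table(table(batch)))
}
```

If the inferred labels look plausible, they could be used to keep rows from the same batch together during resampling, for example with a grouped cross-validation scheme such as rsample::group_vfold_cv().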