Predicting permeability from chemical information
Source: R/permeability_qsar.R
A quantitative structure-activity relationship (QSAR) data set to predict when a molecule can permeate cells.
Details
This pharmaceutical data set was used to develop a model for predicting compounds' permeability. In short, permeability is the measure of a molecule's ability to cross a membrane. The body, for example, has notable membranes between the body and brain, known as the blood-brain barrier, and between the gut and body in the intestines. These membranes help the body guard critical regions from receiving undesirable or detrimental substances. For an orally taken drug to be effective in the brain, it must first pass through the intestinal wall and then through the blood-brain barrier to reach the desired neurological target. Therefore, a compound's ability to permeate relevant biological membranes is critically important to understand early in the drug discovery process. Compounds that appear effective against a particular disease in research screening experiments but are poorly permeable may need to be altered in order to improve permeability, and thus the compound's ability to reach the desired target. Identifying permeability problems can help guide chemists towards better molecules.
Permeability assays such as PAMPA and Caco-2 have been developed to help measure compounds' permeability (Kansy et al., 1998). These screens are effective at quantifying a compound's permeability, but they are expensive and labor intensive. Given a sufficient number of screened compounds, we could develop a predictive model for permeability in an attempt to reduce the need for the assays. In this project there were 165 unique compounds; 1107 molecular fingerprints were determined for each. A molecular fingerprint is a binary sequence of numbers that represents the presence or absence of a specific molecular sub-structure; in this data set, each fingerprint column holds a 0/1 value. The response is highly skewed, the predictors are sparse (15.5% are present), and many predictors are strongly associated.
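The three properties named in the last sentence (skewed response, sparse predictors, strongly associated predictors) can be checked directly. The following is a minimal base R sketch, assuming a data frame laid out like this one: a numeric `permeability` column plus 0/1 predictor columns. The helper name `summarize_qsar`, the toy data, and the 0.9 correlation cutoff are illustrative assumptions, not part of the data set's documentation.

```r
# Hypothetical helper: summarize response skewness, fingerprint sparsity,
# and the share of strongly correlated predictor pairs.
summarize_qsar <- function(dat, outcome = "permeability", cor_cutoff = 0.9) {
  y <- as.numeric(dat[[outcome]])
  x <- as.matrix(dat[, setdiff(names(dat), outcome), drop = FALSE])

  # Moment-based sample skewness (positive means a long right tail).
  centered <- y - mean(y)
  skew <- mean(centered^3) / mean(centered^2)^1.5

  # Fraction of fingerprint entries equal to 1.
  prop_present <- mean(x == 1)

  # Correlations are undefined for constant columns, so set them aside first.
  varying <- apply(x, 2, function(col) length(unique(col)) > 1)
  cors <- cor(x[, varying, drop = FALSE])
  pair_cors <- cors[upper.tri(cors)]

  list(
    skewness      = skew,
    prop_present  = prop_present,
    n_constant    = sum(!varying),
    prop_high_cor = mean(abs(pair_cors) >= cor_cutoff)
  )
}

# Toy data standing in for the real set (all values are made up).
toy <- data.frame(
  permeability = c(1, 1, 2, 2, 3, 40),
  chem_fp_0001 = c(0, 0, 0, 0, 0, 0),
  chem_fp_0002 = c(1, 0, 1, 0, 0, 0),
  chem_fp_0003 = c(1, 0, 1, 0, 0, 0),
  chem_fp_0004 = c(0, 1, 0, 0, 0, 1)
)
res <- summarize_qsar(toy)
```

With the real data loaded, `summarize_qsar(permeability_qsar)` would give values to compare against the description above; `prop_present` corresponds to the quoted 15.5% only if that figure is the overall fraction of ones, which is an assumption here.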
Columns:
permeability: numeric
chem_fp_0001 - chem_fp_1107: numeric
Examples
data(permeability_qsar)
str(permeability_qsar)
#> tibble [165 × 1,108] (S3: tbl_df/tbl/data.frame)
#> $ permeability: Named num [1:165] 12.52 1.12 19.41 1.73 1.68 ...
#> ..- attr(*, "names")= chr [1:165] "1" "2" "3" "4" ...
#> $ chem_fp_0001: num [1:165] 0 0 0 0 0 0 0 0 0 0 ...
#> $ chem_fp_0002: num [1:165] 0 0 0 0 0 0 0 0 0 0 ...
#> $ chem_fp_0003: num [1:165] 0 0 0 0 0 0 0 0 0 0 ...
#> $ chem_fp_0004: num [1:165] 0 0 0 0 0 0 0 0 0 0 ...
#> $ chem_fp_0005: num [1:165] 0 0 0 0 0 0 0 0 0 0 ...
#> $ chem_fp_0006: num [1:165] 1 0 1 0 0 0 1 0 1 0 ...
#> $ chem_fp_0007: num [1:165] 1 1 1 1 1 1 1 1 1 1 ...
#> $ chem_fp_0008: num [1:165] 1 1 1 1 1 1 1 1 1 1 ...
#> $ chem_fp_0009: num [1:165] 0 0 0 0 0 0 0 0 0 0 ...
#> $ chem_fp_0010: num [1:165] 0 0 0 0 0 0 0 0 0 0 ...
#> $ chem_fp_0011: num [1:165] 0 0 0 0 0 0 0 0 0 0 ...
#> $ chem_fp_0012: num [1:165] 1 1 0 1 1 1 0 0 0 1 ...
#> $ chem_fp_0013: num [1:165] 0 1 1 1 1 1 1 1 1 1 ...
#> $ chem_fp_0014: num [1:165] 1 1 1 1 1 1 1 1 1 1 ...
#> $ chem_fp_0015: num [1:165] 0 0 0 0 0 0 0 0 1 0 ...
#> $ chem_fp_0016: num [1:165] 0 0 0 0 0 0 1 0 1 0 ...
#> $ chem_fp_0017: num [1:165] 1 1 1 1 1 1 1 1 1 1 ...
#> $ chem_fp_0018: num [1:165] 1 1 1 1 1 1 1 1 1 1 ...
#> $ chem_fp_0019: num [1:165] 1 1 1 1 1 1 1 1 1 1 ...
#> $ chem_fp_0020: num [1:165] 0 0 0 0 0 0 0 0 0 0 ...
#> $ chem_fp_0021: num [1:165] 0 0 0 0 0 0 0 0 0 0 ...
#> $ chem_fp_0022: num [1:165] 1 1 1 1 1 1 1 1 1 1 ...
#> $ chem_fp_0023: num [1:165] 1 1 1 1 1 1 1 1 1 1 ...
#> $ chem_fp_0024: num [1:165] 1 1 1 1 1 1 1 1 1 1 ...
#> $ chem_fp_0025: num [1:165] 0 0 0 0 0 0 0 0 0 0 ...
#> $ chem_fp_0026: num [1:165] 0 0 0 0 0 0 0 0 0 0 ...
#> $ chem_fp_0027: num [1:165] 0 0 0 0 0 0 0 0 0 0 ...
#> $ chem_fp_0028: num [1:165] 0 0 0 0 0 0 0 0 0 0 ...
#> $ chem_fp_0029: num [1:165] 0 0 0 0 0 0 0 0 0 0 ...
#> $ chem_fp_0030: num [1:165] 1 1 1 1 1 1 1 1 1 1 ...
#> $ chem_fp_0031: num [1:165] 1 1 1 1 1 1 1 1 1 1 ...
#> $ chem_fp_0032: num [1:165] 1 1 1 1 1 1 1 1 1 1 ...
#> $ chem_fp_0033: num [1:165] 1 1 1 1 1 1 1 1 1 1 ...
#> $ chem_fp_0034: num [1:165] 1 1 1 1 1 1 1 1 1 1 ...
#> $ chem_fp_0035: num [1:165] 0 0 0 0 0 0 0 0 0 0 ...
#> $ chem_fp_0036: num [1:165] 0 0 0 0 0 0 0 0 0 0 ...
#> $ chem_fp_0037: num [1:165] 0 0 0 0 0 0 0 0 0 0 ...
#> $ chem_fp_0038: num [1:165] 0 0 0 0 0 0 0 0 0 0 ...
#> $ chem_fp_0039: num [1:165] 0 0 0 0 0 0 0 0 0 0 ...
#> $ chem_fp_0040: num [1:165] 0 0 0 0 0 0 0 0 0 0 ...
#> $ chem_fp_0041: num [1:165] 0 0 0 0 0 0 0 0 0 0 ...
#> $ chem_fp_0042: num [1:165] 0 0 0 0 0 0 0 0 0 0 ...
#> $ chem_fp_0043: num [1:165] 0 0 0 0 0 0 0 0 0 0 ...
#> $ chem_fp_0044: num [1:165] 0 0 0 0 0 0 0 0 0 0 ...
#> $ chem_fp_0045: num [1:165] 0 0 0 0 0 0 0 0 0 0 ...
#> $ chem_fp_0046: num [1:165] 0 0 0 0 0 0 0 0 0 0 ...
#> $ chem_fp_0047: num [1:165] 0 0 0 0 0 0 0 0 0 0 ...
#> $ chem_fp_0048: num [1:165] 0 0 0 0 0 0 0 0 0 0 ...
#> $ chem_fp_0049: num [1:165] 0 0 0 0 0 0 0 0 0 0 ...
#> $ chem_fp_0050: num [1:165] 0 0 0 0 0 0 0 0 0 0 ...
#> $ chem_fp_0051: num [1:165] 0 0 0 0 0 0 0 0 0 0 ...
#> $ chem_fp_0052: num [1:165] 0 0 0 0 0 0 0 0 0 0 ...
#> $ chem_fp_0053: num [1:165] 0 0 0 0 0 0 0 0 0 0 ...
#> $ chem_fp_0054: num [1:165] 0 0 0 0 0 0 0 0 0 0 ...
#> $ chem_fp_0055: num [1:165] 0 0 0 0 0 0 0 0 0 0 ...
#> $ chem_fp_0056: num [1:165] 0 0 0 0 0 0 0 0 0 0 ...
#> $ chem_fp_0057: num [1:165] 0 0 0 0 0 0 0 0 0 0 ...
#> $ chem_fp_0058: num [1:165] 0 0 0 0 0 0 0 0 0 0 ...
#> $ chem_fp_0059: num [1:165] 0 0 0 0 0 0 0 0 0 0 ...
#> $ chem_fp_0060: num [1:165] 0 0 0 0 0 0 0 0 0 0 ...
#> $ chem_fp_0061: num [1:165] 0 0 0 0 0 0 0 0 0 0 ...
#> $ chem_fp_0062: num [1:165] 0 0 0 0 0 0 0 0 0 0 ...
#> $ chem_fp_0063: num [1:165] 0 0 0 0 0 0 0 0 0 0 ...
#> $ chem_fp_0064: num [1:165] 0 0 0 0 0 0 0 0 0 0 ...
#> $ chem_fp_0065: num [1:165] 0 0 0 0 0 0 0 0 0 0 ...
#> $ chem_fp_0066: num [1:165] 0 0 0 0 0 0 0 0 0 0 ...
#> $ chem_fp_0067: num [1:165] 0 0 0 0 0 0 0 0 0 0 ...
#> $ chem_fp_0068: num [1:165] 0 0 0 0 0 0 0 0 0 0 ...
#> $ chem_fp_0069: num [1:165] 0 0 0 0 0 0 0 0 0 0 ...
#> $ chem_fp_0070: num [1:165] 0 0 0 0 0 0 0 0 0 0 ...
#> $ chem_fp_0071: num [1:165] 0 0 0 0 0 0 0 0 0 0 ...
#> $ chem_fp_0072: num [1:165] 0 0 0 0 0 0 0 0 0 0 ...
#> $ chem_fp_0073: num [1:165] 0 0 0 0 0 0 0 0 0 0 ...
#> $ chem_fp_0074: num [1:165] 0 0 0 0 0 0 0 0 0 0 ...
#> $ chem_fp_0075: num [1:165] 0 0 0 0 0 0 0 0 0 0 ...
#> $ chem_fp_0076: num [1:165] 0 0 0 0 0 0 0 0 0 0 ...
#> $ chem_fp_0077: num [1:165] 0 0 0 0 0 0 0 0 0 0 ...
#> $ chem_fp_0078: num [1:165] 0 0 0 0 0 0 0 0 0 0 ...
#> $ chem_fp_0079: num [1:165] 0 0 0 0 0 0 0 0 0 0 ...
#> $ chem_fp_0080: num [1:165] 0 0 0 0 0 0 0 0 0 0 ...
#> $ chem_fp_0081: num [1:165] 1 1 1 1 1 1 1 1 1 1 ...
#> $ chem_fp_0082: num [1:165] 1 1 1 1 1 1 1 1 1 1 ...
#> $ chem_fp_0083: num [1:165] 1 1 1 1 1 1 1 1 1 1 ...
#> $ chem_fp_0084: num [1:165] 1 1 1 1 1 1 1 1 1 1 ...
#> $ chem_fp_0085: num [1:165] 1 1 1 1 1 1 1 1 1 1 ...
#> $ chem_fp_0086: num [1:165] 1 1 1 1 1 1 0 0 0 1 ...
#> $ chem_fp_0087: num [1:165] 1 0 0 0 0 0 0 0 0 0 ...
#> $ chem_fp_0088: num [1:165] 1 0 0 0 0 0 0 0 0 0 ...
#> $ chem_fp_0089: num [1:165] 1 1 1 1 1 1 1 1 1 1 ...
#> $ chem_fp_0090: num [1:165] 1 1 1 1 1 1 1 1 1 1 ...
#> $ chem_fp_0091: num [1:165] 1 1 1 1 1 1 1 1 1 1 ...
#> $ chem_fp_0092: num [1:165] 1 1 1 1 1 1 1 1 1 1 ...
#> $ chem_fp_0093: num [1:165] 0 0 0 0 0 0 0 1 1 0 ...
#> $ chem_fp_0094: num [1:165] 0 0 0 0 0 0 0 0 0 0 ...
#> $ chem_fp_0095: num [1:165] 0 0 0 0 0 0 0 0 0 0 ...
#> $ chem_fp_0096: num [1:165] 1 1 1 1 1 1 0 0 0 1 ...
#> $ chem_fp_0097: num [1:165] 1 0 0 0 0 0 0 0 0 0 ...
#> $ chem_fp_0098: num [1:165] 1 0 0 0 0 0 0 0 0 0 ...
#> [list output truncated]
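Several fingerprint columns in the output above show only 0s or only 1s in their first ten values. Any column that is constant across all compounds carries no information for a model, and the skewed response may benefit from a transformation. The sketch below shows one possible pre-processing step in base R. The helper name `prep_qsar` and the toy data are hypothetical, and the log transform is a common choice rather than something this documentation prescribes; it assumes every response value is strictly positive.

```r
# Hypothetical helper: drop constant predictors and log-transform the response.
prep_qsar <- function(dat, outcome = "permeability") {
  predictors <- setdiff(names(dat), outcome)

  # Flag predictors that take a single value over all rows.
  is_constant <- vapply(
    dat[predictors],
    function(col) length(unique(col)) <= 1,
    logical(1)
  )

  out <- dat[, c(outcome, predictors[!is_constant]), drop = FALSE]

  # Natural log of the response; as.numeric() also drops any names attribute.
  out[[outcome]] <- log(as.numeric(out[[outcome]]))
  out
}

# Toy data standing in for the real set (all values are made up).
toy_fp <- data.frame(
  permeability = c(1, exp(1), exp(2)),
  chem_fp_0001 = c(0, 0, 0),
  chem_fp_0002 = c(1, 0, 1),
  chem_fp_0003 = c(1, 1, 1)
)
prepped <- prep_qsar(toy_fp)
names(prepped)
```

On the real data the equivalent call would be `prep_qsar(permeability_qsar)`. How many columns it removes depends on the full 165 rows, not just the ten values `str()` prints.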