
A data set containing information on a subset of taxi trips in the city of Chicago in 2022.




The source data are described on the linked City of Chicago data portal. The data exported here are a pre-processed subset, motivated by the modeling problem of predicting whether a rider will leave a tip.


tip
Whether the rider left a tip. A factor with levels "yes" and "no".


distance
The trip distance, in odometer miles.


company
The taxi company, as a factor. Companies that occurred only a few times were binned as "other".


local
Whether the trip's starting and ending locations are in the same community area, as a factor. See the source data for community area values.


dow
The day of the week on which the trip began, as a factor.


month
The month in which the trip began, as a factor.


hour
The hour of the day in which the trip began, as a numeric (integer) value.
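
The binning of infrequent companies and the same-community indicator described above can be sketched in base R. This is a minimal illustration on made-up rows, not the pre-processing actually used to build this data set: the columns `pickup_area` and `dropoff_area`, the company names, the cutoff `min_count`, and the level order of the resulting indicator are all assumptions.

```r
# Made-up raw trips; column names and values are hypothetical.
trips <- data.frame(
  company      = c("Flash Cab", "Flash Cab", "Flash Cab",
                   "Sun Taxi", "Sun Taxi", "Tiny Cab"),
  pickup_area  = c(8, 8, 32, 76, 8, 28),
  dropoff_area = c(8, 32, 32, 8, 8, 6),
  stringsAsFactors = FALSE
)

min_count <- 2  # assumed cutoff for "only a few times"

# Bin companies seen fewer than min_count times into "other".
counts <- table(trips$company)
rare <- names(counts)[counts < min_count]
trips$company <- factor(ifelse(trips$company %in% rare, "other", trips$company))

# Flag trips that start and end in the same community area.
# The level order c("yes", "no") is an assumption.
trips$local <- factor(
  ifelse(trips$pickup_area == trips$dropoff_area, "yes", "no"),
  levels = c("yes", "no")
)

trips[, c("company", "local")]
```

Binning rare levels keeps the factor small enough that each level has some support in a training set; the right cutoff depends on the data and is not stated in this page.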


# \donttest{
#> # A tibble: 10,000 × 7
#>    tip   distance company                      local dow   month  hour
#>    <fct>    <dbl> <fct>                        <fct> <fct> <fct> <int>
#>  1 yes      17.2  Chicago Independents         no    Thu   Feb      16
#>  2 yes       0.88 City Service                 yes   Thu   Mar       8
#>  3 yes      18.1  other                        no    Mon   Feb      18
#>  4 yes      20.7  Chicago Independents         no    Mon   Apr       8
#>  5 yes      12.2  Chicago Independents         no    Sun   Mar      21
#>  6 yes       0.94 Sun Taxi                     yes   Sat   Apr      23
#>  7 yes      17.5  Flash Cab                    no    Fri   Mar      12
#>  8 yes      17.7  other                        no    Sun   Jan       6
#>  9 yes       1.85 Taxicab Insurance Agency Llc no    Fri   Apr      12
#> 10 yes       1.47 City Service                 no    Tue   Mar      14
#> # ℹ 9,990 more rows
# }
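
As a rough illustration of the modeling problem mentioned above (predicting whether a rider will tip), the following sketch fits a logistic regression with base R. It uses a synthetic stand-in with a few of the same column names rather than the real data, so the coefficients mean nothing; the sample size, the simulated relationship, and the choice of predictors are assumptions made only to keep the example self-contained.

```r
set.seed(1)
n <- 200

# Synthetic stand-in with some of the columns described above; not the real data.
toy <- data.frame(
  distance = round(runif(n, 0.5, 20), 2),
  hour     = sample(0:23, n, replace = TRUE),
  local    = factor(sample(c("yes", "no"), n, replace = TRUE),
                    levels = c("yes", "no"))
)

# Simulate the outcome so that longer trips tip more often (an arbitrary assumption).
toy$tip <- factor(
  ifelse(runif(n) < plogis(-0.5 + 0.15 * toy$distance), "yes", "no"),
  levels = c("yes", "no")
)

# With a factor response, glm() treats the first level as "failure", so with
# levels c("yes", "no") it would model P(no). Recode to 0/1 to model P(tip).
toy$tipped <- as.integer(toy$tip == "yes")

fit <- glm(tipped ~ distance + hour + local, data = toy, family = binomial())

# Predicted probability of a tip for each synthetic trip.
p_tip <- predict(fit, type = "response")
summary(p_tip)
```

The explicit 0/1 recoding is a design choice worth keeping when working with an outcome whose first level is "yes", since it avoids silently modeling the opposite event.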