Annual Stack Overflow Developer Survey Data
Source
Julia Silge, Supervised Machine Learning Case Studies in R
https://supervised-ml-course.netlify.com/chapter2
Raw data: https://insights.stackoverflow.com/survey/
Details
These data contain 5,594 observations on developers, each with 21 variables. They can be used to try to predict who works remotely, as is done in the source listed above.
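As a rough illustration of the prediction task, the sketch below fits a baseline logistic regression for Remote. The helper name fit_remote_model and the choice of three predictors are assumptions made for this example. They are not taken from the source course, which may model the problem differently.

```r
# Hypothetical helper (not part of the documented package): fits a baseline
# logistic regression for Remote using three of the numeric columns.
fit_remote_model <- function(df) {
  # glm() treats the first factor level as "failure", so make "Not remote"
  # the reference; coefficients then describe the log-odds of "Remote".
  df$Remote <- relevel(df$Remote, ref = "Not remote")
  glm(
    Remote ~ Salary + YearsCodedJob + CompanySizeNumber,
    data = df,
    family = binomial()
  )
}

# Usage, assuming the data set has been loaded with data(stackoverflow):
# fit <- fit_remote_model(stackoverflow)
# summary(fit)
```

The releveling step matters because "Remote" is stored as the first factor level. Without it, the fitted probabilities would refer to "Not remote".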
Examples
data(stackoverflow)
str(stackoverflow)
#> tibble [5,594 × 21] (S3: tbl_df/tbl/data.frame)
#> $ Country : Factor w/ 5 levels "Canada","Germany",..: 4 5 5 2 3 5 5 2 5 2 ...
#> $ Salary : num [1:5594] 100000 130000 175000 64516 6636 ...
#> $ YearsCodedJob : int [1:5594] 20 20 16 4 1 1 13 4 7 17 ...
#> $ OpenSource : num [1:5594] 0 1 0 0 0 0 0 1 1 1 ...
#> $ Hobby : num [1:5594] 1 1 1 0 1 1 1 0 1 1 ...
#> $ CompanySizeNumber : num [1:5594] 5000 1000 10000 1000 5000 20 20 5000 20 20 ...
#> $ Remote : Factor w/ 2 levels "Remote","Not remote": 1 1 2 2 2 2 2 2 2 2 ...
#> $ CareerSatisfaction : int [1:5594] 8 9 7 9 5 8 7 7 8 9 ...
#> $ Data_scientist : num [1:5594] 0 0 0 0 0 0 0 0 0 0 ...
#> $ Database_administrator : num [1:5594] 0 0 0 0 0 0 0 0 0 0 ...
#> $ Desktop_applications_developer : num [1:5594] 0 0 0 0 0 0 0 0 1 1 ...
#> $ Developer_with_stats_math_background: num [1:5594] 0 0 0 0 0 0 0 0 0 1 ...
#> $ DevOps : num [1:5594] 0 1 0 0 0 0 0 0 0 0 ...
#> $ Embedded_developer : num [1:5594] 1 1 0 0 0 0 0 0 0 0 ...
#> $ Graphic_designer : num [1:5594] 0 0 0 0 0 0 0 0 0 0 ...
#> $ Graphics_programming : num [1:5594] 0 0 0 0 0 0 0 0 0 0 ...
#> $ Machine_learning_specialist : num [1:5594] 0 0 0 0 0 0 0 0 0 0 ...
#> $ Mobile_developer : num [1:5594] 0 0 0 0 0 0 0 0 1 0 ...
#> $ Quality_assurance_engineer : num [1:5594] 0 1 0 0 0 0 0 0 0 0 ...
#> $ Systems_administrator : num [1:5594] 0 0 0 0 0 0 0 0 0 0 ...
#> $ Web_developer : num [1:5594] 0 1 1 1 1 1 1 1 0 0 ...
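
The first ten Remote values printed above are mostly "Not remote", which hints at class imbalance but does not establish it. A minimal sketch for checking the split within each country follows. The helper name remote_share_by_country is hypothetical, and no output is shown because the result depends on the full data.

```r
# Hypothetical helper: share of "Remote" and "Not remote" within each country.
remote_share_by_country <- function(df) {
  # Cross-tabulate the two factor columns, then convert each row to proportions.
  counts <- table(Country = df$Country, Remote = df$Remote)
  prop.table(counts, margin = 1)
}

# Usage, assuming the data set has been loaded with data(stackoverflow):
# remote_share_by_country(stackoverflow)
```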