@BenHeubl
Created February 13, 2020 16:36
tut17
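# Packages that provide %>%, mutate(), if_else() and fct_explicit_na()
library(dplyr)
library(forcats)
# stringsAsFactors = TRUE keeps the pre-R 4.0 default that mutate_if(is.factor, ...) relies on
income <- read.csv("https://raw.githubusercontent.com/selva86/datasets/master/income.csv", stringsAsFactors = TRUE)
# Collapse the INCOME brackets into a binary target: "Under 30k" vs "Over 30k"
incomeR <- income %>%
  mutate(INCOME = if_else(INCOME == "-10.000)", "Under 30k",
                  if_else(INCOME == "[10.000–15.000)", "Under 30k",
                  if_else(INCOME == "[15.000–20.000)", "Under 30k",
                  if_else(INCOME == "[20.000–25.000)", "Under 30k",
                  if_else(INCOME == "[25.000–30.000)", "Under 30k", "Over 30k")))))) %>%
  # Turn missing values in the remaining factor columns into an explicit "Unknown" level
  mutate_if(is.factor, fct_explicit_na, na_level = "Unknown") %>%
  mutate(INCOME = as.factor(INCOME))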
# As in the first example, shuffle the rows and build a training set (70%) and a test (validation) set (30%):
set.seed(100)
incomeR <- incomeR[sample(nrow(incomeR)), ]
train <- sample(nrow(incomeR), floor(0.7 * nrow(incomeR)), replace = FALSE)
TrainSet <- incomeR[train, ]
TestSet <- incomeR[-train, ]
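
After a split like the one above, it can be worth checking that the two sets have the intended sizes and a similar class balance. The following is a minimal sketch in base R, not part of the original gist: `split_summary` is a hypothetical helper, and the `toy` data frame (with an invented predictor `x`) is only a stand-in so the example runs without downloading the income data.

```r
# Hypothetical helper (an assumption, not from the original gist):
# summarise the row shares and the class balance of a train/test split.
split_summary <- function(train_df, test_df, target = "INCOME") {
  total <- nrow(train_df) + nrow(test_df)
  list(
    train_share   = nrow(train_df) / total,
    test_share    = nrow(test_df) / total,
    train_classes = prop.table(table(train_df[[target]])),
    test_classes  = prop.table(table(test_df[[target]]))
  )
}

# Toy stand-in data; the values are random and carry no meaning.
set.seed(100)
toy <- data.frame(
  INCOME = factor(sample(c("Under 30k", "Over 30k"), 200, replace = TRUE)),
  x      = rnorm(200)
)

# Same 70/30 split pattern as in the script above.
idx <- sample(nrow(toy), floor(0.7 * nrow(toy)), replace = FALSE)
toy_summary <- split_summary(toy[idx, ], toy[-idx, ])

toy_summary$train_share    # share of rows that ended up in the training set
toy_summary$train_classes  # class proportions in the training set
toy_summary$test_classes   # class proportions in the test set
```

With the objects from the script, the equivalent call would be `split_summary(TrainSet, TestSet)`. If the class proportions differ a lot between the two sets, a stratified split may be a better choice than a purely random one.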