pg <- penguins %>%
# not interested in classifying by time/island
dplyr::select(-island, -year, -sex) %>%
drop_na()
fit_tree <- tree(species ~ .,
split = "gini",
control = tree.control(nrow(pg), mincut = 40),
data = pg)
plot(fit_tree, type = "uniform")
text(fit_tree, pretty = 0, cex = 1.1)
fit_rf <- randomForest(species ~ ., ntree = 100, mtry = 2,
minsize = 20, data = pg)
fit_rf
##
## Call:
## randomForest(formula = species ~ ., data = pg, ntree = 100, mtry = 2, minsize = 20)
## Type of random forest: classification
## Number of trees: 100
## No. of variables tried at each split: 2
##
## OOB estimate of error rate: 2.63%
## Confusion matrix:
## Adelie Chinstrap Gentoo class.error
## Adelie 147 3 1 0.026490066
## Chinstrap 4 64 0 0.058823529
## Gentoo 0 1 122 0.008130081
Read ?importance
to learn more
varImpPlot(fit_rf)
pred_rf <- Predictor$new(fit_rf)
pdp_rf <- FeatureEffects$new(pred_rf,
features = c("bill_length_mm",
"flipper_length_mm"#,
#"bill_depth_mm",
#"body_mass_g"
), method = "pdp+ice")
plot(pdp_rf)
## Warning: UNRELIABLE VALUE: Future ('future_lapply-1') unexpectedly generated
## random numbers without specifying argument 'future.seed'. There is a risk that
## those random numbers are not statistically sound and the overall results might
## be invalid. To fix this, specify 'future.seed=TRUE'. This ensures that proper,
## parallel-safe random numbers are produced via the L'Ecuyer-CMRG method. To
## disable this check, use 'future.seed=NULL', or set option 'future.rng.onMisuse'
## to "ignore".
int_rf <- Interaction$new(pred_rf)
plot(int_rf)
Hint: read ?Interaction
If the measure is close to 0 then the predictions depend mostly on the variable alone, if it’s close to 1 then the predictions depend more on interactions between that variable and other predictors
Note that the iml
and gam
packages are slightly less compatible, e.g. it seems that iml
plots will automatically show results for only 1 class
fit_gam <- gam(species ~ ., family = "binomial", data = pg)
fit_gam
## Call:
## gam(formula = species ~ ., family = "binomial", data = pg)
##
## Degrees of Freedom: 341 total; 337 Residual
## Residual Deviance: 9.44917
pred_gam <- Predictor$new(fit_gam)#, type = "response")
pdp_gam <- FeatureEffects$new(pred_gam,
features = c("bill_length_mm",
"flipper_length_mm"#,
#"bill_depth_mm",
#"body_mass_g"
), method = "pdp+ice")
plot(pdp_gam)
## Warning: UNRELIABLE VALUE: Future ('future_lapply-1') unexpectedly generated
## random numbers without specifying argument 'future.seed'. There is a risk that
## those random numbers are not statistically sound and the overall results might
## be invalid. To fix this, specify 'future.seed=TRUE'. This ensures that proper,
## parallel-safe random numbers are produced via the L'Ecuyer-CMRG method. To
## disable this check, use 'future.seed=NULL', or set option 'future.rng.onMisuse'
## to "ignore".
int_gam <- Interaction$new(pred_gam)
plot(int_gam) + xlim(c(0,1))
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.
Yes, the additivity assumption of GAMs means the model’s predictions cannot possibly depend on interactions
Additivity on the log-odds scale does not transform into additivity on the response scale, so in terms of the predicted probabilities the model does appear to depend on interactions between predictors and not just additively