My personal notes from DataCamp’s course

Contents

Arguments validation
Returning multiple values from functions

Arguments validation

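The assertive package provides functions for validating arguments inside a function. The assert_*() functions throw an error when a check fails, the is_*() functions return logical values that you can test yourself, and helpers such as use_first() and coerce_to() repair the input and issue a warning.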

library(assertive)

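calc_harmonic_mean <- function(x, na.rm = FALSE) {
  # Throw an error unless x is numeric
  assert_is_numeric(x)
  # The harmonic mean is only meaningful for strictly positive values
  if(any(is_non_positive(x), na.rm = TRUE)) {
    stop("x contains non-positive values, so the harmonic mean makes no sense.")
  }
  # Keep only the first element of na.rm and coerce it to logical (both steps warn)
  na.rm <- coerce_to(use_first(na.rm), target_class = "logical")
  
  print(1 / mean(1 / x, na.rm = na.rm))
}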

calc_harmonic_mean(1:5, na.rm = 1:5)
## Warning: Only the first value of na.rm (= 1) will be used.
## Warning: Coercing use_first(na.rm) to class 'logical'.
## [1] 2.189781
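
For comparison, the same checks can be sketched in base R, without assertive. This is only an illustrative sketch, not the course's code: the name calc_harmonic_mean_base is made up for these notes, and unlike use_first() and coerce_to() it truncates and coerces na.rm silently, without a warning.

```r
# Hypothetical base-R counterpart of calc_harmonic_mean() (name made up for these notes)
calc_harmonic_mean_base <- function(x, na.rm = FALSE) {
  # Fail fast with a readable message if x is not numeric
  if (!is.numeric(x)) {
    stop("x must be numeric, not ", class(x)[1], ".")
  }
  # The harmonic mean is only meaningful for strictly positive values
  if (any(x <= 0, na.rm = TRUE)) {
    stop("x contains non-positive values, so the harmonic mean makes no sense.")
  }
  # Keep only the first element of na.rm and coerce it to a logical value
  na.rm <- as.logical(na.rm[1])

  1 / mean(1 / x, na.rm = na.rm)
}

hm <- calc_harmonic_mean_base(1:5, na.rm = 1:5)

# Capture the error message instead of stopping the script
msg <- tryCatch(
  calc_harmonic_mean_base("a"),
  error = function(e) conditionMessage(e)
)
```

The assertive version is shorter and tells the caller when it has altered an argument, which is the main reason to prefer it over hand-written checks like these.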

Returning multiple values from functions

A function can only return a single value. If you want to return multiple things, store them all in a list and return the list.

If users want to have the list items as separate variables, they can assign each list element to its own variable using zeallot’s multi-assignment operator, %<-%.
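
Before the model-based example, here is a minimal sketch of both ways to consume a list result. The helper summarise_range and the variable names rng, lo and hi are made up for illustration; only %<-% comes from zeallot.

```r
library(zeallot)

# Hypothetical helper: returns two results bundled in a named list
summarise_range <- function(x) {
  list(min = min(x), max = max(x))
}

# Option 1: keep the list and pull out elements by name
rng <- summarise_range(c(3, 1, 2))
rng$min
rng$max

# Option 2: unpack the list elements into separate variables, in order
c(lo, hi) %<-% summarise_range(c(3, 1, 2))
lo
hi
```

Note that %<-% assigns by position, so the variables on the left must be listed in the same order as the elements of the returned list.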

Create model object to use in the examples:

suppressPackageStartupMessages(library(dplyr))

snake_river_visits <- readRDS('data/snake_river_visits.rds')
model <- lm(n_visits ~ gender + income + travel, snake_river_visits)

Returning a list and splitting its items with the %<-% operator:

library(broom)
library(zeallot)

groom_model <- function(model) {
  list(
    model = glance(model),
    coefficients = tidy(model),
    observations = augment(model)
  )
}

# Call groom_model on model, assigning to 3 variables
c(mdl, cff, obs) %<-% groom_model(model)

# See these individual variables
mdl; cff; obs
## # A tibble: 1 x 11
##   r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
##       <dbl>         <dbl> <dbl>     <dbl>    <dbl> <int>  <dbl> <dbl> <dbl>
## 1     0.224         0.210  43.3      16.3 1.61e-16     7 -1791. 3599. 3630.
## # … with 2 more variables: deviance <dbl>, df.residual <int>
## # A tibble: 7 x 5
##   term              estimate std.error statistic  p.value
##   <chr>                <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)          61.5       7.83     7.86  5.06e-14
## 2 genderfemale         10.8       4.94     2.19  2.88e- 2
## 3 income($25k,$55k]    -1.95      7.74    -0.251 8.02e- 1
## 4 income($55k,$95k]   -19.3       8.01    -2.42  1.63e- 2
## 5 income($95k,$Inf)   -18.6       7.47    -2.49  1.32e- 2
## 6 travel(0.25h,4h]    -26.6       6.00    -4.44  1.24e- 5
## 7 travel(4h,Infh)     -45.1       6.30    -7.16  5.06e-12
## # A tibble: 346 x 12
##    .rownames n_visits gender income travel .fitted .se.fit .resid   .hat .sigma
##    <chr>        <dbl> <fct>  <fct>  <fct>    <dbl>   <dbl>  <dbl>  <dbl>  <dbl>
##  1 25               2 female ($95k… (4h,I…    8.67    5.79  -6.67 0.0179   43.4
##  2 26               1 female ($95k… (4h,I…    8.67    5.79  -7.67 0.0179   43.4
##  3 27               1 male   ($95k… (0.25…   16.3     5.35 -15.3  0.0153   43.4
##  4 29               1 male   ($95k… (4h,I…   -2.18    4.79   3.18 0.0122   43.4
##  5 30               1 female ($55k… (4h,I…    7.95    6.55  -6.95 0.0229   43.4
##  6 31               1 male   [$0,$… [0h,0…   61.5     7.83 -60.5  0.0326   43.3
##  7 33              80 female [$0,$… [0h,0…   72.4     7.39   7.61 0.0291   43.4
##  8 34             104 female ($95k… [0h,0…   53.8     6.35  50.2  0.0215   43.3
##  9 35              55 male   ($25k… (0.25…   33.0     5.57  22.0  0.0165   43.4
## 10 36             350 female ($25k… [0h,0…   70.4     6.35 280.   0.0215   40.6
## # … with 336 more rows, and 2 more variables: .cooksd <dbl>, .std.resid <dbl>

Sometimes you want to return multiple things from a function, but you also want the result to have a particular class (for example, a data frame or a numeric vector), so returning a list isn’t appropriate. This is common when you have a result plus metadata about the result. (Metadata is “data about the data”: for example, the file a dataset was loaded from, the username of the person who created the variable, or the number of iterations an algorithm needed to converge.) In that case, return the result itself and attach the metadata as attributes with attr(), as pipeable_plot() does below with the formula.

pipeable_plot <- function(data, formula) {
  plot(formula, data)
  attr(data, 'formula') <- formula
  invisible(data)
}

plt_dist_vs_speed <- cars %>% 
  pipeable_plot(dist ~ speed)

str(plt_dist_vs_speed)
## 'data.frame':    50 obs. of  2 variables:
##  $ speed: num  4 4 7 7 8 9 10 10 10 11 ...
##  $ dist : num  2 10 4 22 16 10 18 26 34 17 ...
##  - attr(*, "formula")=Class 'formula'  language dist ~ speed
##   .. ..- attr(*, ".Environment")=<environment: 0x5590dd8f7c88>
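
The same pattern works for any kind of metadata, and the attribute can be read back with attr(). Below is a small base-R sketch under made-up names: drop_missing, n_removed and the toy data frame are illustrative and not part of the course material.

```r
# Hypothetical helper: drop incomplete rows and record how many were removed
drop_missing <- function(data) {
  complete <- data[complete.cases(data), , drop = FALSE]
  # Attach the metadata to the result instead of wrapping both in a list
  attr(complete, "n_removed") <- nrow(data) - nrow(complete)
  complete
}

# Toy data: rows 2 and 3 each contain a missing value
raw <- data.frame(x = c(1, NA, 3), y = c("a", "b", NA))
clean <- drop_missing(raw)

class(clean)              # the result is still a data frame
attr(clean, "n_removed")  # the metadata travels with it
```

Because the return value keeps its class, it can be passed straight to functions that expect a data frame, while callers who care about the metadata can look it up by name.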