Introduction to writing functions in R

Argument validation

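The assertive package provides functions for validating arguments inside your own functions. The example below uses assert_is_numeric() and is_non_positive() to check x, and use_first() and coerce_to() to reduce na.rm to a single logical value, with a warning at each step.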

library(assertive)

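calc_harmonic_mean <- function(x, na.rm = FALSE) {
  # Stop with an error unless x is numeric
  assert_is_numeric(x)
  # The harmonic mean is only meaningful for strictly positive values
  if(any(is_non_positive(x), na.rm = TRUE)) {
    stop("x contains non-positive values, so the harmonic mean makes no sense.")
  }
  # Keep only the first element of na.rm and coerce it to logical (both steps warn)
  na.rm <- coerce_to(use_first(na.rm), target_class = "logical")
  
  # Return the result; it is printed automatically when called at the top level
  1 / mean(1 / x, na.rm = na.rm)
}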

calc_harmonic_mean(1:5, na.rm = 1:5)

## Warning: Only the first value of na.rm (= 1) will be used.

## Warning: Coercing use_first(na.rm) to class 'logical'.

## [1] 2.189781
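
If you would rather not depend on assertive, roughly the same checks can be sketched in base R. This is only an illustration: the name calc_harmonic_mean_base and the wording of the messages are assumptions, not part of the original example.

```r
# Hypothetical base-R variant of calc_harmonic_mean (name and messages are illustrative)
calc_harmonic_mean_base <- function(x, na.rm = FALSE) {
  # Fail early if x is not numeric
  if (!is.numeric(x)) {
    stop("x must be numeric, not ", class(x)[1], ".")
  }
  # Reject zero and negative values, ignoring missing ones
  if (any(x <= 0, na.rm = TRUE)) {
    stop("x contains non-positive values, so the harmonic mean makes no sense.")
  }
  # Use only the first element of na.rm, warning if there were more
  if (length(na.rm) > 1) {
    warning("Only the first value of na.rm will be used.")
    na.rm <- na.rm[1]
  }
  na.rm <- as.logical(na.rm)

  1 / mean(1 / x, na.rm = na.rm)
}

hm <- calc_harmonic_mean_base(c(1, 2, 4))
hm
```

The trade-off is that every check and message has to be written by hand, which is the boilerplate the assertive helpers are meant to remove.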

Returning multiple values from functions

An R function can return only a single object. To return several things at once, bundle them into a list.

If users want to have the list items as separate variables, they can assign each list element to its own variable using zeallot’s multi-assignment operator, %<-%.
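
As a small package-free illustration of the list approach, the sketch below returns three values in one list and then pulls them out by name with $ and [[ ]]. The function summarize_values is hypothetical and exists only for this example.

```r
# Hypothetical helper: returns several summary values bundled in one list
summarize_values <- function(x) {
  list(
    mean = mean(x),
    range = range(x),
    n = length(x)
  )
}

res <- summarize_values(c(2, 4, 9))

res$mean       # access one element by name
res[["n"]]     # equivalent double-bracket access
res$range      # an element can itself hold more than one value
```

Naming the list elements lets callers pick items by name instead of relying on their position.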

Create model object to use in the examples:

suppressPackageStartupMessages(library(dplyr))

snake_river_visits <- readRDS('data/snake_river_visits.rds')
model <- lm(n_visits ~ gender + income + travel, snake_river_visits)

Returning a list and splitting its items via the %<-% operator:

library(broom)
library(zeallot)

groom_model <- function(model) {
  list(
    model = glance(model),
    coefficients = tidy(model),
    observations = augment(model)
  )
}

# Call groom_model on model, assigning to 3 variables
c(mdl, cff, obs) %<-% groom_model(model)

# See these individual variables
mdl; cff; obs

## # A tibble: 1 x 11
##   r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
##       <dbl>         <dbl> <dbl>     <dbl>    <dbl> <int>  <dbl> <dbl> <dbl>
## 1     0.224         0.210  43.3      16.3 1.61e-16     7 -1791. 3599. 3630.
## # … with 2 more variables: deviance <dbl>, df.residual <int>

## # A tibble: 7 x 5
##   term              estimate std.error statistic  p.value
##   <chr>                <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)          61.5       7.83     7.86  5.06e-14
## 2 genderfemale         10.8       4.94     2.19  2.88e- 2
## 3 income($25k,$55k]    -1.95      7.74    -0.251 8.02e- 1
## 4 income($55k,$95k]   -19.3       8.01    -2.42  1.63e- 2
## 5 income($95k,$Inf)   -18.6       7.47    -2.49  1.32e- 2
## 6 travel(0.25h,4h]    -26.6       6.00    -4.44  1.24e- 5
## 7 travel(4h,Infh)     -45.1       6.30    -7.16  5.06e-12

## # A tibble: 346 x 12
##    .rownames n_visits gender income travel .fitted .se.fit .resid   .hat .sigma
##    <chr>        <dbl> <fct>  <fct>  <fct>    <dbl>   <dbl>  <dbl>  <dbl>  <dbl>
##  1 25               2 female ($95k… (4h,I…    8.67    5.79  -6.67 0.0179   43.4
##  2 26               1 female ($95k… (4h,I…    8.67    5.79  -7.67 0.0179   43.4
##  3 27               1 male   ($95k… (0.25…   16.3     5.35 -15.3  0.0153   43.4
##  4 29               1 male   ($95k… (4h,I…   -2.18    4.79   3.18 0.0122   43.4
##  5 30               1 female ($55k… (4h,I…    7.95    6.55  -6.95 0.0229   43.4
##  6 31               1 male   [$0,$… [0h,0…   61.5     7.83 -60.5  0.0326   43.3
##  7 33              80 female [$0,$… [0h,0…   72.4     7.39   7.61 0.0291   43.4
##  8 34             104 female ($95k… [0h,0…   53.8     6.35  50.2  0.0215   43.3
##  9 35              55 male   ($25k… (0.25…   33.0     5.57  22.0  0.0165   43.4
## 10 36             350 female ($25k… [0h,0…   70.4     6.35 280.   0.0215   40.6
## # … with 336 more rows, and 2 more variables: .cooksd <dbl>, .std.resid <dbl>

Sometimes you want to return multiple things from a function, but you also want the result to have a particular class (for example, a data frame or a numeric vector), so returning a list isn’t appropriate. This is common when you have a result plus metadata about the result. (Metadata is “data about the data”. For example, it could be the file a dataset was loaded from, the username of the person who created the variable, or the number of iterations an algorithm took to converge.) In that case, you can keep the main result as the return value and attach the extra information to it as attributes with attr(), as the following function does with the plotting formula.

pipeable_plot <- function(data, formula) {
  plot(formula, data)
  attr(data, 'formula') <- formula
  invisible(data)
}

plt_dist_vs_speed <- cars %>% 
  pipeable_plot(dist ~ speed)

str(plt_dist_vs_speed)

## 'data.frame':    50 obs. of  2 variables:
##  $ speed: num  4 4 7 7 8 9 10 10 10 11 ...
##  $ dist : num  2 10 4 22 16 10 18 26 34 17 ...
##  - attr(*, "formula")=Class 'formula'  language dist ~ speed
##   .. ..- attr(*, ".Environment")=<environment: 0x5590dd8f7c88>
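
To read the metadata back, call attr() with the attribute name. The sketch below shows the round trip on a function that does no plotting; tag_source and the "source" attribute name are assumptions made for illustration only.

```r
# Hypothetical helper: tags a data frame with the name of its source
tag_source <- function(data, source) {
  attr(data, "source") <- source
  data
}

# cars ships with base R in the datasets package
tagged <- tag_source(head(cars), "datasets::cars")

attr(tagged, "source")    # retrieve the metadata by name
class(tagged)             # the result keeps its original class
names(attributes(tagged)) # lists every attribute, including the new one
```

Because the result is still a data frame, it can be passed to any function that expects one, and the extra attribute travels along with it.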
