6 Functions

Published

April 14, 2025

Modified

April 22, 2025

Introduction

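You have almost certainly written many functions already to cut down on repetitive work. This chapter consolidates that practical knowledge and helps you look at functions from a more theoretical angle. Along the way you will see some interesting tricks and some more involved techniques; pay close attention, because later chapters build on them.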

Quiz

Will the following code throw an error when it runs?

f2 <- function(a, b) {
  a * 10
}
f2(10, stop("This is an error!"))

How can a function run some code when it exits, whether it succeeds or fails?

Outline

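Section 6.2: the three components of a function, and primitive functions.
Section 6.3: three ways of composing functions, and their trade-offs.
Section 6.4: the rules of lexical scoping, i.e. how R finds the value associated with a name.
Section 6.5: argument evaluation: an argument is evaluated only when it is first used, and is not evaluated again afterwards.
Section 6.6: the special ... argument.
Section 6.7: how functions exit.
Section 6.8: the four forms of function.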

Function fundamentals

Function components

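A function has three components:

formals(): the list of arguments, which controls how you call the function.
body(): the code inside the function.
environment(): the environment, which determines how the function finds the values associated with names.

The formals and body are explicit: you specify them when you create the function. The environment is implicit: it is determined by where the function was defined, and you need environment() to inspect it.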

f02 <- function(x, y) {
  # A comment
  x + y
}

formals(f02)
#> $x
#> 
#> 
#> $y

body(f02)
#> {
#>     x + y
#> }

environment(f02)
#> <environment: R_GlobalEnv>

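Like every other object in R, functions can also carry attributes. A common one is srcref (source reference), which points to the source code used to create the function. The call below returns NULL, most likely because this page was rendered in a non-interactive session, where options(keep.source = FALSE) is the default and no srcref is stored.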

attr(f02, "srcref")
#> NULL

Primitive functions

Primitive functions are the exception: they do not have these three components.

sum
#> function (..., na.rm = FALSE)  .Primitive("sum")
`[`
#> .Primitive("[")

Their base type is either builtin or special (base types are covered in Chapter 12).

typeof(sum)
#> [1] "builtin"
typeof(`[`)
#> [1] "special"

Because they are implemented directly in C, formals(), body(), and environment() all return NULL.

formals(sum)
#> NULL
body(sum)
#> NULL
environment(sum)
#> NULL

First-class functions

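First-class means that functions are objects in their own right: they can be assigned to variables, passed to other functions, and so on. You create a function with function() and bind it to a name with <-.

A function that is not bound to a name is an anonymous function. Anonymous functions are typically passed to functionals such as lapply(), where a short one-off function keeps the data-processing code compact.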

lapply(mtcars, function(x) length(unique(x)))
Filter(function(x) !is.numeric(x), mtcars)
integrate(function(x) sin(x)^2, 0, pi)

For the closure behaviour of R functions, see Chapter 7.

Invoking a function

You normally call a function as myfun(param1, param2, ...). If the arguments are already stored in a list, use do.call() to call the function with them.

args <- list(1:10, na.rm = TRUE)
do.call(mean, args)
#> [1] 5.5
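As a small sketch of the same idea with a different function (the values are made up for illustration), do.call() matches named list elements to arguments exactly as a direct call would:

```r
# A list mixing positional and named arguments (illustrative values)
paste_args <- list("a", "b", sep = "-")

# Equivalent to paste("a", "b", sep = "-")
res <- do.call(paste, paste_args)
res
#> [1] "a-b"
```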

Exercises

Use is.function() to test whether an object is a function, and is.primitive() to test specifically for primitive functions.
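A quick check of both predicates on a closure, a primitive, and a non-function might look like this:

```r
is.function(mean)    # TRUE: an ordinary closure
is.function(sum)     # TRUE: primitives are functions too
is.primitive(mean)   # FALSE
is.primitive(sum)    # TRUE
is.function("mean")  # FALSE: a character string, not a function
```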

Function composition

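Base R provides two ways of composing function calls. For example, suppose you want to compute the population standard deviation using sqrt() and mean():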

x <- runif(100)
square <- function(x) x^2
deviation <- function(x) x - mean(x)

The first way is to nest the calls:

sqrt(mean(square(deviation(x))))
#> [1] 0.2744786

The second way is to save intermediate results:

out <- deviation(x)
out <- square(out)
out <- mean(out)
out <- sqrt(out)
out
#> [1] 0.2744786

The magrittr package provides a third way, the pipe operator %>% (since R 4.1.0 you can use the native pipe |> directly instead).

library(magrittr)

x %>%
  deviation() %>%
  square() %>%
  mean() %>%
  sqrt()
#> [1] 0.2744786
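For comparison, here is a sketch of the same pipeline written with the native pipe. It assumes R 4.1.0 or later, and the seed is a hypothetical addition used only to make the sketch reproducible:

```r
set.seed(1)  # assumption: any seed works, it only fixes the random input
x <- runif(100)
square <- function(x) x^2
deviation <- function(x) x - mean(x)

# Native pipe: the left-hand side becomes the first argument of the next call
piped <- x |>
  deviation() |>
  square() |>
  mean() |>
  sqrt()

# Same computation written as nested calls
nested <- sqrt(mean(square(deviation(x))))
all.equal(piped, nested)
#> [1] TRUE
```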

Lexical scoping

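Chapter 2 covered assignment, the act of binding a name to an object. Here we look at the reverse: finding the object associated with a name, which is governed by lexical scoping.

R's lexical scoping follows four rules:

Name masking: names defined inside a function mask names defined outside it.
Functions versus variables: when a name is used in a function call, R looks only for function objects with that name and skips non-function objects.
A fresh start: every invocation of a function is independent of the previous ones.
Dynamic lookup: R looks up values only when the function is run, not when it is created.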

Name masking

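When a function looks up a name, it searches inside the function first and then outside it, moving outward one level at a time until the name is found.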

x <- 10
y <- 20
z <- 30
g05 <- function() {
  x <- 1
  y <- 2
  c(x, y, z)
}
g05()
#> [1]  1  2 30

Functions versus variables

As noted above, functions are ordinary objects, so looking up a function follows the same rules.

g07 <- function(x) x + 1
g08 <- function() {
  g07 <- function(x) x + 100
  g07(10)
}
g08()
#> [1] 110

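When a function and a non-function share the same name (in different environments), R ignores non-function objects when it looks up a name used in a function call, as with g09 below. Avoid this in real code, because it is ambiguous and confusing.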

g09 <- function(x) x + 100
g10 <- function() {
  g09 <- 10
  g09(g09)
}
g10()
#> [1] 110

A fresh start

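In the example below, g11() returns the same value every time it is called. Each call creates a new environment to host its execution, and these environments are independent of one another, so the a from a previous call no longer exists.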

g11 <- function() {
  if (!exists("a")) {
    a <- 1
  } else {
    a <- a + 1
  }
  a
}

g11()
#> [1] 1
g11()
#> [1] 1

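Running a <- g11() at the top level breaks this independence from the outside: a now exists in the global environment, so exists("a") is TRUE and every subsequent call returns the global a plus 1.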

a <- g11()
g11()
#> [1] 2
g11()
#> [1] 2

Dynamic lookup

A function looks up names only when it runs, so its result can change if the objects in its enclosing environments change between calls.

g12 <- function() x + 1
x <- 15
g12()
#> [1] 16

x <- 20
g12()
#> [1] 21

codetools::findGlobals() lists the external dependencies (unbound symbols) of a function.

codetools::findGlobals(g12)
#> [1] "+" "x"

environment(g12) <- emptyenv()
g12()
#> Error in x + 1: could not find function "+"

Exercises

…

Lazy evaluation

Function arguments in R are lazily evaluated: an argument is evaluated only if it is actually used.

h01 <- function(x) {
  10
}
h01(stop("This is an error!"))
#> [1] 10

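This lets you pass a potentially expensive computation as an argument, and it will only be evaluated if the function actually needs it.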
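A minimal sketch of both sides of this behaviour follows. The names h_maybe and h_forced are hypothetical; force() is the base R function that evaluates an argument immediately:

```r
# `x` is only touched on the verbose branch, so the default path never evaluates it
h_maybe <- function(x, verbose = FALSE) {
  if (verbose) x else "skipped"
}
skipped <- h_maybe(stop("never evaluated"))
skipped
#> [1] "skipped"

# force() evaluates the argument up front, so the error surfaces
h_forced <- function(x) {
  force(x)
  10
}
msg <- tryCatch(
  h_forced(stop("evaluated eagerly")),
  error = function(e) conditionMessage(e)
)
msg
#> [1] "evaluated eagerly"
```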

Promises

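Lazy evaluation is powered by a data structure called a promise or, less commonly, a thunk.

A promise has three components:

An expression, such as 1 + 1, whose evaluation is delayed.
The environment in which the expression should be evaluated, i.e. the environment where the function was called, not the function's own environment. In the example below, h02(y) therefore uses the global y (10) rather than the y <- 100 inside h02, and h02(y <- 1000) assigns y in the global environment.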

y <- 10
h02 <- function(x) {
  y <- 100
  x + 1
}

h02(y)
#> [1] 11
h02(y <- 1000)
#> [1] 1001
y
#> [1] 1000

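A value, computed and cached the first time the promise is accessed, so the expression is evaluated at most once. In the example below, the message from double() is emitted only once per call to double(), even though h03() uses x twice.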

double <- function(x) {
  message("Calculating...")
  x * 2
}

h03 <- function(x) {
  c(x, x)
}

h03(double(20))
#> [1] 40 40

x <- double(20)
h03(x)
#> [1] 40 40
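One way to observe that an argument is evaluated at most once is to count evaluations through a side effect. This is only a sketch; counter, bump, and use_twice are hypothetical names:

```r
counter <- 0
bump <- function() {
  counter <<- counter + 1  # side effect: record each evaluation
  counter
}

# Uses its argument twice, like h03()
use_twice <- function(x) c(x, x)

res <- use_twice(bump())
res
#> [1] 1 1
counter  # bump() ran only once, although x was used twice
#> [1] 1
```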

Promises behave a little like Schrödinger's cat: any attempt to inspect one with R code forces its evaluation, so it stops being a promise.

Default arguments

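Thanks to lazy evaluation, a default value can refer to other arguments, or even to variables defined later inside the function, as in the example below. Many base R functions use this technique, but it is not recommended, because it makes the function harder to understand.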

h04 <- function(x = 1, y = x * 2, z = a + b) {
  a <- 10
  b <- 100

  c(x, y, z)
}

h04()
#> [1]   1   2 110

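Also note that an expression is evaluated differently when it is a default argument than when it is supplied by the caller. In the example below, ls() as a default argument is evaluated inside the function's environment, whereas ls() supplied by the caller is evaluated in the calling environment.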

h05 <- function(x = ls()) {
  a <- 1
  x
}

# ls() evaluated inside h05:
h05()
#> [1] "a" "x"

# ls() evaluated in the global environment:
h05(ls())
#>  [1] "a"               "args"            "deviation"       "double"         
#>  [5] "f02"             "g05"             "g07"             "g08"            
#>  [9] "g09"             "g10"             "g11"             "g12"            
#> [13] "h01"             "h02"             "h03"             "h04"            
#> [17] "h05"             "out"             "pandoc_dir"      "quarto_bin_path"
#> [21] "square"          "status"          "x"               "y"              
#> [25] "z"

Missing arguments

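missing() tells you where an argument's value came from: it returns TRUE if the argument was not supplied by the caller (so the value comes from the default) and FALSE otherwise.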

h06 <- function(x = 10) {
  list(missing(x), x)
}
str(h06())
#> List of 2
#>  $ : logi TRUE
#>  $ : num 10
str(h06(10))
#> List of 2
#>  $ : logi FALSE
#>  $ : num 10
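missing() also works for arguments that have no default at all. A minimal sketch, using a hypothetical function describe_arg:

```r
describe_arg <- function(x) {
  # x has no default, so touching it when missing would be an error;
  # missing() lets us branch before that happens
  if (missing(x)) "x was not supplied" else paste("x =", x)
}

none <- describe_arg()
none
#> [1] "x was not supplied"
some <- describe_arg(5)
some
#> [1] "x = 5"
```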

Exercises

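What happens when the code below runs:
1. The promise x = {y <- 1; 2} is evaluated in the execution environment of f1: it assigns 1 to y and returns 2.
2. The result of the promise, 2, becomes the value of the argument x.
3. By the time y is looked up, y <- 1 has already bound y in the function's environment, so the default y = 0 is never evaluated and y is 1.
4. The assignment happened inside the function's environment, so the global y is unaffected and is still 10.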

y <- 10
f1 <- function(x = {
                 y <- 1
                 2
               }, y = 0) {
  c(x, y)
}
f1()
#> [1] 2 1
y
#> [1] 10

`...`(dot-dot-dot)

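... is a special argument that lets a function take any number of additional arguments.

Using `...`

... is mainly used in two situations:

To pass additional arguments on to another function.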

i01 <- function(y, z) {
  list(y = y, z = z)
}

i02 <- function(x, ...) {
  i01(...)
}

str(i02(x = 1, y = 2, z = 3))
#> List of 2
#>  $ y: num 2
#>  $ z: num 3

# A common case: the apply family of functions
x <- list(c(1, 3, NA), c(4, NA, 6))
str(lapply(x, mean, na.rm = TRUE))
#> List of 2
#>  $ : num 2
#>  $ : num 5

In S3 generics, so that methods can take different arguments for different classes, as with print() below. S3 is covered in Chapter 13.

print(factor(letters), max.levels = 4)
#>  [1] a b c d e f g h i j k l m n o p q r s t u v w x y z
#> 26 Levels: a b c ... z

print(y ~ x, showEnv = TRUE)
#> y ~ x
#> <environment: R_GlobalEnv>

Accessing `...`

Use the special form ..N to refer to the Nth element of ....

i03 <- function(...) {
  list(first = ..1, third = ..3)
}
str(i03(1, 2, 3))
#> List of 2
#>  $ first: num 1
#>  $ third: num 3

Use list(...) to evaluate the arguments and store them in a list.

i04 <- function(...) {
  list(...)
}
str(i04(a = 1, b = 2))
#> List of 2
#>  $ a: num 1
#>  $ b: num 2
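Two further base R idioms, shown here as a sketch: ...length() (available since R 3.5.0) counts the elements of ... without evaluating them, and names(list(...)) recovers the argument names. The function names count_dots and dot_names are hypothetical:

```r
# Number of arguments captured by `...`
count_dots <- function(...) ...length()

# Names of the arguments captured by `...`
dot_names <- function(...) names(list(...))

n <- count_dots(1, "a", NULL)
n
#> [1] 3
nms <- dot_names(a = 1, b = 2)
nms
#> [1] "a" "b"
```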

The rlang package provides additional ways of capturing ...:

rlang::list2()
rlang::enquos()

Exiting a function

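Most functions exit in one of two ways:

They return a value, implicitly or explicitly, indicating success.
They throw an error, indicating failure.

Implicit versus explicit returns

Explicitly, by calling return().
Implicitly, where the value of the last evaluated expression is used as the return value.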

j01 <- function(x) {
  if (x < 10) {
    0
  } else {
    10
  }
}
j01(5)
#> [1] 0
j01(15)
#> [1] 10

j02 <- function(x) {
  if (x < 10) {
    return(0)
  } else {
    return(10)
  }
}
j02(5)
#> [1] 0
j02(15)
#> [1] 10

Invisible values

If the return value is not assigned to a variable, calling the function at the top level prints the result.

j03 <- function() 1
x <- j03()
j03()
#> [1] 1

Use invisible() to prevent automatic printing.

j04 <- function() invisible(1)
j04()
print(j04())
#> [1] 1
(j04())
#> [1] 1

Use withVisible() to get both the return value and its visibility flag.

str(withVisible(j04()))
#> List of 2
#>  $ value  : num 1
#>  $ visible: logi FALSE

The most common function that returns invisibly is <-.

a <- 2
(a <- 2)
#> [1] 2
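Because <- returns its value invisibly, assignments can be chained. The sketch below also shows a hypothetical helper, announce, that returns its input invisibly so it can be dropped into the middle of other code:

```r
# Right-associative: d <- 2 runs first and invisibly returns 2, and so on
a <- b <- d <- 2
c(a, b, d)
#> [1] 2 2 2

# A hypothetical helper that reports on its input and passes it through
announce <- function(x) {
  message("got ", length(x), " value(s)")
  invisible(x)
}
vis <- withVisible(announce(1:3))
vis$visible
#> [1] FALSE
```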

Errors

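When a function cannot complete its task, it should throw an error with stop(), which immediately terminates execution of the function. The error message should tell the user why the function failed and how to deal with it.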

j05 <- function() {
  stop("I'm an error")
  return(10)
}
j05()
#> Error in j05(): I'm an error
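As a sketch of an informative error, the hypothetical function check_positive below validates its input before doing any work. tryCatch() is used here only to capture the message so that the example runs to completion:

```r
check_positive <- function(x) {
  if (!is.numeric(x) || any(x <= 0)) {
    # call. = FALSE leaves the call out of the printed error message
    stop("`x` must be a positive number", call. = FALSE)
  }
  sqrt(x)
}

# Capture the error message instead of halting the script
msg <- tryCatch(check_positive(-1), error = function(e) conditionMessage(e))
msg
#> [1] "`x` must be a positive number"

ok <- check_positive(4)
ok
#> [1] 2
```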

Exit handlers

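Functions often make temporary changes to global state, such as the working directory or graphical parameters, and need to restore it when they finish. Use on.exit() to register code that runs when the function exits. The example below shows that the exit handler runs whether the function succeeds or fails.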

j06 <- function(x) {
  cat("Hello\n")
  on.exit(cat("Goodbye!\n"), add = TRUE)

  if (x) {
    return(10)
  } else {
    stop("Error")
  }
}

j06(TRUE)
#> Hello
#> Goodbye!
#> [1] 10

j06(FALSE)
#> Hello
#> Error in j06(FALSE): Error
#> Goodbye!

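on.exit() has two other arguments:

add: when there are several exit handlers, add = FALSE makes the new handler overwrite the existing ones; always setting add = TRUE is recommended.
after: when there are several exit handlers, after = FALSE makes the new handler run before the existing ones.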

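j08 <- function() {
  on.exit(message("a"), add = TRUE)
  # add = FALSE replaces the handler registered above
  on.exit(message("b"), add = FALSE)
}
j08()  # emits only "b"

j09 <- function() {
  on.exit(message("a"), add = TRUE, after = TRUE)
  on.exit(message("b"), add = TRUE, after = TRUE)
  # after = FALSE puts this handler ahead of the existing ones
  on.exit(message("c"), add = TRUE, after = FALSE)
}
j09()  # emits "c", then "a", then "b"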
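The typical use of an exit handler, temporarily changing global state and restoring it afterwards, can be sketched as follows. with_digits is a hypothetical helper; it relies on options() returning the previous values and on code being evaluated lazily, after the option has been changed:

```r
with_digits <- function(digits, code) {
  old <- options(digits = digits)    # options() returns the previous values
  on.exit(options(old), add = TRUE)  # restore on exit, success or failure
  force(code)                        # `code` is lazy, so it runs under the new setting
}

before <- getOption("digits")
res <- with_digits(3, format(pi))
res
#> [1] "3.14"

# The option is restored even when the code fails
try(with_digits(3, stop("failure inside")), silent = TRUE)
identical(getOption("digits"), before)
#> [1] TRUE
```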

Exercises

Take a look at sink() and capture.output().
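As a starting point, capture.output() evaluates its arguments and returns whatever they print as a character vector, one element per line:

```r
# Printed output becomes a character vector instead of appearing in the console
out <- capture.output(print(1:3))
out
#> [1] "[1] 1 2 3"

# One element per line of output
lines <- capture.output(cat("a\nb\n"))
lines
#> [1] "a" "b"
```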

Function forms

Tip

Two slogans about R:

Everything that exists is an object.
Everything that happens is a function call. — John Chambers

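Functions in R come in four forms:

prefix: the function name comes before its arguments, e.g. mean(x).
infix: the function name comes between its arguments, e.g. the + in x + y; you can define your own by wrapping the name in %.
replacement: functions that replace values by assignment, e.g. names(df) <- c("a", "b").
special: e.g. [[, if, and for.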

Rewriting to prefix form

A call in any form can be rewritten in prefix form.

x + y
`+`(x, y)

names(df) <- c("x", "y", "z")
`names<-`(df, c("x", "y", "z"))

for (i in 1:10) print(i)
`for`(i, 1:10, print(i))

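This means you can override the behaviour of R's basic functions. The example below redefines ( so that, roughly one time in ten, it adds 1 to a numeric result.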

`(` <- function(e1) {
  if (is.numeric(e1) && runif(1) < 0.1) {
    e1 + 1
  } else {
    e1
  }
}
replicate(50, (1 + 2))
#>  [1] 3 3 3 3 3 3 3 3 3 3 3 3 4 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
#> [37] 3 3 4 3 4 3 3 3 3 4 3 3 3 3
rm("(")
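A more practical use of the prefix form is passing an operator to a functional by its backtick-quoted name. A short sketch with illustrative inputs:

```r
# `+` is called as `+`(element, 10) for each element
plus_ten <- sapply(1:3, `+`, 10)
plus_ten
#> [1] 11 12 13

# `[` is called as `[`(element, 2), extracting the second value of each vector
seconds <- sapply(list(1:3, 4:6), `[`, 2)
seconds
#> [1] 2 5
```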

Prefix form

Arguments to a prefix-form function can be matched in three ways, tried in this order of priority:

By exact name.
By partial name (a unique prefix).
By position.

k01 <- function(abcdef, bcde1, bcde2) {
  list(a = abcdef, b1 = bcde1, b2 = bcde2)
}

str(k01(1, 2, 3))
str(k01(2, 3, abcdef = 1))

# Can abbreviate long argument names:
str(k01(2, 3, a = 1))

# But this doesn't work because abbreviation is ambiguous
str(k01(1, 3, b = 1))
#> Error in k01(1, 3, b = 1): argument 3 matches multiple formal arguments

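As a rule, use positional matching only for the first one or two arguments, which are the most commonly used; partial matching is best avoided. Unfortunately, partial matching cannot be disabled, but you can set options(warnPartialMatchArgs = TRUE) to get a warning when it happens.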

options(warnPartialMatchArgs = TRUE)
x <- k01(a = 1, 2, 3)

Infix form

Infix functions take exactly two arguments. Base R has many of them: :, ::, :::, $, @, ^, *, /, +, -, >, >=, <, <=, ==, !=, !, &, &&, |, ||, ~, <-, and <<-. You can also define your own by wrapping the name in %, as base R does with %*%, %in%, and so on.

`%+%` <- function(a, b) paste0(a, b)
"new " %+% "string"
#> [1] "new string"

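The name between the % signs can contain any characters except %. Special characters must be escaped in the string that defines the function, but not when you call it.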

`% %` <- function(a, b) paste(a, b)
`%/\\%` <- function(a, b) paste(a, b)

"a" % % "b"
#> [1] "a b"
"a" %/\% "b"
#> [1] "a b"

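An infix function always takes the expressions on its left and right as its two arguments, and R's default precedence rules compose user-defined infix operators from left to right.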

`%-%` <- function(a, b) paste0("(", a, " %-% ", b, ")")
"a" %-% "b" %-% "c"
#> [1] "((a %-% b) %-% c)"

Replacement form

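A replacement function must:

Take at least two arguments, the object being modified and the new value. The new value is passed by name, so that argument must be called value.
Return the modified object.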

`second<-` <- function(x, value) {
  x[2] <- value
  x
}

x <- 1:10
second(x) <- 5L
x
#>  [1]  1  5  3  4  5  6  7  8  9 10

Any additional arguments must be placed between x and value.

`modify<-` <- function(x, position, value) {
  x[position] <- value
  x
}
modify(x, 1) <- 10
x
#>  [1] 10  5  3  4  5  6  7  8  9 10
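To make the mechanics concrete, the sketch below compares the replacement syntax with the equivalent prefix call that R evaluates behind the scenes. The function is redefined so that the block is self-contained, and y is a hypothetical second vector used for the comparison:

```r
`modify<-` <- function(x, position, value) {
  x[position] <- value
  x
}

x <- 1:10
y <- 1:10

modify(x, 1) <- 10                 # replacement form
y <- `modify<-`(y, 1, value = 10)  # equivalent prefix call, assigned back to y

identical(x, y)
#> [1] TRUE
```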

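Use tracemem() to track changes in the object's memory address. The output shows that the object is copied: replacement functions look as though they modify in place, but they actually return a modified copy.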

x <- 1:10
tracemem(x)
#> [1] "<000001511E91C240>"

second(x) <- 6L
#> tracemem[0x000001511e91c240 -> 0x000001512074b538]: eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList withRestarts <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group withCallingHandlers <Anonymous> process_file <Anonymous> <Anonymous> execute .main 
#> tracemem[0x000001512074b538 -> 0x00000151207565d8]: second<- eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList withRestarts <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group withCallingHandlers <Anonymous> process_file <Anonymous> <Anonymous> execute .main

Special forms

The table below lists some special forms and their equivalent prefix forms.

special form	prefix form
(x)	`(`(x)
{x}	`{`(x)
x[i]	`[`(x, i)
x[[i]]	`[[`(x, i)
if (cond) true	`if`(cond, true)
if (cond) true else false	`if`(cond, true, false)
for(var in seq) action	`for`(var, seq, action)
while (cond) action	`while`(cond, action)
repeat expr	`repeat`(expr)
next	`next`()
break	`break`()
function(arg1, arg2) {body}	`function`(alist(arg1, arg2), body, env )
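A few rows of the table can be checked directly by calling the backtick-quoted names. This is only a sketch with illustrative values:

```r
# if (cond) true else false
choice <- `if`(TRUE, "yes", "no")
choice
#> [1] "yes"

# x[[i]]
second <- `[[`(list("a", "b"), 2)
second
#> [1] "b"

# { evaluates its arguments in order and returns the last one
last <- `{`(1, 2)
last
#> [1] 2

# for (i in 1:3) total <- total + i
total <- 0
`for`(i, 1:3, total <- total + i)
total
#> [1] 6
```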

Exercises

…
