```r
library(tidyverse)
#> ── Attaching core tidyverse packages ───────────────────── tidyverse 2.0.0 ──
#> ✔ dplyr     1.1.4     ✔ readr     2.1.5
#> ✔ forcats   1.0.0     ✔ stringr   1.5.1
#> ✔ ggplot2   3.5.2     ✔ tibble    3.2.1
#> ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
#> ✔ purrr     1.1.0     
#> ── Conflicts ─────────────────────────────────────── tidyverse_conflicts() ──
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag()    masks stats::lag()
#> ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

data(crickets, package = "modeldata")
names(crickets)
#> [1] "species" "temp"    "rate"

# Plot the temperature on the x-axis, the chirp rate on the y-axis. The plot
# elements will be colored differently for each species:
ggplot(crickets, 
       aes(x = temp, y = rate, color = species, pch = species, lty = species)) + 
  # Plot points for each data point and color by species
  geom_point(size = 2) + 
  # Show a simple linear model fit created separately for each species:
  geom_smooth(method = lm, se = FALSE, alpha = 0.5) + 
  scale_color_brewer(palette = "Paired") +
  labs(x = "Temperature (C)", y = "Chirp Rate (per minute)")
#> `geom_smooth()` using formula = 'y ~ x'
```
Relationship between chirp rate and temperature for two different species of crickets
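The lines in the plot come from separate least-squares fits, one per species. As a minimal sketch (not part of the original analysis, and assuming the modeldata package is installed), the fitted values from those separate fits can be checked against a single model that includes a temperature-by-species interaction; the names `separate_fits` and `sep_pred` are hypothetical.

```r
data(crickets, package = "modeldata")

# One model with separate intercepts and slopes for each species
interaction_fit <- lm(rate ~ temp * species, data = crickets)

# Separate simple regressions, one per species (what geom_smooth() draws)
separate_fits <- lapply(split(crickets, crickets$species),
                        function(d) lm(rate ~ temp, data = d))

# Collect the per-species predictions back in the original row order
sep_pred <- numeric(nrow(crickets))
for (sp in names(separate_fits)) {
  rows <- crickets$species == sp
  sep_pred[rows] <- predict(separate_fits[[sp]], newdata = crickets[rows, ])
}

# TRUE if the two approaches give the same fitted lines
isTRUE(all.equal(unname(fitted(interaction_fit)), sep_pred))
```

Because the interaction model gives each species its own intercept and slope, its fitted values should agree with the separate fits up to floating-point tolerance.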
```r
rate ~ temp + species + temp:species

# A shortcut can be used to expand all interactions containing
# interactions with two variables:
rate ~ (temp + species)^2

# Another shortcut to expand factors to include all possible
# interactions (equivalent for this example):
rate ~ temp * species
```
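One way to confirm that these three formulas are equivalent for this data set is to compare the model matrices they produce. This is a sketch assuming the modeldata package is installed; the names `f1` to `f3` and `m1` to `m3` are illustrative only.

```r
data(crickets, package = "modeldata")

# The three ways of writing the interaction model
f1 <- rate ~ temp + species + temp:species
f2 <- rate ~ (temp + species)^2
f3 <- rate ~ temp * species

# model.matrix() expands each formula into the design matrix used by lm()
m1 <- model.matrix(f1, data = crickets)
m2 <- model.matrix(f2, data = crickets)
m3 <- model.matrix(f3, data = crickets)

colnames(m1)
isTRUE(all.equal(m1, m2)) && isTRUE(all.equal(m1, m3))
```

If the formulas are equivalent, the last line returns `TRUE` and all three matrices share the same four columns: an intercept, `temp`, a species indicator, and the temperature-by-species product.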
```r
# The full model, including the temperature-by-species interaction:
interaction_fit <- lm(rate ~ (temp + species)^2, data = crickets)

# Fit a reduced model:
main_effect_fit <- lm(rate ~ temp + species, data = crickets)

# Compare the two:
anova(main_effect_fit, interaction_fit)
#> Analysis of Variance Table
#> 
#> Model 1: rate ~ temp + species
#> Model 2: rate ~ (temp + species)^2
#>   Res.Df    RSS Df Sum of Sq     F Pr(>F)
#> 1     28 89.350                          
#> 2     27 85.074  1    4.2758 1.357 0.2542
```
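The F statistic in this table is the usual nested-model comparison: the drop in residual sum of squares per extra parameter, divided by the residual mean square of the larger model. With the printed values, F = (4.2758 / 1) / (85.074 / 27) ≈ 1.357. A minimal sketch that recomputes it from the two fits (the names `f_manual`, `p_manual`, and `anova_res` are hypothetical):

```r
data(crickets, package = "modeldata")
interaction_fit <- lm(rate ~ (temp + species)^2, data = crickets)
main_effect_fit <- lm(rate ~ temp + species, data = crickets)

# For lm objects, deviance() is the residual sum of squares
rss_reduced <- deviance(main_effect_fit)
rss_full    <- deviance(interaction_fit)
df_reduced  <- df.residual(main_effect_fit)
df_full     <- df.residual(interaction_fit)

# F = ((RSS_reduced - RSS_full) / extra params) / (RSS_full / df_full)
f_manual <- ((rss_reduced - rss_full) / (df_reduced - df_full)) /
  (rss_full / df_full)
p_manual <- pf(f_manual, df_reduced - df_full, df_full, lower.tail = FALSE)

anova_res <- anova(main_effect_fit, interaction_fit)
c(manual_F = f_manual, anova_F = anova_res$F[2])
c(manual_p = p_manual, anova_p = anova_res[["Pr(>F)"]][2])
```

Both pairs should match the F value and p-value reported by `anova()` above.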
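For these three cases, different development teams designed three different interfaces for the same task. Each has its own advantages and disadvantages. In comparison, the Python Developer's Guide espouses the following philosophy when approaching a problem: "There should be one – and preferably only one – obvious way to do it." R is quite different from Python in this respect. An advantage of R's diversity of interfaces is that it can evolve over time and fit the different needs of different users. Unfortunately, some of this syntactical diversity stems from a focus on the needs of the person developing the code rather than the needs of the person using it. Inconsistencies between packages can be a stumbling block for R users.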
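To resolve the usage issues described here, the tidymodels packages have a set of design goals. Most of the tidymodels design goals fall under the existing "Design for Humans" rubric from the tidyverse (Wickham et al. 2019), but with specific applications to modeling code. There are also a few additional tidymodels design goals that complement those of the tidyverse. Some examples: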
```r
corr_res <- map(mtcars %>% select(-mpg), cor.test, y = mtcars$mpg)

# The first of ten results in the vector: 
corr_res[[1]]
#> 
#> 	Pearson's product-moment correlation
#> 
#> data:  .x[[i]] and mtcars$mpg
#> t = -8.9197, df = 30, p-value = 6.113e-10
#> alternative hypothesis: true correlation is not equal to 0
#> 95 percent confidence interval:
#>  -0.9257694 -0.7163171
#> sample estimates:
#>       cor 
#> -0.852162
```
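Each element of `corr_res` is an `"htest"` list, so individual pieces can be pulled out by name. A minimal sketch, using base subsetting in place of `select()` and the hypothetical name `estimates`, that extracts just the correlation estimates with `purrr::map_dbl()`:

```r
library(purrr)

# One cor.test() per predictor column, as above
corr_res <- map(mtcars[, names(mtcars) != "mpg"], cor.test, y = mtcars$mpg)

# A string as the mapping function extracts that named element;
# here it is the "estimate" (the sample correlation) of each test
estimates <- map_dbl(corr_res, "estimate")

# Sort from the most negative to the most positive correlation
sort(estimates)
```

The result is a named numeric vector with one entry per predictor; this is convenient for a quick look, while the `tidy()` approach that follows keeps the confidence intervals as well.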
```r
library(broom)  # provides tidy() methods, including one for cor.test() results

corr_res %>% 
  # Convert each to a tidy format; `map_dfr()` stacks the data frames 
  map_dfr(tidy, .id = "predictor") %>% 
  ggplot(aes(x = fct_reorder(predictor, estimate))) + 
  geom_point(aes(y = estimate)) + 
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width = .1) +
  labs(x = NULL, y = "Correlation with mpg")
```
Correlations (and 95% confidence intervals) between predictors and the outcome in the mtcars data set
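```r
# Nest the data so that there is one row per species:
split_by_species <- crickets %>% group_nest(species)

model_by_species <- 
  split_by_species %>% 
  mutate(model = map(data, ~ lm(rate ~ temp, data = .x)))
model_by_species
#> # A tibble: 2 × 3
#>   species                        data model 
#>   <fct>            <list<tibble[,2]>> <list>
#> 1 O. exclamationis           [14 × 2] <lm>  
#> 2 O. niveus                  [17 × 2] <lm>
```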
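The `model` list column holds one `lm` object per species. A minimal sketch, assuming the dplyr, tidyr, purrr, broom, and modeldata packages are installed (`species_coefs` is a hypothetical name), that turns each model into a coefficient table with `broom::tidy()` and stacks the results with `tidyr::unnest()`:

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

data(crickets, package = "modeldata")

# One row per species, with the data for that species in a list column
split_by_species <- crickets %>% group_nest(species)

model_by_species <- split_by_species %>%
  mutate(model = map(data, ~ lm(rate ~ temp, data = .x)))

# Tidy each model into a small tibble of coefficients, then unnest
species_coefs <- model_by_species %>%
  mutate(coef = map(model, tidy)) %>%
  select(species, coef) %>%
  unnest(cols = c(coef))

species_coefs
```

The result has one row per species and term (an intercept and a `temp` slope for each species), which keeps the per-group models in a single rectangular data frame.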
```r
library(tidymodels)

tidymodels_prefer(quiet = FALSE)
#> [conflicted] Will prefer agua::refit over any other package.
#> [conflicted] Will prefer DALEX::explain over any other package.
#> [conflicted] Will prefer dials::Laplace over any other package.
#> [conflicted] Will prefer dials::max_rules over any other package.
#> [conflicted] Will prefer dials::neighbors over any other package.
#> [conflicted] Will prefer dials::prune over any other package.
#> [conflicted] Will prefer dials::smoothness over any other package.
#> [conflicted] Will prefer dplyr::collapse over any other package.
#> [conflicted] Will prefer dplyr::combine over any other package.
#> [conflicted] Will prefer dplyr::filter over any other package.
#> [conflicted] Will prefer dplyr::rename over any other package.
#> [conflicted] Will prefer dplyr::select over any other package.
#> [conflicted] Will prefer dplyr::slice over any other package.
#> [conflicted] Will prefer ggplot2::`%+%` over any other package.
#> [conflicted] Will prefer ggplot2::margin over any other package.
#> [conflicted] Will prefer parsnip::bart over any other package.
#> [conflicted] Will prefer parsnip::fit over any other package.
#> [conflicted] Will prefer parsnip::mars over any other package.
#> [conflicted] Will prefer parsnip::pls over any other package.
#> [conflicted] Will prefer purrr::cross over any other package.
#> [conflicted] Will prefer purrr::invoke over any other package.
#> [conflicted] Will prefer purrr::map over any other package.
#> [conflicted] Will prefer recipes::discretize over any other package.
#> [conflicted] Will prefer recipes::step over any other package.
#> [conflicted] Will prefer recipes::update over any other package.
#> [conflicted] Will prefer rsample::populate over any other package.
#> [conflicted] Will prefer scales::rescale over any other package.
#> [conflicted] Will prefer themis::step_downsample over any other package.
#> [conflicted] Will prefer themis::step_upsample over any other package.
#> [conflicted] Will prefer tidyr::expand over any other package.
#> [conflicted] Will prefer tidyr::extract over any other package.
#> [conflicted] Will prefer tidyr::pack over any other package.
#> [conflicted] Will prefer tidyr::unpack over any other package.
#> [conflicted] Will prefer tune::parameters over any other package.
#> [conflicted] Will prefer tune::tune over any other package.
#> [conflicted] Will prefer yardstick::get_weights over any other package.
#> [conflicted] Will prefer yardstick::precision over any other package.
#> [conflicted] Will prefer yardstick::recall over any other package.
#> [conflicted] Will prefer yardstick::spec over any other package.
#> [conflicted] Removing existing preference.
#> [conflicted] Will prefer recipes::update over Matrix::update.
#> ── Conflicts ───────────────────────────────────────── tidymodels_prefer() ──
```
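Declaring preferences is one way to resolve masking; another is to call the function with its namespace prefix so that the choice is explicit at each call site. A minimal sketch of the `dplyr::filter()` versus `stats::filter()` conflict reported when the tidyverse was attached (the names `moving_avg` and `four_cyl` are hypothetical). For a single function, `conflicted::conflict_prefer("filter", "dplyr")` declares the same kind of preference that `tidymodels_prefer()` sets for many functions at once.

```r
library(dplyr)  # attaching dplyr masks stats::filter()

# stats::filter(): apply a linear (here, 3-point moving-average) filter
# to a numeric series; the two end points have no full window and are NA
moving_avg <- stats::filter(1:5, rep(1 / 3, 3))

# dplyr::filter(): keep the rows of a data frame that satisfy a condition
four_cyl <- dplyr::filter(mtcars, cyl == 4)

as.numeric(moving_avg)
nrow(four_cyl)
```

The two functions share a name but do unrelated things, which is why silently picking whichever package was attached last can lead to confusing errors.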