Hi all,
I'm reading this book to get to know tidymodels, and I've run into a problem running parallel resampling on Windows.
Following the examples in Chapter 10, the main parallel part of fit_resamples() works fine with library(doParallel) on Windows; the only issue is with extract_fit_parsnip(x):
library(tidymodels)
# All operating systems
library(doParallel)
library(kableExtra)
library(tidyr)
tidymodels_prefer()
data(ames)
ames <- mutate(ames, Sale_Price = log10(Sale_Price))
set.seed(502)
ames_split <- initial_split(ames, prop = 0.80, strata = Sale_Price)
ames_train <- training(ames_split)
ames_test <- testing(ames_split)
ames_rec <-
  recipe(Sale_Price ~ Neighborhood + Gr_Liv_Area + Year_Built + Bldg_Type +
           Latitude + Longitude, data = ames_train) %>%
  step_log(Gr_Liv_Area, base = 10) %>%
  step_other(Neighborhood, threshold = 0.01) %>%
  step_dummy(all_nominal_predictors()) %>%
  step_interact( ~ Gr_Liv_Area:starts_with("Bldg_Type_") ) %>%
  step_ns(Latitude, Longitude, deg_free = 20)
lm_model <- linear_reg() %>% set_engine("lm")
lm_wflow <-
  workflow() %>%
  add_model(lm_model) %>%
  add_recipe(ames_rec)
lm_fit <- fit(lm_wflow, ames_train)
# This line is O.K.
extract_fit_parsnip(lm_fit) %>% tidy()
# ------------------------------------------------------------------------------------
set.seed(1001)
ames_folds <- vfold_cv(ames_train, v = 10)
# Create a cluster object and then register:
# cl <- makePSOCKcluster(parallel::detectCores())
cl <- makePSOCKcluster(10)
registerDoParallel(cl)
get_model <- function(x) {
  # not O.K. on Windows & Linux.
  extract_fit_parsnip(x) %>% tidy()
  # This line is O.K.
  # extract_recipe(x, estimated = TRUE)
}
ctrl <- control_resamples(save_pred = TRUE, verbose = TRUE, extract = get_model)
set.seed(1003)
lm_res <- lm_wflow %>% fit_resamples(resamples = ames_folds, control = ctrl)
# Stop parallel
stopCluster(cl)
# These lines are O.K.
lm_res
lm_res$.metrics[[1]]
lm_res$.notes[[1]]
lm_res$.predictions[[1]]
> lm_res$.extracts[[1]]
# A tibble: 1 x 2
  .extracts      .config
  <list>         <chr>
1 <try-errr [1]> Preprocessor1_Model1
> # To get the results
> lm_res$.extracts[[1]][[1]]
[[1]]
[1] "Error in UseMethod(\"tidy\") : \n no applicable method for 'tidy' applied to an object of class \"lm\"\n"
attr(,"class")
[1] "try-error"
attr(,"condition")
<simpleError in UseMethod("tidy"): no applicable method for 'tidy' applied to an object of class "lm">
Any idea?
Best regards.
icejean commented on Dec 24, 2022
Same results on both R-4.2.2 and R-4.1.2.
icejean commented on Dec 25, 2022
Here's a workaround. The problem is that extract_fit_parsnip(x) returns a list rather than a parsnip model when using library(doParallel) instead of library(doMC):
lm_fited is a list of class _lm & model_fit:


test is a list of length 1 containing a list of class _lm & model_fit, so we should use test[[1]] to refer to the model:
This issue should be fixed in the next version, then.
juliasilge commented on Jan 9, 2023
Thanks for the report @icejean! 🙌
We have some tests of parallel PSOCK resampling that we run every day, but I am noticing that we don't have any test of using tidy() in the worker; the method registration isn't working correctly in the worker. I'm going to move this over to our testing repo and we can get to the bottom of this problem with S3 registration, then add a test for it.
Title changed from "Parallel issue on Windows with fit_resamples() while extracting the model" to "Problem with using `tidy()` (S3 registration) on Windows in parallel"
icejean commented on Jan 10, 2023
Great!
topepo commented on Aug 1, 2023
We wouldn't expect any packages to be loaded in the workers of a PSOCK cluster (unlike multicore). To make sure that you have them available, you can load them in the extract function:
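A minimal sketch of that approach follows. It is an illustration, not the exact code from the original comment. It assumes that attaching tidymodels inside the extract function is enough to load broom in each worker and so register the tidy() method for "lm" objects. It also assumes a version of tune that still dispatches resampling through a registered foreach backend, as in the report above. The mtcars data and the two-predictor formula are hypothetical stand-ins for the Ames workflow.

```r
library(tidymodels)
library(doParallel)

# Two PSOCK workers are enough for a small illustration.
cl <- makePSOCKcluster(2)
registerDoParallel(cl)

get_model <- function(x) {
  # Attach tidymodels inside the worker so that broom's tidy() method
  # for "lm" objects is registered before tidy() is called.
  library(tidymodels)
  extract_fit_parsnip(x) %>% tidy()
}

ctrl <- control_resamples(extract = get_model)

# Hypothetical small example on mtcars, not the Ames model from the report.
set.seed(1003)
folds <- vfold_cv(mtcars, v = 5)

lm_res <-
  workflow() %>%
  add_model(linear_reg() %>% set_engine("lm")) %>%
  add_formula(mpg ~ wt + hp) %>%
  fit_resamples(resamples = folds, control = ctrl)

stopCluster(cl)

# Each element of .extracts holds the output of get_model() for one fold.
lm_res$.extracts[[1]]$.extracts[[1]]
```

If the extract succeeds, the last line should show a tibble of coefficients for the first fold rather than a try-error object.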
icejean commented on Aug 5, 2023
Thanks, Max, it works. I read your book 'Tidy Modeling with R' earlier, and this issue comes from an example in it. It's a good book for learning the tidyverse family of packages.