Automatically unpack unnamed df-cols #2326

Closed
@hadley

Member

Currently mutate() and summarise() only work with vectorised functions: functions that take a vector as input and return a vector (or "scalar") as output. I don't see any reason why summarise() and mutate() couldn't also accept tibbles. The existing restrictions would continue to apply so that in summarise() the tibble would have to have exactly one row, and in mutate() it would have to have either one row or n rows.

In other words, the following two lines of code should be equivalent:

df %>%
  summarise(mean = mean(x), sd = sd(x))

df %>%
  summarise(tibble(mean = mean(x), sd = sd(x)))

This would allow you to extract that repeated pattern out into a function:

# and hence
mean_sd <- function(df, var) {
  tibble(mean = mean(df[[var]]), sd = sd(df[[var]]))
}
df %>% 
  summarise(mean_sd(df, "x"))
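
For reference, the proposed equivalence can be checked directly. This is a minimal sketch that assumes dplyr 1.0.0 or later, where an unnamed data frame result in summarise() is unpacked into columns. The example data and the vector-based helper `mean_sd_vec()` are hypothetical and not part of the original proposal.

```r
library(dplyr)

# Hypothetical example data (not from the issue)
df <- tibble(g = c("a", "a", "b", "b"), x = c(1, 3, 10, 20))

# Hypothetical helper: takes a vector and returns a one-row tibble
mean_sd_vec <- function(x) {
  tibble(mean = mean(x), sd = sd(x))
}

# Three spellings of the same per-group summary
by_cols   <- df %>% group_by(g) %>% summarise(mean = mean(x), sd = sd(x))
by_tibble <- df %>% group_by(g) %>% summarise(tibble(mean = mean(x), sd = sd(x)))
by_helper <- df %>% group_by(g) %>% summarise(mean_sd_vec(x))

# Compare as plain data frames to avoid depending on tibble-specific methods
isTRUE(all.equal(as.data.frame(by_cols), as.data.frame(by_tibble)))
isTRUE(all.equal(as.data.frame(by_cols), as.data.frame(by_helper)))
```

The helper here takes the column as a vector rather than a data frame plus a column name, so it is evaluated once per group when the input is grouped.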

We'd need to work on documentation to help people develop effective functions of this nature, and develop tools so that you could easily specify input variables (using whatever the next iteration of lazyeval provides) and name the outputs. But that's largely a second-order concern: we can figure out those details later.

Supporting tibbles in this way would be particularly useful for tidyr as it would help to clarify the nature of functions like separate() and unite(), which are currently data frame wrappers around simple vector functions.
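
To illustrate the idea of a simple vector function behind a separate()-style operation, here is a rough sketch. It assumes dplyr 1.0.0 or later, where mutate() unpacks an unnamed data frame with n rows; `split_pair()` and the example data are hypothetical and are not how separate() is actually implemented.

```r
library(dplyr)

# Hypothetical vector function: split "left-right" strings into two pieces
# and return them as a tibble with one row per input element
split_pair <- function(x, sep = "-") {
  parts <- strsplit(x, sep, fixed = TRUE)
  tibble(
    left  = vapply(parts, function(p) p[1], character(1)),
    right = vapply(parts, function(p) p[2], character(1))
  )
}

# Hypothetical example data
df <- tibble(id = 1:3, key = c("a-x", "b-y", "c-z"))

# The unnamed tibble has n rows, so its columns are added alongside `id` and `key`
out <- df %>% mutate(split_pair(key))
```

Because the function only sees a character vector, it can be tested on its own and reused outside a data frame pipeline.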

These ideas are most important for summarise() and mutate() but I think we should apply the same principles to filter() and arrange() as well.

cc @lionel- @jennybc @krlmlr

Activity

krlmlr commented on Feb 21, 2017

aornugent commented on Mar 28, 2017

huftis commented on Nov 2, 2017

romainfrancois (Member) commented on Dec 20, 2017

Now that we have := and sort of going back to the initial #154, perhaps the lhs of := can be richer, i.e. something like this parses:

mtcars %>% 
  group_by(cyl) %>% 
  summarise( tie(mpg0,mp25,mpg50,mpg75,mpg100) := quantile(mpg) )

From this 🐦 thread https://twitter.com/romain_francois/status/943399604065849344

romainfrancois (Member) commented on Feb 23, 2018

I toyed with this syntax on the tie 📦 here: https://github.com/romainfrancois/tie

> iris %>% 
+   dplyr::group_by(Species) %>% 
+   bow( tie(min, max) := range(Sepal.Length) )
# A tibble: 3 x 3
  Species      min   max
  <fct>      <dbl> <dbl>
1 setosa      4.30  5.80
2 versicolor  4.90  7.00
3 virginica   4.90  7.90
> 
> x <- "min"
> iris %>% 
+   dplyr::group_by(Species) %>% 
+   bow( tie(!!x, max) := range(Sepal.Length) )
# A tibble: 3 x 3
  Species      min   max
  <fct>      <dbl> <dbl>
1 setosa      4.30  5.80
2 versicolor  4.90  7.00
3 virginica   4.90  7.90

Now it just does a classic summarise of the rhs of := wrapped in a list call, and then re-extracts into what is specified in the lhs, i.e. it does this:

> iris %>% 
+   group_by(Species) %>% 
+   summarise( ..tmp.. = list(range(Sepal.Length)) ) %>% 
+   mutate( min = map_dbl(..tmp.., 1), max = map_dbl(..tmp.., 2) ) %>% 
+   select( -..tmp..)
# A tibble: 3 x 3
  Species      min   max
  <fct>      <dbl> <dbl>
1 setosa      4.30  5.80
2 versicolor  4.90  7.00
3 virginica   4.90  7.90
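
For comparison, the same result can be reached with the tibble-returning approach from the issue description, without a temporary list column. This is a sketch that assumes dplyr 1.0.0 or later; `as_row()` is a hypothetical helper and is not part of the tie package or dplyr.

```r
library(dplyr)

# Hypothetical helper: name the elements of a vector and return them
# as a one-row tibble
as_row <- function(x, nms) {
  stopifnot(length(x) == length(nms))
  as_tibble(setNames(as.list(x), nms))
}

# The unnamed one-row tibble is unpacked into `min` and `max` per group
res <- iris %>%
  group_by(Species) %>%
  summarise(as_row(range(Sepal.Length), c("min", "max")))
```

The same helper should also cover the earlier quantile() case, for example `as_row(quantile(mpg), c("mpg0", "mpg25", "mpg50", "mpg75", "mpg100"))`, since it only relies on the length of the result.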

aornugent commented on Mar 5, 2018

t-kalinowski commented on Apr 11, 2018
