filter() and data frame results, filter(across()) #4678

Closed

@romainfrancois (Member)

When an expression in filter() returns a data frame, perhaps we should combine all of its columns with &. This would enable something like:

library(dplyr, warn.conflicts = FALSE)

iris %>% 
  filter(across(starts_with("Sepal"), ~ . > 4))
#> Error: filter() expressions should return logical vectors of the same size as the group

iris %>% 
  filter(Sepal.Length > 4 & Sepal.Width > 4)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          5.7         4.4          1.5         0.4  setosa
#> 2          5.2         4.1          1.5         0.1  setosa
#> 3          5.5         4.2          1.4         0.2  setosa

iris %>% 
  filter_at(vars(starts_with("Sepal")), all_vars(. > 4))
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          5.7         4.4          1.5         0.4  setosa
#> 2          5.2         4.1          1.5         0.1  setosa
#> 3          5.5         4.2          1.4         0.2  setosa

Created on 2019-12-30 by the reprex package (v0.3.0.9000)

This might be a better model than the current strategy of tricking the ... arguments into a single expression with all_exprs().
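The reduction proposed above can be sketched in base R. This is only an illustration of the idea, not dplyr's implementation: `and_columns()` is a hypothetical helper name, and the data frame of logical columns is built by hand to stand in for what an across() call might return.

```r
# Hypothetical helper (not part of dplyr): combine the columns of a data
# frame of logical vectors with `&`, giving one logical value per row.
and_columns <- function(df) {
  stopifnot(is.data.frame(df), all(vapply(df, is.logical, logical(1))))
  # Start from all-TRUE so that a data frame with zero columns keeps every row.
  Reduce(`&`, df, rep(TRUE, nrow(df)))
}

# Stand-in for the data frame result: one logical column per Sepal variable.
tests <- as.data.frame(
  lapply(iris[c("Sepal.Length", "Sepal.Width")], function(x) x > 4)
)

keep <- and_columns(tests)

# Same rows as filter(Sepal.Length > 4 & Sepal.Width > 4) in the reprex above.
result <- iris[keep, ]
nrow(result)
#> [1] 3
```

Seeding the reduction with an all-TRUE vector is a design choice of this sketch; it makes an empty column selection behave like "no condition" rather than raising an error.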

Activity

added this to the 0.9.0 milestone on Dec 30, 2019

hadley (Member) commented on Dec 30, 2019

Yeah, that makes sense to me.

OTOH maybe all_vars() and any_vars() should become across_all() and across_any()?

romainfrancois (Member, Author) commented on Dec 30, 2019

... or we just need a function somewhere that takes a list of logical vectors and reduces them with &, so that we can wrap it around the across() call:

library(dplyr, warn.conflicts = FALSE)
library(purrr)

iris <- as_tibble(iris)

rowAll <- function(df) {
  purrr::reduce(df, `&`)
}
rowAny <- function(df) {
  purrr::reduce(df, `|`)
}

iris %>% 
  filter(rowAll(across(starts_with("Sepal"), ~ . > 3)))
#> # A tibble: 67 x 5
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
#>  1          5.1         3.5          1.4         0.2 setosa 
#>  2          4.7         3.2          1.3         0.2 setosa 
#>  3          4.6         3.1          1.5         0.2 setosa 
#>  4          5           3.6          1.4         0.2 setosa 
#>  5          5.4         3.9          1.7         0.4 setosa 
#>  6          4.6         3.4          1.4         0.3 setosa 
#>  7          5           3.4          1.5         0.2 setosa 
#>  8          4.9         3.1          1.5         0.1 setosa 
#>  9          5.4         3.7          1.5         0.2 setosa 
#> 10          4.8         3.4          1.6         0.2 setosa 
#> # … with 57 more rows

iris %>% 
  filter(rowAny(across(starts_with("Sepal"), ~ . > 3)))
#> # A tibble: 150 x 5
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
#>  1          5.1         3.5          1.4         0.2 setosa 
#>  2          4.9         3            1.4         0.2 setosa 
#>  3          4.7         3.2          1.3         0.2 setosa 
#>  4          4.6         3.1          1.5         0.2 setosa 
#>  5          5           3.6          1.4         0.2 setosa 
#>  6          5.4         3.9          1.7         0.4 setosa 
#>  7          4.6         3.4          1.4         0.3 setosa 
#>  8          5           3.4          1.5         0.2 setosa 
#>  9          4.4         2.9          1.4         0.2 setosa 
#> 10          4.9         3.1          1.5         0.1 setosa 
#> # … with 140 more rows

Created on 2019-12-30 by the reprex package (v0.3.0.9000)

romainfrancois (Member, Author) commented on Dec 30, 2019

But still, given the way across() works with other verbs, it would not be surprising that:

%>% filter(across(starts_with("Sepal"), test))

gives the same result as:

%>% filter(test(Sepal.Length), test(Sepal.Width))
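The expectation rests on filter() already combining its separate arguments with &. A small check of that existing behaviour, where `test` is a hypothetical predicate chosen only for illustration:

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical predicate standing in for `test` in the comment above.
test <- function(x) x > 4

# Separate filter() arguments are combined with a logical AND ...
by_args <- iris %>%
  filter(test(Sepal.Length), test(Sepal.Width))

# ... so they keep the same rows as one explicit `&` expression.
by_and <- iris %>%
  filter(test(Sepal.Length) & test(Sepal.Width))

isTRUE(all.equal(by_args, by_and))
```

Under the proposal, the across() form would simply be a third spelling of the same condition.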

hadley (Member) commented on Dec 30, 2019

Yeah, I'd say implement the data frame method regardless, and we'll come back later to talk about the overall interface (I suspect we will want row versions of all the existing cumulative and summarising functions).

added 3 commits that reference this issue on Dec 30, 2019
2c6432c
ac06c5b
6891756
Labels

feature: a feature request or enhancement

Participants

@hadley, @romainfrancois
