Closed
Description
A common case is that one constructs a grouping variable in group_by but only needs it for the duration of the grouping, so afterwards one must use select to get rid of it, as in the example below. It would be pleasingly symmetric if ungroup could remove the added column just as group_by adds it, so that

    ungroup(-g)

would be the same as

    ungroup %>%
      select(-g)
Thus, in this example taken from https://stackoverflow.com/questions/51939874/referencing-previous-column-value-as-column-is-created/51940343#51940343

    library(dplyr)

    test <- structure(list(i = c(0, 1, 2, 3, 4, 0, 1, 2, 3, 4),
                           chng = c(0, 0.031, 0.005, -0.005, 0.017, 0, 0.012, 0.003, -0.013, -0.005),
                           indx = c(1, 1.031, 1.037, 1.031, 1.048, 1, 1.012, 1.015, 1.002, 0.997)),
                      class = "data.frame", row.names = c(NA, -10L))

    test %>%
      group_by(g = cumsum(i == 0)) %>%
      mutate(indx = cumprod(chng + 1)) %>%
      ungroup %>%
      select(-g)
we could write it with one fewer step: the last two lines of the code above are combined into the last line below.
    test %>%
      group_by(g = cumsum(i == 0)) %>%
      mutate(indx = cumprod(chng + 1)) %>%
      ungroup(-g)
Note the reduced line count and improved symmetry.
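To make the proposed equivalence concrete, here is a minimal sketch that packages ungroup() followed by select() into one helper and compares it with the two-step pipeline on the data above. The name ungroup_select() is hypothetical and not part of dplyr. A separate name is used rather than extending ungroup() because, in dplyr 1.0.0 and later, ungroup() does accept variables but treats them as grouping variables to remove while keeping the columns, which is a different operation from the one proposed here.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): drop all grouping, then apply
# select() semantics to the columns, i.e. ungroup() %>% select(...)
ungroup_select <- function(.data, ...) {
  select(ungroup(.data), ...)
}

# Data from the Stack Overflow example in this issue
test <- structure(list(i = c(0, 1, 2, 3, 4, 0, 1, 2, 3, 4),
                       chng = c(0, 0.031, 0.005, -0.005, 0.017, 0, 0.012, 0.003, -0.013, -0.005),
                       indx = c(1, 1.031, 1.037, 1.031, 1.048, 1, 1.012, 1.015, 1.002, 0.997)),
                  class = "data.frame", row.names = c(NA, -10L))

# Current idiom: ungroup, then select away the helper column g
current <- test %>%
  group_by(g = cumsum(i == 0)) %>%
  mutate(indx = cumprod(chng + 1)) %>%
  ungroup() %>%
  select(-g)

# Same pipeline with the hypothetical helper: one step fewer
proposed <- test %>%
  group_by(g = cumsum(i == 0)) %>%
  mutate(indx = cumprod(chng + 1)) %>%
  ungroup_select(-g)
```

Both objects should be identical ungrouped tibbles with columns i, chng and indx.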
Activity
romainfrancois commented on Sep 14, 2018
🤔 ungroup does have a ... argument that it does not use, but I'm not sure about having ungroup also perform selection.
mkoohafkan commented on Oct 5, 2018
It seems to me that incorporating this kind of logic into #3721 would be the better solution for this use case.
I do think it would be neat if ungroup could selectively remove some groupings but not others, which is how I first interpreted the title of this issue.
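For reference, a small sketch of how grouping variables can be added and replaced with existing tools, using the built-in mtcars data. It assumes dplyr 1.0 or later, where the argument discussed in this thread as add is spelled .add (the older add is deprecated). Without .add, a second group_by() replaces the existing grouping, which is currently the way to drop one grouping variable while keeping the others.

```r
library(dplyr)  # assumes dplyr >= 1.0 for the .add argument

by_cyl <- group_by(mtcars, cyl)

# Incrementally add gear to the existing grouping
by_cyl_gear <- group_by(by_cyl, gear, .add = TRUE)
group_vars(by_cyl_gear)  # "cyl" "gear"

# Without .add, group_by() replaces the grouping; regrouping on the
# variables you want to keep is the existing way to "drop" cyl
by_gear <- group_by(by_cyl_gear, gear)
group_vars(by_gear)      # "gear"
```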
ggrothendieck commented on Oct 20, 2018
Here is another example taken from https://stackoverflow.com/questions/52906985/merging-of-duplicate-rows-that-have-misspelled-variables/52907932#52907932
With the feature under discussion this would simplify to the shorter and more symmetric:
ggrothendieck commented on Oct 21, 2018
@mkoohafkan, the way group_by currently works is that if you want to incrementally add a variable you specify group_by(new_var, add = TRUE). I suppose there is the question of whether add = TRUE means add the variable to the group_by or really means modify the group_by and replace it with a new group_by. In the latter case it would make sense to write group_by(-cyl, add = TRUE) to remove cyl from the group_by while leaving the other group_by variables in effect, rather than using ungroup for that. Another possibility is to use ungroup(cyl, subtract = TRUE) for that, analogously to group_by(new_var, add = TRUE).
One other point is that I don't think incrementally adding and removing parts of a group_by is encountered that frequently, whereas I have repeatedly encountered the ungroup %>% select(-var) sequence.
mkoohafkan commented on Oct 30, 2018
@ggrothendieck I thought about this more and I agree with your statements that ungroup(cyl) to drop the column cyl is symmetric and that group_by(-cyl) to remove a column from an existing grouping would be a bit confusing given the existing add argument. If the add argument to group_by had originally been named update, this would be syntactically cleaner, e.g. group_by(cyl, update = TRUE) and group_by(-cyl, update = TRUE). ungroup(..., subtract = TRUE) looks like a good idea at first, but... what would ungroup(cyl, subtract = FALSE) mean?
yutannihilation commented on Oct 31, 2018
group_by() has mutate semantics, not select semantics (cf. https://dplyr.tidyverse.org/articles/dplyr.html#selecting-operations). I guess you already noticed this when you tried group_by(-cyl, add = TRUE) and saw that -cyl became the grouping variable.
Created on 2018-10-31 by the reprex package (v0.2.1)
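The difference between the two kinds of semantics can be seen directly on mtcars. group_by() evaluates -cyl as an expression, creating a new column named -cyl and grouping by it, whereas select() reads -cyl as "every column except cyl". A minimal sketch:

```r
library(dplyr)

# Mutate semantics: -cyl is computed as a new column and used for grouping
g <- group_by(mtcars, -cyl)
group_vars(g)          # "-cyl"
"-cyl" %in% names(g)   # TRUE

# Select semantics: -cyl excludes the cyl column
s <- select(mtcars, -cyl)
"cyl" %in% names(s)    # FALSE
```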
So, to me, ungroup() should have mutate semantics as well for consistency (though I don't know what it means to mutate when ungrouping...). A possible solution is to implement scoped variants of ungroup() (e.g. ungroup_at())?
ggrothendieck commented on Nov 10, 2018
Here is another case where this feature could be used, taken from https://stackoverflow.com/questions/53240324/dplyr-collapse-tail-rows-into-larger-groups/53240699#53240699
In this case we are manufacturing a sort key in order to keep the table in its original sorted order.
With the feature under discussion, the select at the end of the code could be combined into the ungroup and so omitted.
Note how this keeps coming up again and again.
maxmoro commented on Dec 5, 2018
Having a selective ungroup is also very important when calculating percentages of subgroups.
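A minimal sketch of that use case on mtcars, assuming dplyr 1.0 or later (for summarise()'s .groups argument). It counts each cyl/gear combination and then computes each gear's share within its cyl group, a step that needs gear removed from the grouping while cyl stays. The sketch does this by regrouping on cyl explicitly; in dplyr 1.0 and later the same step should also be expressible as ungroup(gear).

```r
library(dplyr)  # assumes dplyr >= 1.0 for summarise(.groups = )

# Number of cars for each cyl/gear combination; keep both grouping variables
counts <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(n = n(), .groups = "keep")

# Share of each gear within its cyl group: drop gear from the grouping
# by regrouping on cyl alone, then compute the within-group proportion
shares <- counts %>%
  group_by(cyl) %>%
  mutate(pct = n / sum(n)) %>%
  ungroup()
```

Within each cyl value the pct column sums to 1.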
hadley commented on Dec 10, 2019
I think it would be fine for ungroup() to have select semantics even while group_by() has action semantics. I'd suggest df %>% ungroup() would continue to work as usual, and df %>% ungroup(x) would remove x from the grouping variables, throwing an error if not currently grouped by x.
Selectively ungroup variables (#4671)
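A minimal sketch of the behaviour described above, written as a standalone helper so it can be tried without relying on a particular dplyr release. The name ungroup_vars() and its character-vector argument are assumptions for illustration only; dplyr's own ungroup() uses tidy selection instead. The helper removes the named variables from the grouping, keeps the columns, and errors if a variable is not currently a grouping variable.

```r
library(dplyr)

# Hypothetical helper (not dplyr's API): remove `vars` (a character vector)
# from the grouping of `.data` while keeping the columns themselves
ungroup_vars <- function(.data, vars) {
  current <- group_vars(.data)
  not_grouped <- setdiff(vars, current)
  if (length(not_grouped) > 0) {
    stop("not currently grouped by: ", paste(not_grouped, collapse = ", "))
  }
  keep <- setdiff(current, vars)
  if (length(keep) == 0) {
    return(ungroup(.data))  # nothing left to group by
  }
  # Regroup on the remaining grouping variables only
  group_by(.data, !!!rlang::syms(keep))
}

g <- group_by(mtcars, cyl, gear)
group_vars(ungroup_vars(g, "gear"))  # "cyl"
```

Because the columns are untouched, dropping a helper column such as g from the original example would still need a select() afterwards.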
lock commented on Jun 24, 2020
This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/