Consider new row mutation functions

I've been wondering if dplyr needs a new set of verbs specifically designed for modifying rows identified by a set of keys:

* insert new rows — would error if keys already exist; can specify where to add (how?). Similar to `tibble::add_row()`.

* update values — overrides existing values. (Similar to https://github.com/tidyverse/tidyr/issues/183)

* patch values — like update, but only replaces missing values (Also similar to https://github.com/tidyverse/tidyr/issues/183)

* upsert — update or insert depending on presence/absence of keys

## Basic sketch

```R
df <- tribble(
  ~ x, ~ y,
  1, 2,
  2, NA,
)

# Insert can add multiple rows
df %>% insert(x = 3:4, y = 5)
#>       x     y
#>   <dbl> <dbl>
#> 1     1     2
#> 2     2    NA
#> 3     3     5
#> 4     4     5

# And new columns:
df %>% insert(x = 3, z = 1)
#>       x     y     z
#>   <dbl> <dbl> <dbl>
#> 1     1     2    NA
#> 2     2    NA    NA
#> 3     3    NA     1

#> Error: key variable `x` is missing
df %>% insert(x = 1, y = 5)
#> Error: key `x = 1` already present

# Update modifies existing rows:
df %>% update(x = 1, y = 3)
#>       x     y
#>   <dbl> <dbl>
#> 1     1     3
#> 2     2    NA

# Erroring if there's no match:
df %>% update(x = 3, y = 3)
#> Error: key `x = 3` doesn't exist

# patch() works like update but only changes missing values
df %>% patch(x = 1:2, y = 10)
#>       x     y
#>   <dbl> <dbl>
#> 1     1     3
#> 2     2    10

# upsert() updates or inserts as needed:
df %>% upsert(x = c(1, 3), y = 3)
#>       x     y
#>   <dbl> <dbl>
#> 1     1     3
#> 2     2    NA
#> 3     3     3
```

## Thoughts

* All functions would either take multiple named inputs or a single unnamed data frame.

* I think unlike SQL and `==`, it makes sense for `NA` to match `NA` in keys. That's consistent with `match()`.

* If `.key` were ordered (can we determine that quickly?) `insert()` could insert new rows in right place. Otherwise would be have `.before` and `.after` that would be a single row data frame of key values?

* `.key` would work like `.by`; defaulting to common variables with a message. But unlike joins, would check key set for uniqueness, erroring if not (i.e. no Cartesian products). That means `insert(x, y)` would always yield `nrow(x) + nrow(y)` and `update(x, y)` would always yield `nrow(x)`.

* Should all these functions get a `rows` prefix? Or suffix? `rows_insert()`, `rows_update()`, `rows_patch()`, ...? Does that suggest there's an equivalent set of `col` functions? `cols_insert()` (a single col join, like a vlookup? error if col already exists?), `cols_move()` (#4598), `cols_patch()` (df form of `coalesce()`?)  

## Backends

* Grouping would be ignored (but recomputed if needed)
* Would error for dbplyr as modifying data is out of scope.
* Could modifying in place for dtplyr when `immutable = FALSE`
* dm objects would use predefined primary keys and error if `.key` supplied
* An imaginary mutable googlesheet could perform these operations on the underlying sheet

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Consider new row mutation functions #4654

Basic sketch

Thoughts

Backends

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Participants

Consider new row mutation functions #4654

Description

Basic sketch

Thoughts

Backends

Activity

juangomezduaso commented on Dec 18, 2019

billdenney commented on Apr 21, 2020

krlmlr commented on Apr 23, 2020

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Participants

Issue actions