These are generic functions that dispatch to individual tbl methods - see the
method documentation for details of individual data sources. x
and
y
should usually be from the same data source, but if copy
is
TRUE
, y
will automatically be copied to the same source as x
.
inner_join(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...) left_join(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...) right_join(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...) full_join(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...) semi_join(x, y, by = NULL, copy = FALSE, ...) nest_join(x, y, by = NULL, copy = FALSE, keep = FALSE, name = NULL, ...) anti_join(x, y, by = NULL, copy = FALSE, ...)
x, y | tbls to join |
---|---|
by | a character vector of variables to join by. If To join by different variables on x and y use a named vector.
For example, |
copy | If |
suffix | If there are non-joined duplicate variables in |
... | other parameters passed onto methods, for instance, |
keep | If |
name | the name of the list column nesting joins create. If |
Currently dplyr supports four types of mutating joins, two types of filtering joins, and a nesting join.
Mutating joins combine variables from the two data.frames:
inner_join()
return all rows from x
where there are matching
values in y
, and all columns from x
and y
. If there are multiple matches
between x
and y
, all combination of the matches are returned.
left_join()
return all rows from x
, and all columns from x
and y
. Rows in x
with no match in y
will have NA
values in the new
columns. If there are multiple matches between x
and y
, all combinations
of the matches are returned.
right_join()
return all rows from y
, and all columns from x
and y. Rows in y
with no match in x
will have NA
values in the new
columns. If there are multiple matches between x
and y
, all combinations
of the matches are returned.
full_join()
return all rows and all columns from both x
and y
.
Where there are not matching values, returns NA
for the one missing.
Filtering joins keep cases from the left-hand data.frame:
semi_join()
return all rows from x
where there are matching
values in y
, keeping just columns from x
.A semi join differs from an inner join because an inner join will return
one row of x
for each matching row of y
, where a semi
join will never duplicate rows of x
.
anti_join()
return all rows from x
where there are not
matching values in y
, keeping just columns from x
.
Nesting joins create a list column of data.frames:
nest_join()
return all rows and all columns from x
. Adds a
list column of tibbles. Each tibble contains all the rows from y
that match that row of x
. When there is no match, the list column is
a 0-row tibble with the same column names and types as y
.nest_join()
is the most fundamental join since you can recreate the other joins from it.
An inner_join()
is a nest_join()
plus an tidyr::unnest()
, and left_join()
is a
nest_join()
plus an unnest(.drop = FALSE)
.
A semi_join()
is a nest_join()
plus a filter()
where you check that every element of data has
at least one row, and an anti_join()
is a nest_join()
plus a filter()
where you check every element has zero rows.
Groups are ignored for the purpose of joining, but the result preserves
the grouping of x
.
# "Mutating" joins combine variables from the LHS and RHS band_members %>% inner_join(band_instruments)#>#> # A tibble: 2 x 3 #> name band plays #> <chr> <chr> <chr> #> 1 John Beatles guitar #> 2 Paul Beatles bassband_members %>% left_join(band_instruments)#>#> # A tibble: 3 x 3 #> name band plays #> <chr> <chr> <chr> #> 1 Mick Stones <NA> #> 2 John Beatles guitar #> 3 Paul Beatles bassband_members %>% right_join(band_instruments)#>#> # A tibble: 3 x 3 #> name band plays #> <chr> <chr> <chr> #> 1 John Beatles guitar #> 2 Paul Beatles bass #> 3 Keith <NA> guitarband_members %>% full_join(band_instruments)#>#> # A tibble: 4 x 3 #> name band plays #> <chr> <chr> <chr> #> 1 Mick Stones <NA> #> 2 John Beatles guitar #> 3 Paul Beatles bass #> 4 Keith <NA> guitar# "Filtering" joins keep cases from the LHS band_members %>% semi_join(band_instruments)#>#> # A tibble: 2 x 2 #> name band #> <chr> <chr> #> 1 John Beatles #> 2 Paul Beatlesband_members %>% anti_join(band_instruments)#>#> # A tibble: 1 x 2 #> name band #> <chr> <chr> #> 1 Mick Stones# "Nesting" joins keep cases from the LHS and nests the RHS band_members %>% nest_join(band_instruments)#>#> # A tibble: 3 x 3 #> name band band_instruments #> * <chr> <chr> <list> #> 1 Mick Stones <tibble [0 × 1]> #> 2 John Beatles <tibble [1 × 1]> #> 3 Paul Beatles <tibble [1 × 1]># To suppress the message, supply by band_members %>% inner_join(band_instruments, by = "name")#> # A tibble: 2 x 3 #> name band plays #> <chr> <chr> <chr> #> 1 John Beatles guitar #> 2 Paul Beatles bass# This is good practice in production code # Use a named `by` if the join variables have different names band_members %>% full_join(band_instruments2, by = c("name" = "artist"))#> # A tibble: 4 x 3 #> name band plays #> <chr> <chr> <chr> #> 1 Mick Stones <NA> #> 2 John Beatles guitar #> 3 Paul Beatles bass #> 4 Keith <NA> guitar# Note that only the key from the LHS is kept