Join, like merge, is designed for the types of problems where you would use a sql join.

join(x, y, by = NULL, type = "left", match = "all")

Arguments

x

data frame

y

data frame

by

character vector of variable names to join by. If omitted, will match on all common variables.

type

type of join: left (default), right, inner or full. See details for more information.

match

how should duplicate ids be matched? Either match just the "first" matching row, or match "all" matching rows. Defaults to "all" for compatibility with merge, but "first" is significantly faster.

Details

The four join types return:

  • inner: only rows with matching keys in both x and y

  • left: all rows in x, adding matching columns from y

  • right: all rows in y, adding matching columns from x

  • full: all rows in x with matching columns in y, then the rows of y that don't match x.

Note that from plyr 1.5, join will (by default) return all matches, not just the first match, as it did previously.

Unlike merge, preserves the order of x no matter what join type is used. If needed, rows from y will be added to the bottom. Join is often faster than merge, although it is somewhat less featureful - it currently offers no way to rename output or merge on different variables in the x and y data frames.

Examples

first <- ddply(baseball, "id", summarise, first = min(year)) system.time(b2 <- merge(baseball, first, by = "id", all.x = TRUE))
#> user system elapsed #> 0.088 0.000 0.089
system.time(b3 <- join(baseball, first, by = "id"))
#> user system elapsed #> 0.036 0.000 0.036
b2 <- arrange(b2, id, year, stint) b3 <- arrange(b3, id, year, stint) stopifnot(all.equal(b2, b3))