match_df works in the same way as join, but instead of returning the combined dataset, it returns only the matching rows from the first dataset. This is particularly useful when you've summarised the data in some way and want to subset the original data by a characteristic of the subset.
match_df(x, y, on = NULL)
Argument | Description |
---|---|
x | data frame to subset. |
y | data frame defining matching rows. |
on | variables to match on - by default will use all variables common to both data frames. |
Returns a data frame containing the rows of x that have a match in y.
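As a small illustration of the on argument, the sketch below calls match_df on two toy data frames, once with the default (all shared columns) and once with a single column. The data frames orders and wanted and their columns are made up for this example; only match_df and its arguments come from the usage above, and the plyr package is assumed to be installed.

```r
library(plyr)

# Hypothetical toy data: a table of orders and a table of keys to keep
orders <- data.frame(
  customer = c("ann", "bob", "cat", "ann", "dan"),
  region   = c("east", "west", "east", "east", "west"),
  amount   = c(10, 25, 40, 15, 30),
  stringsAsFactors = FALSE
)
wanted <- data.frame(
  customer = c("ann", "cat"),
  region   = c("east", "west"),
  stringsAsFactors = FALSE
)

# on = NULL (the default): match on every column common to both data
# frames, here customer and region together
both_keys <- match_df(orders, wanted)

# Explicit on: match on customer only, ignoring region
by_customer <- match_df(orders, wanted, on = "customer")
```

With this data, both_keys should contain only the two ann/east rows (cat is in a different region in wanted), while by_customer should contain the three rows for ann and cat. In both cases only the columns of orders are returned.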
match_df shares the same semantics as join, not match:
- the match criterion is ==, not identical.
- it doesn't work for columns that are not atomic vectors.
- if there are no matches, the row will be omitted.
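To make these semantics concrete, here is a rough base R sketch of the same idea. The helper match_rows_sketch is hypothetical and is not plyr's implementation: it assumes the on columns are atomic vectors, compares rows by pasting their values into string keys, and may treat NA values differently from match_df.

```r
# Hypothetical helper, NOT plyr's implementation: keeps the rows of x whose
# key columns also appear in y, comparing values rather than object identity.
match_rows_sketch <- function(x, y, on = NULL) {
  # Default: use every column name shared by both data frames
  if (is.null(on)) on <- intersect(names(x), names(y))

  # Build one string key per row from the `on` columns (atomic vectors only)
  make_key <- function(df) {
    do.call(paste, c(unname(as.list(df[on])), sep = "\r"))
  }

  # Keep only rows of x with a key present in y; unmatched rows are dropped
  x[make_key(x) %in% make_key(y), , drop = FALSE]
}

x <- data.frame(id = c("a", "b", "c", "a"), value = 1:4)
y <- data.frame(id = c("a", "c"), note = c("keep", "keep"))

res <- match_rows_sketch(x, y, on = "id")
```

In this sketch res keeps rows 1, 3 and 4 of x with their original row names, the row with id "b" is omitted because it has no match, and the note column of y is not added.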
See also: join to combine the columns from both x and y, and match for the base function selecting matching items.
    # count the occurrences of each id in the baseball dataframe, then get the subset with a freq >25
    longterm <- subset(count(baseball, "id"), freq > 25)
    # longterm
    #           id freq
    # 30 ansonca01   27
    # 48 baineha01   27
    # ...
    # Select only rows from these longterm players from the baseball dataframe
    # (match would default to match on shared column names, but here was explicitly set "id")
    bb_longterm <- match_df(baseball, longterm, on="id")
    bb_longterm[1:5,]
    #>            id year stint team lg  g  ab  r   h X2b X3b hr rbi sb cs bb so ibb
    #> 4   ansonca01 1871     1  RC1    25 120 29  39  11   3  0  16  6  2  2  1  NA
    #> 121 ansonca01 1872     1  PH1    46 217 60  90  10   7  0  50  6  6 16  3  NA
    #> 276 ansonca01 1873     1  PH1    52 254 53 101   9   2  0  36  0  2  5  1  NA
    #> 398 ansonca01 1874     1  PH1    55 259 51  87   8   3  0  37  6  0  4  1  NA
    #> 525 ansonca01 1875     1  PH1    69 326 84 106  15   3  0  58 11  6  4  2  NA
    #>     hbp sh sf gidp
    #> 4    NA NA NA   NA
    #> 121  NA NA NA   NA
    #> 276  NA NA NA   NA
    #> 398  NA NA NA   NA
    #> 525  NA NA NA   NA