Retain only unique/distinct rows from an input tbl. This is similar to unique.data.frame(), but considerably faster.
distinct(.data, ..., .keep_all = FALSE)
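Argument | Description |
---|---|
.data | A tbl. |
... | Optional variables to use when determining uniqueness. If there are multiple rows for a given combination of inputs, only the first row will be preserved. If omitted, all variables are used. |
.keep_all | If TRUE, keep all variables in .data. If a combination of ... is not distinct, this keeps the first row of values. |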
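As a minimal sketch of how the arguments interact, the snippet below uses a small made-up tibble (`orders`, with hypothetical columns `customer` and `item`) that is not part of the original examples. It assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical data: two rows share customer "a"
orders <- tibble(
  customer = c("a", "a", "b"),
  item     = c("pen", "ink", "pad")
)

# Uniqueness is judged on `customer` only; the other column is dropped
by_customer <- distinct(orders, customer)

# With .keep_all = TRUE, the first row seen for each customer is kept whole
first_orders <- distinct(orders, customer, .keep_all = TRUE)

# With no variables given, all columns are used, so all three rows remain
all_rows <- distinct(orders)
```

Here `first_orders` keeps the "pen" row for customer "a" because it appears before the "ink" row in the input.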
Comparing list columns is not fully supported. Elements in list columns are compared by reference. A warning will be given when trying to include list columns in the computation. This behavior is kept for compatibility reasons and may change in a future version. See examples.
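One possible workaround, offered here as an assumption rather than something this page prescribes, is to deduplicate on the list column with base R's `duplicated()`, which compares list elements by value. The tibble `df_list` is hypothetical.

```r
library(dplyr)

# Hypothetical tibble with a list column holding the values 1, 1, 2
df_list <- tibble(a = as.list(c(1, 1, 2)))

# base::duplicated() compares list elements by value,
# so the second element is flagged as a repeat of the first
keep <- !duplicated(df_list$a)

# Keep only the rows whose list element has not been seen before
deduped <- df_list[keep, ]
```

This avoids relying on how `distinct()` treats list columns, at the cost of stepping outside the dplyr verb.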
    #> [1] 100
    #> [1] 65
    #> [1] 65

    distinct(df, x)
    #> # A tibble: 10 x 1
    #>        x
    #>    <int>
    #>  1     6
    #>  2     7
    #>  3    10
    #>  4     5
    #>  5     8
    #>  6     2
    #>  7     1
    #>  8     9
    #>  9     4
    #> 10     3

    distinct(df, y)
    #> # A tibble: 10 x 1
    #>        y
    #>    <int>
    #>  1     5
    #>  2     1
    #>  3     9
    #>  4    10
    #>  5     4
    #>  6     6
    #>  7     3
    #>  8     8
    #>  9     2
    #> 10     7

    # Can choose to keep all other variables as well
    distinct(df, x, .keep_all = TRUE)
    #> # A tibble: 10 x 2
    #>        x     y
    #>    <int> <int>
    #>  1     6     5
    #>  2     7     5
    #>  3    10     5
    #>  4     5    10
    #>  5     8     4
    #>  6     2     6
    #>  7     1     5
    #>  8     9     9
    #>  9     4     8
    #> 10     3     6

    distinct(df, y, .keep_all = TRUE)
    #> # A tibble: 10 x 2
    #>        x     y
    #>    <int> <int>
    #>  1     6     5
    #>  2     7     1
    #>  3     6     9
    #>  4     5    10
    #>  5     8     4
    #>  6     2     6
    #>  7     8     3
    #>  8     6     8
    #>  9     7     2
    #> 10     9     7

    #> # A tibble: 10 x 1
    #>     diff
    #>    <int>
    #>  1     1
    #>  2     2
    #>  3     6
    #>  4     5
    #>  5     3
    #>  6     4
    #>  7     0
    #>  8     9
    #>  9     8
    #> 10     7

    # The same behaviour applies for grouped data frames
    # except that the grouping variables are always included
    df <- tibble(
      g = c(1, 1, 2, 2),
      x = c(1, 1, 2, 1)
    ) %>% group_by(g)

    df %>% distinct()
    #> # A tibble: 3 x 2
    #> # Groups:   g [2]
    #>       g     x
    #>   <dbl> <dbl>
    #> 1     1     1
    #> 2     2     2
    #> 3     2     1

    df %>% distinct(x)
    #> # A tibble: 3 x 2
    #> # Groups:   g [2]
    #>       x     g
    #>   <dbl> <dbl>
    #> 1     1     1
    #> 2     2     2
    #> 3     1     2

    # Values in list columns are compared by reference, this can lead to
    # surprising results
    tibble(a = as.list(c(1, 1, 2))) %>% glimpse() %>% distinct()
    #> Observations: 3
    #> Variables: 1
    #> $ a <list> [1, 1, 2]
    #> Warning: distinct() does not fully support columns of type `list`.
    #> List elements are compared by reference, see ?distinct for details.
    #> This affects the following columns:
    #> - `a`
    #> # A tibble: 3 x 1
    #>   a
    #>   <list>
    #> 1 <dbl [1]>
    #> 2 <dbl [1]>
    #> 3 <dbl [1]>

    #> Observations: 3
    #> Variables: 1
    #> $ a <list> [1, 1, 2]
    #> Warning: distinct() does not fully support columns of type `list`.
    #> List elements are compared by reference, see ?distinct for details.
    #> This affects the following columns:
    #> - `a`
    #> # A tibble: 2 x 1
    #>   a
    #>   <list>
    #> 1 <int [1]>
    #> 2 <int [1]>