Use filter() to choose rows/cases where conditions are true. Unlike base subsetting with [, rows where the condition evaluates to NA are dropped.
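The difference in NA handling is easiest to see on a tiny data frame. The following is a minimal sketch; the data frame `df` and its columns are made up for illustration and are not part of dplyr.

```r
library(dplyr)

# Hypothetical toy data: one missing value in `x`
df <- data.frame(id = 1:4, x = c(1, NA, 3, 5))

# Base subsetting: an NA in the logical index yields a row of NAs
base_result <- df[df$x > 2, ]

# filter(): rows where the condition is NA are dropped
dplyr_result <- filter(df, x > 2)

# To keep the missing rows, ask for them explicitly
with_na <- filter(df, x > 2 | is.na(x))

nrow(base_result)
nrow(dplyr_result)
nrow(with_na)
```

Spelling out `is.na()` in the condition makes the decision to keep missing values visible in the code.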
filter(.data, ..., .preserve = FALSE)
.data | A tbl. All main verbs are S3 generics and provide methods for |
---|---|
... | Logical predicates defined in terms of the variables in The arguments in |
.preserve | when |
An object of the same class as .data.
Note that dplyr is not yet smart enough to optimise filtering on grouped datasets that don't need grouped calculations. For this reason, filtering is often considerably faster on ungroup()ed data.
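When the condition does not depend on the groups, one way to follow this advice is to drop the grouping, filter, and then restore the grouping. This is only a sketch: the tibble `scores` and its columns are hypothetical, and no timing is measured here.

```r
library(dplyr)

# Hypothetical toy data, grouped by `team`
scores <- tibble(
  team   = c("a", "a", "b", "b", "c"),
  points = c(3, 12, 7, 15, 9)
)
by_team <- group_by(scores, team)

# Filtering the grouped tibble directly
grouped_result <- by_team %>%
  filter(points > 8)

# Same condition, applied after ungroup(), then regrouped
regrouped_result <- by_team %>%
  ungroup() %>%
  filter(points > 8) %>%
  group_by(team)
```

Both pipelines keep the same rows because `points > 8` involves no per-group summary. The rewrite is not safe for conditions such as `points > mean(points)`.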
Because filtering expressions are computed within groups, they may yield different results on grouped tibbles. This will be the case as soon as an aggregating, lagging, or ranking function is involved. Compare this ungrouped filtering:
starwars %>% filter(mass > mean(mass, na.rm = TRUE))
With the grouped equivalent:
starwars %>% group_by(gender) %>% filter(mass > mean(mass, na.rm = TRUE))
The former keeps rows with mass greater than the global average whereas the latter keeps rows with mass greater than the gender average.
It is valid to use grouping variables in filter expressions.
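As a small illustration of mixing a grouping variable with a per-group calculation in one call, consider the sketch below. The tibble `df` and its columns `g` and `x` are hypothetical.

```r
library(dplyr)

# Hypothetical toy data with two groups
df <- tibble(
  g = c("a", "a", "b", "b"),
  x = c(1, 5, 2, 8)
)

# `g` is the grouping variable; mean(x) is computed within each group
above_mean_in_a <- df %>%
  group_by(g) %>%
  filter(g == "a", x > mean(x))

above_mean_in_a
```

Only the row with `g == "a"` and `x == 5` remains, because the mean of `x` within group "a" is 3.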
When applied on a grouped tibble, filter() automatically rearranges the tibble by groups for performance reasons.
When applied to a data frame, row names are silently dropped. To preserve them, convert them to an explicit variable with tibble::rownames_to_column().
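A minimal sketch of that conversion, using the built-in `mtcars` data. The column name `model` is an arbitrary choice, and how filter() treats row names may differ between dplyr versions, so the example relies only on the explicit column.

```r
library(dplyr)
library(tibble)

# Move the row names (car models) into a regular column first
cars_named <- rownames_to_column(mtcars, var = "model")

# The model names now survive filtering as ordinary data
fast <- filter(cars_named, mpg > 30)

fast$model
```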
The three scoped variants (filter_all(), filter_if() and filter_at()) make it easy to apply a filtering condition to a selection of variables.
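The sketch below shows one possible use of each scoped variant together with the all_vars() and any_vars() helpers. The tibbles `df` and `df2` are hypothetical, and the calls assume a dplyr version in which these scoped functions are available (newer releases may mark them as superseded).

```r
library(dplyr)

# Hypothetical toy data: numeric columns only
df <- tibble(
  x = c(1, 5, 9),
  y = c(2, 8, 3),
  z = c(7, 6, 0)
)

# Keep rows where every column is greater than 4
all_above_4 <- filter_all(df, all_vars(. > 4))

# Keep rows where `x` or `z` is greater than 6
x_or_z_above_6 <- filter_at(df, vars(x, z), any_vars(. > 6))

# Hypothetical toy data with a character column
df2 <- tibble(id = c("a", "b", "c"), x = c(1, 5, 9), y = c(2, 8, 3))

# Apply the condition to numeric columns only
numeric_above_1 <- filter_if(df2, is.numeric, all_vars(. > 1))
```

In each call the `.` stands for the column currently being tested.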
filter_all(), filter_if() and filter_at().
Other single table verbs: arrange, mutate, select, slice, summarise
filter(starwars, species == "Human")
#> # A tibble: 35 x 13
#>    name  height  mass hair_color skin_color eye_color birth_year gender
#>    <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr>
#>  1 Luke…    172    77 blond      fair       blue            19   male
#>  2 Dart…    202   136 none       white      yellow          41.9 male
#>  3 Leia…    150    49 brown      light      brown           19   female
#>  4 Owen…    178   120 brown, gr… light      blue            52   male
#>  5 Beru…    165    75 brown      light      blue            47   female
#>  6 Bigg…    183    84 black      light      brown           24   male
#>  7 Obi-…    182    77 auburn, w… fair       blue-gray       57   male
#>  8 Anak…    188    84 blond      fair       blue            41.9 male
#>  9 Wilh…    180    NA auburn, g… fair       blue            64   male
#> 10 Han …    180    80 brown      fair       brown           29   male
#> # … with 25 more rows, and 5 more variables: homeworld <chr>, species <chr>,
#> #   films <list>, vehicles <list>, starships <list>

filter(starwars, mass > 1000)
#> # A tibble: 1 x 13
#>   name  height  mass hair_color skin_color eye_color birth_year gender homeworld
#>   <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr>  <chr>
#> 1 Jabb…    175  1358 <NA>       green-tan… orange           600 herma… Nal Hutta
#> # … with 4 more variables: species <chr>, films <list>, vehicles <list>,
#> #   starships <list>

# Multiple criteria
filter(starwars, hair_color == "none" & eye_color == "black")
#> # A tibble: 9 x 13
#>   name  height  mass hair_color skin_color eye_color birth_year gender homeworld
#>   <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr>  <chr>
#> 1 Nien…    160    68 none       grey       black             NA male   Sullust
#> 2 Gasg…    122    NA none       white, bl… black             NA male   Troiken
#> 3 Kit …    196    87 none       green      black             NA male   Glee Ans…
#> 4 Plo …    188    80 none       orange     black             22 male   Dorin
#> 5 Lama…    229    88 none       grey       black             NA male   Kamino
#> 6 Taun…    213    NA none       grey       black             NA female Kamino
#> 7 Shaa…    178    57 none       red, blue… black             NA female Shili
#> 8 Tion…    206    80 none       grey       black             NA male   Utapau
#> 9 BB8       NA    NA none       none       black             NA none   <NA>
#> # … with 4 more variables: species <chr>, films <list>, vehicles <list>,
#> #   starships <list>

filter(starwars, hair_color == "none" | eye_color == "black")
#> # A tibble: 38 x 13
#>    name  height  mass hair_color skin_color eye_color birth_year gender
#>    <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr>
#>  1 Dart…    202   136 none       white      yellow          41.9 male
#>  2 Gree…    173    74 <NA>       green      black           44   male
#>  3 IG-88    200   140 none       metal      red             15   none
#>  4 Bossk    190   113 none       green      red             53   male
#>  5 Lobot    175    79 none       light      blue            37   male
#>  6 Ackb…    180    83 none       brown mot… orange          41   male
#>  7 Nien…    160    68 none       grey       black           NA   male
#>  8 Nute…    191    90 none       mottled g… red             NA   male
#>  9 Jar …    196    66 none       orange     orange          52   male
#> 10 Roos…    224    82 none       grey       orange          NA   male
#> # … with 28 more rows, and 5 more variables: homeworld <chr>, species <chr>,
#> #   films <list>, vehicles <list>, starships <list>

# Multiple arguments are equivalent to and
filter(starwars, hair_color == "none", eye_color == "black")
#> # A tibble: 9 x 13
#>   name  height  mass hair_color skin_color eye_color birth_year gender homeworld
#>   <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr>  <chr>
#> 1 Nien…    160    68 none       grey       black             NA male   Sullust
#> 2 Gasg…    122    NA none       white, bl… black             NA male   Troiken
#> 3 Kit …    196    87 none       green      black             NA male   Glee Ans…
#> 4 Plo …    188    80 none       orange     black             22 male   Dorin
#> 5 Lama…    229    88 none       grey       black             NA male   Kamino
#> 6 Taun…    213    NA none       grey       black             NA female Kamino
#> 7 Shaa…    178    57 none       red, blue… black             NA female Shili
#> 8 Tion…    206    80 none       grey       black             NA male   Utapau
#> 9 BB8       NA    NA none       none       black             NA none   <NA>
#> # … with 4 more variables: species <chr>, films <list>, vehicles <list>,
#> #   starships <list>

# The filtering operation may yield different results on grouped
# tibbles because the expressions are computed within groups.
#
# The following filters rows where `mass` is greater than the
# global average:
starwars %>% filter(mass > mean(mass, na.rm = TRUE))
#> # A tibble: 10 x 13
#>    name  height  mass hair_color skin_color eye_color birth_year gender
#>    <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr>
#>  1 Dart…    202   136 none       white      yellow          41.9 male
#>  2 Owen…    178   120 brown, gr… light      blue            52   male
#>  3 Chew…    228   112 brown      unknown    blue           200   male
#>  4 Jabb…    175  1358 <NA>       green-tan… orange         600   herma…
#>  5 Jek …    180   110 brown      fair       blue            NA   male
#>  6 IG-88    200   140 none       metal      red             15   none
#>  7 Bossk    190   113 none       green      red             53   male
#>  8 Dext…    198   102 none       brown      yellow          NA   male
#>  9 Grie…    216   159 none       brown, wh… green, y…       NA   male
#> 10 Tarf…    234   136 brown      brown      blue            NA   male
#> # … with 5 more variables: homeworld <chr>, species <chr>, films <list>,
#> #   vehicles <list>, starships <list>

# Whereas this keeps rows with `mass` greater than the gender
# average:
starwars %>% group_by(gender) %>% filter(mass > mean(mass, na.rm = TRUE))
#> # A tibble: 25 x 13
#> # Groups:   gender [3]
#>    name  height  mass hair_color skin_color eye_color birth_year gender
#>    <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr>
#>  1 C-3PO    167    75 <NA>       gold       yellow         112   <NA>
#>  2 Dart…    202   136 none       white      yellow          41.9 male
#>  3 Owen…    178   120 brown, gr… light      blue            52   male
#>  4 Beru…    165    75 brown      light      blue            47   female
#>  5 Bigg…    183    84 black      light      brown           24   male
#>  6 Anak…    188    84 blond      fair       blue            41.9 male
#>  7 Chew…    228   112 brown      unknown    blue           200   male
#>  8 Jek …    180   110 brown      fair       blue            NA   male
#>  9 Bossk    190   113 none       green      red             53   male
#> 10 Ackb…    180    83 none       brown mot… orange          41   male
#> # … with 15 more rows, and 5 more variables: homeworld <chr>, species <chr>,
#> #   films <list>, vehicles <list>, starships <list>

# Refer to column names stored as strings with the `.data` pronoun:
vars <- c("mass", "height")
cond <- c(80, 150)
starwars %>%
  filter(
    .data[[vars[[1]]]] > cond[[1]],
    .data[[vars[[2]]]] > cond[[2]]
  )
#> # A tibble: 21 x 13
#>    name  height  mass hair_color skin_color eye_color birth_year gender
#>    <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr>
#>  1 Dart…    202   136 none       white      yellow          41.9 male
#>  2 Owen…    178   120 brown, gr… light      blue            52   male
#>  3 Bigg…    183    84 black      light      brown           24   male
#>  4 Anak…    188    84 blond      fair       blue            41.9 male
#>  5 Chew…    228   112 brown      unknown    blue           200   male
#>  6 Jabb…    175  1358 <NA>       green-tan… orange         600   herma…
#>  7 Jek …    180   110 brown      fair       blue            NA   male
#>  8 IG-88    200   140 none       metal      red             15   none
#>  9 Bossk    190   113 none       green      red             53   male
#> 10 Ackb…    180    83 none       brown mot… orange          41   male
#> # … with 11 more rows, and 5 more variables: homeworld <chr>, species <chr>,
#> #   films <list>, vehicles <list>, starships <list>

# For more complex cases, knowledge of tidy evaluation and the
# unquote operator `!!` is required. See https://tidyeval.tidyverse.org/
#
# One useful and simple tidy eval technique is to use `!!` to bypass
# the data frame and its columns. Here is how to filter the columns
# `mass` and `height` relative to objects of the same names:
mass <- 80
height <- 150
filter(starwars, mass > !!mass, height > !!height)
#> # A tibble: 21 x 13
#>    name  height  mass hair_color skin_color eye_color birth_year gender
#>    <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr>
#>  1 Dart…    202   136 none       white      yellow          41.9 male
#>  2 Owen…    178   120 brown, gr… light      blue            52   male
#>  3 Bigg…    183    84 black      light      brown           24   male
#>  4 Anak…    188    84 blond      fair       blue            41.9 male
#>  5 Chew…    228   112 brown      unknown    blue           200   male
#>  6 Jabb…    175  1358 <NA>       green-tan… orange         600   herma…
#>  7 Jek …    180   110 brown      fair       blue            NA   male
#>  8 IG-88    200   140 none       metal      red             15   none
#>  9 Bossk    190   113 none       green      red             53   male
#> 10 Ackb…    180    83 none       brown mot… orange          41   male
#> # … with 11 more rows, and 5 more variables: homeworld <chr>, species <chr>,
#> #   films <list>, vehicles <list>, starships <list>