Lump together least/most common factor levels into "other"

fct_lump(f, n, prop, w = NULL, other_level = "Other",
  ties.method = c("min", "average", "first", "last", "random", "max"))

fct_lump_min(f, min, w = NULL, other_level = "Other")

Arguments

f

A factor (or character vector).

n, prop

If both n and prop are missing, fct_lump lumps together the least frequent levels into "other", while ensuring that "other" is still the smallest level. It's particularly useful in conjunction with fct_inorder().

Positive n preserves the most common n values. Negative n preserves the least common -n values. If there are ties, you will get at least abs(n) values.

Positive prop preserves values that appear at least prop of the time. Negative prop preserves values that appear at most -prop of the time.
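
As a rough illustration of the three modes described above, the following sketch uses a small deterministic factor (the same counts as the first example in the Examples section) and assumes the forcats package is attached. The comments state the levels that the rules above imply for this data.

```r
library(forcats)

# Counts: A = 40, B = 10, C = 5, D = 27, E..I = 1 each (87 values in total).
x <- factor(rep(LETTERS[1:9], times = c(40, 10, 5, 27, 1, 1, 1, 1, 1)))

# Neither n nor prop: lump the rarest levels while "Other" stays the smallest.
levels(fct_lump(x))              # levels: A, D, Other

# Positive n: keep the 3 most common levels.
levels(fct_lump(x, n = 3))       # levels: A, B, D, Other

# Negative n: keep the 3 least common levels. E..I are tied at 1,
# so all five are kept (at least abs(n) values).
levels(fct_lump(x, n = -3))      # levels: E, F, G, H, I, Other

# Positive prop: keep levels covering more than 10% of the values.
levels(fct_lump(x, prop = 0.1))  # levels: A, B, D, Other

# Negative prop: keep levels covering at most 10% of the values.
levels(fct_lump(x, prop = -0.1)) # levels: C, E, F, G, H, I, Other
```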

w

An optional numeric vector giving weights for frequency of each value (not level) in f.
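
The point that w has one entry per value, not per level, can be sketched with a toy factor. The data and weights here are made up for illustration, and forcats is assumed to be attached.

```r
library(forcats)

# Six values across three levels; raw counts are a = 3, b = 2, c = 1.
f <- factor(c("a", "a", "a", "b", "b", "c"))

# One weight per value (same length as f), not one per level.
# Weighted totals: a = 3, b = 10, c = 20.
w <- c(1, 1, 1, 5, 5, 20)

# Unweighted, "a" is the most common level.
levels(fct_lump(f, n = 1))         # levels: a, Other

# Weighted, "c" carries the most weight and is the one preserved.
levels(fct_lump(f, n = 1, w = w))  # levels: c, Other
```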

other_level

Value of level used for "other" values. Always placed at end of levels.
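
A minimal sketch of other_level, assuming forcats is attached. The label "Rare" is an arbitrary choice for illustration.

```r
library(forcats)

# Counts: A = 40, B = 10, C = 5, D = 27, E..I = 1 each.
x <- factor(rep(LETTERS[1:9], times = c(40, 10, 5, 27, 1, 1, 1, 1, 1)))

# Keep the two most common levels and label everything else "Rare".
lumped <- fct_lump(x, n = 2, other_level = "Rare")

# The replacement level is placed last, after the preserved levels.
levels(lumped)          # levels: A, D, Rare
sum(lumped == "Rare")   # 20 values (10 + 5 + 5 * 1) were lumped
```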

ties.method

A character string specifying how ties are treated. See rank() for details.

min

Preserves values that appear at least min number of times.
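
For a deterministic view of min, the sketch below applies fct_lump_min() to a factor with known counts, assuming forcats is attached.

```r
library(forcats)

# Counts: A = 40, B = 10, C = 5, D = 27, E..I = 1 each.
x <- factor(rep(LETTERS[1:9], times = c(40, 10, 5, 27, 1, 1, 1, 1, 1)))

# Keep levels that appear at least 10 times; B (exactly 10) is preserved.
lumped <- fct_lump_min(x, min = 10)

levels(lumped)          # levels: A, B, D, Other
sum(lumped == "Other")  # 10 values: C (5) plus E..I (1 each)
```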

See also

fct_other() to convert specified levels to other.
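
When the levels to keep are known by name rather than by frequency, fct_other() can be used instead. A brief sketch, assuming forcats is attached:

```r
library(forcats)

x <- factor(rep(LETTERS[1:9], times = c(40, 10, 5, 27, 1, 1, 1, 1, 1)))

# Keep A and D by name; every other level becomes "Other".
kept <- fct_other(x, keep = c("A", "D"))
levels(kept)     # levels: A, D, Other

# Or name the levels to drop; the unnamed levels are preserved.
dropped <- fct_other(x, drop = c("E", "F", "G", "H", "I"))
levels(dropped)  # levels: A, B, C, D, Other
```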

Examples

x <- factor(rep(LETTERS[1:9], times = c(40, 10, 5, 27, 1, 1, 1, 1, 1)))
x %>% table()
#> .
#>  A  B  C  D  E  F  G  H  I
#> 40 10  5 27  1  1  1  1  1
x %>% fct_lump() %>% table()
#> .
#>     A     D Other
#>    40    27    20
x %>% fct_lump() %>% fct_inorder() %>% table()
#> .
#>     A Other     D
#>    40    20    27
x <- factor(letters[rpois(100, 5)])
x
#>   [1] d c h f d g h f f d c b b f c h h e h h c f e f d b e c b f e c h d b f e
#>  [38] e f d c g f g d d e c a b d c g e f d f f g g e c f b g g b g e g i e d g
#>  [75] g d c d i d c b d c d d f f d e d h d d e f f h f e
#> Levels: a b c d e f g h i
table(x)
#> x
#>  a  b  c  d  e  f  g  h  i
#>  1  9 13 21 14 19 12  9  2
table(fct_lump(x))
#>
#>     b     c     d     e     f     g     h Other
#>     9    13    21    14    19    12     9     3
# Use positive values to collapse the rarest
fct_lump(x, n = 3)
#>  [1] d Other Other f d Other Other f f d Other Other
#> [13] Other f Other Other Other e Other Other Other f e f
#> [25] d Other e Other Other f e Other Other d Other f
#> [37] e e f d Other Other f Other d d e Other
#> [49] Other Other d Other Other e f d f f Other Other
#> [61] e Other f Other Other Other Other Other e Other Other e
#> [73] d Other Other d Other d Other d Other Other d Other
#> [85] d d f f d e d Other d d e f
#> [97] f Other f e
#> Levels: d e f Other
fct_lump(x, prop = 0.1)
#>  [1] d c Other f d g Other f f d c Other
#> [13] Other f c Other Other e Other Other c f e f
#> [25] d Other e c Other f e c Other d Other f
#> [37] e e f d c g f g d d e c
#> [49] Other Other d c g e f d f f g g
#> [61] e c f Other g g Other g e g Other e
#> [73] d g g d c d Other d c Other d c
#> [85] d d f f d e d Other d d e f
#> [97] f Other f e
#> Levels: c d e f g Other
# Use negative values to collapse the most common
fct_lump(x, n = -3)
#>  [1] Other Other h Other Other Other h Other Other Other Other b
#> [13] b Other Other h h Other h h Other Other Other Other
#> [25] Other b Other Other b Other Other Other h Other b Other
#> [37] Other Other Other Other Other Other Other Other Other Other Other Other
#> [49] a b Other Other Other Other Other Other Other Other Other Other
#> [61] Other Other Other b Other Other b Other Other Other i Other
#> [73] Other Other Other Other Other Other i Other Other b Other Other
#> [85] Other Other Other Other Other Other Other h Other Other Other Other
#> [97] Other h Other Other
#> Levels: a b h i Other
fct_lump(x, prop = -0.1)
#>  [1] Other Other h Other Other Other h Other Other Other Other b
#> [13] b Other Other h h Other h h Other Other Other Other
#> [25] Other b Other Other b Other Other Other h Other b Other
#> [37] Other Other Other Other Other Other Other Other Other Other Other Other
#> [49] a b Other Other Other Other Other Other Other Other Other Other
#> [61] Other Other Other b Other Other b Other Other Other i Other
#> [73] Other Other Other Other Other Other i Other Other b Other Other
#> [85] Other Other Other Other Other Other Other h Other Other Other Other
#> [97] Other h Other Other
#> Levels: a b h i Other
# Use weighted frequencies
w <- c(rep(2, 50), rep(1, 50))
fct_lump(x, n = 5, w = w)
#>  [1] d c h f d Other h f f d c Other
#> [13] Other f c h h e h h c f e f
#> [25] d Other e c Other f e c h d Other f
#> [37] e e f d c Other f Other d d e c
#> [49] Other Other d c Other e f d f f Other Other
#> [61] e c f Other Other Other Other Other e Other Other e
#> [73] d Other Other d c d Other d c Other d c
#> [85] d d f f d e d h d d e f
#> [97] f h f e
#> Levels: c d e f h Other
# Use ties.method to control how tied factors are collapsed
fct_lump(x, n = 6)
#>  [1] d c h f d g h f f d c b
#> [13] b f c h h e h h c f e f
#> [25] d b e c b f e c h d b f
#> [37] e e f d c g f g d d e c
#> [49] Other b d c g e f d f f g g
#> [61] e c f b g g b g e g Other e
#> [73] d g g d c d Other d c b d c
#> [85] d d f f d e d h d d e f
#> [97] f h f e
#> Levels: b c d e f g h Other
fct_lump(x, n = 6, ties.method = "max")
#>  [1] d c Other f d g Other f f d c Other
#> [13] Other f c Other Other e Other Other c f e f
#> [25] d Other e c Other f e c Other d Other f
#> [37] e e f d c g f g d d e c
#> [49] Other Other d c g e f d f f g g
#> [61] e c f Other g g Other g e g Other e
#> [73] d g g d c d Other d c Other d c
#> [85] d d f f d e d Other d d e f
#> [97] f Other f e
#> Levels: c d e f g Other
x <- factor(letters[rpois(100, 5)])
fct_lump_min(x, min = 10)
#>  [1] g d f f e c f e f c d f
#> [13] e g e c g f c d c c f e
#> [25] c d c Other e Other Other c c f c d
#> [37] f c g f e Other d Other c Other e d
#> [49] e d Other g Other f e e d Other Other Other
#> [61] g Other e Other d f d g f d c Other
#> [73] e f d f d c d g c e f c
#> [85] f Other Other g d g g f c c d Other
#> [97] g d Other d
#> Levels: c d e f g Other