R/melt_table.R
melt_table.Rd
For certain non-rectangular data formats, it can be useful to parse the data into a melted format where each row represents a single token.
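To make the melted format concrete, here is a minimal base R sketch that turns ragged, whitespace-separated lines into one row per token. The helper `melt_whitespace()` is a hypothetical name introduced for illustration; it is not part of readr and does not guess a `data_type` column the way the real functions do.

```r
# Hypothetical helper (not part of readr): melt ragged, whitespace-separated
# lines into a data frame with one row per token.
melt_whitespace <- function(lines) {
  # Trim each line, then split it on runs of whitespace.
  tokens <- strsplit(trimws(lines), "[[:space:]]+")
  n <- lengths(tokens)
  data.frame(
    row = rep(seq_along(tokens), times = n),  # line number of each token
    col = sequence(n),                        # position of the token in its line
    value = unlist(tokens),
    stringsAsFactors = FALSE
  )
}

# Lines of different lengths are fine: each token simply becomes a row.
ragged <- c("a b c", "1 2", "x   y z w")
melted <- melt_whitespace(ragged)
print(melted)
```

Because every token carries its own `row` and `col`, lines with different numbers of fields need no padding, which is what makes the format suitable for non-rectangular data.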
melt_table() and melt_table2() are designed to read the type of textual data where each column is separated by one (or more) columns of space.

melt_table2() allows any number of whitespace characters between columns, and the lines can be of different lengths.

melt_table() is more strict: each line must be the same length, and each field must be in the same position in every line. It first finds empty columns and then parses the data like a fixed width file.
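The "find empty columns, then parse like a fixed width file" strategy can be sketched in base R as follows. This is an illustration of the idea under simplifying assumptions, not readr's implementation: `find_fields()` and `melt_fixed()` are hypothetical helpers, and the sample lines are built with `sprintf()` from a few values in the Massey ratings example so that every field sits at the same position.

```r
# Hypothetical helper (not readr's implementation): locate fields by finding
# character positions that are blank in every line.
find_fields <- function(lines) {
  width <- max(nchar(lines))
  # Left-justify and pad so all lines have the same length.
  padded <- formatC(lines, width = width, flag = "-")
  # One matrix row per line, one matrix column per character position.
  chars <- do.call(rbind, strsplit(padded, ""))
  # A position is "empty" when it holds a space in every line.
  empty <- colSums(chars != " ") == 0
  # Consecutive non-empty positions form one field.
  runs <- rle(!empty)
  end <- cumsum(runs$lengths)
  start <- end - runs$lengths + 1
  data.frame(start = start[runs$values], end = end[runs$values])
}

# Hypothetical helper: cut every line at the detected positions and melt.
melt_fixed <- function(lines) {
  fields <- find_fields(lines)
  pieces <- lapply(seq_along(lines), function(i) {
    data.frame(
      row = i,
      col = seq_len(nrow(fields)),
      value = trimws(substring(lines[i], fields$start, fields$end)),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, pieces)
}

# Aligned sample: rank (4 wide), team (12 wide), conference (4 wide).
aligned <- sprintf(
  "%4s %-12s %-4s",
  c("Rank", "1", "10", "9"),
  c("Team", "Ohio St", "Mississippi", "Georgia Tech"),
  c("Conf", "B10", "SEC", "ACC")
)
fields <- find_fields(aligned)
melted_fixed <- melt_fixed(aligned)
print(fields)
print(melted_fixed)
```

Note that a value such as "Georgia Tech" stays in one field only because some other line has a non-space character at the position of its internal space. If that position were blank in every line, this approach would treat it as a column separator, which is why this strategy needs consistently aligned input.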
melt_table(
  file,
  locale = default_locale(),
  na = "NA",
  skip = 0,
  n_max = Inf,
  guess_max = min(n_max, 1000),
  progress = show_progress(),
  comment = "",
  skip_empty_rows = FALSE
)

melt_table2(
  file,
  locale = default_locale(),
  na = "NA",
  skip = 0,
  n_max = Inf,
  progress = show_progress(),
  comment = "",
  skip_empty_rows = FALSE
)
Argument | Description |
---|---|
file | Either a path to a file, a connection, or literal data (either a single string or a raw vector). Files ending in … Literal data is most useful for examples and tests. It must contain at least one new line to be recognised as data (instead of a path) or be a vector of greater than length 1. Using a value of … |
locale | The locale controls defaults that vary from place to place. The default locale is US-centric (like R), but you can use … |
na | Character vector of strings to interpret as missing values. Set this option to … |
skip | Number of lines to skip before reading data. |
n_max | Maximum number of records to read. |
guess_max | Maximum number of records to use for guessing column types. |
progress | Display a progress bar? By default it will only display in an interactive session and not while knitting a document. The display is updated every 50,000 values and will only display if estimated reading time is 5 seconds or more. The automatic progress bar can be disabled by setting option … |
comment | A string used to identify comments. Any text after the comment characters will be silently ignored. |
skip_empty_rows | Should blank rows be ignored altogether? i.e. If this option is … |
See melt_fwf() to melt fixed width files where each column is not separated by whitespace. melt_fwf() is also useful for reading tabular data with non-standard formatting. read_table() is the conventional way to read tabular data from whitespace-separated files.
# One corner from http://www.masseyratings.com/cf/compare.htm
massey <- readr_example("massey-rating.txt")
cat(read_file(massey))
#> UCC PAY LAZ KPK RT COF BIH DII ENG ACU Rank Team Conf
#> 1 1 1 1 1 1 1 1 1 1 1 Ohio St B10
#> 2 2 2 2 2 2 2 2 4 2 2 Oregon P12
#> 3 4 3 4 3 4 3 4 2 3 3 Alabama SEC
#> 4 3 4 3 4 3 5 3 3 4 4 TCU B12
#> 6 6 6 5 5 7 6 5 6 11 5 Michigan St B10
#> 7 7 7 6 7 6 11 8 7 8 6 Georgia SEC
#> 5 5 5 7 6 8 4 6 5 5 7 Florida St ACC
#> 8 8 9 9 10 5 7 7 10 7 8 Baylor B12
#> 9 11 8 13 11 11 12 9 14 9 9 Georgia Tech ACC
#> 13 10 13 11 8 9 10 11 9 10 10 Mississippi SEC

melt_table(massey)
#> # A tibble: 143 x 4
#>      row   col data_type value
#>    <dbl> <dbl> <chr>     <chr>
#>  1     1     1 character UCC
#>  2     1     2 character PAY
#>  3     1     3 character LAZ
#>  4     1     4 character KPK
#>  5     1     5 character RT
#>  6     1     6 character COF
#>  7     1     7 character BIH
#>  8     1     8 character DII
#>  9     1     9 character ENG
#> 10     1    10 character ACU
#> # … with 133 more rows

# Sample of 1978 fuel economy data from
# http://www.fueleconomy.gov/feg/epadata/78data.zip
epa <- readr_example("epa78.txt")
cat(read_file(epa))
#> ALFA ROMEO ALFA ROMEO 78010003
#> ALFETTA 03 81 8 74 7 89 9 ALFETTA 78010053
#> SPIDER 2000 01 SPIDER 2000 78010103
#> AMC AMC 78020002
#> GREMLIN 03 79 9 79 9 GREMLIN 78020053
#> PACER 04 89 11 89 11 PACER 78020103
#> PACER WAGON 07 90 26 91 26 PACER WAGON 78020153
#> CONCORD 04 88 12 90 11 90 11 83 16 CONCORD 78020203
#> CONCORD WAGON 07 91 30 91 30 CONCORD WAGON 78020253
#> MATADOR COUPE 05 97 14 97 14 MATADOR COUPE 78020303
#> MATADOR SEDAN 06 110 20 110 20 MATADOR SEDAN 78020353
#> MATADOR WAGON 09 112 50 112 50 MATADOR WAGON 78020403
#> ASTON MARTIN ASTON MARTIN 78040002
#> ASTON MARTIN ASTON MARTIN 78040053
#> AUDI AUDI 78050002
#> FOX 03 84 11 84 11 84 11 FOX 78050053
#> FOX WAGON 07 83 40 83 40 FOX WAGON 78050103
#> 5000 04 90 15 90 15 5000 78050153
#> AVANTI AVANTI 78065002
#> AVANTI II 02 75 8 75 8 AVANTI II 78065053

melt_table(epa)
#> # A tibble: 240 x 4
#>      row   col data_type value
#>    <dbl> <dbl> <chr>     <chr>
#>  1     1     1 character "ALFA ROMEO"
#>  2     1     2 empty     ""
#>  3     1     3 empty     ""
#>  4     1     4 empty     ""
#>  5     1     5 empty     ""
#>  6     1     6 empty     ""
#>  7     1     7 empty     ""
#>  8     1     8 empty     ""
#>  9     1     9 empty     ""
#> 10     1    10 empty     ""
#> # … with 230 more rows