R/inout_xlff_to_output.R
xlff_to_output.Rd
These functions read and wrangle excel workbooks produced by the FastField
app and output a list of dataframes (xlff_to_list()
), .csv files
(xlff_to_csv()
) or .xlsx files (xlff_to_xl()
). Each dataframe or file
combines all spreadsheets from a single excel workbook in the input
directory. If the input directory has multiple workbooks, the output will be
multiple dataframes, multiple .csv files, or multiple .excel files.
xlff_to_csv(dir_in, dir_out = "./", first_census = FALSE) xlff_to_xl(dir_in, dir_out = "./", first_census = FALSE) xlff_to_list(dir_in, first_census = FALSE, root_columns = NULL)
dir_in | String giving the directory containing the excel workbooks to read from. |
---|---|
dir_out | String giving the directory where to write .csv files to. |
first_census | Use |
root_columns | String. Lowercase name of column(s) in the root sheet ( e.g. c("date", "team")). This is useful when you data has non-standard columns. |
xlff_to_csv()
and xlff_to_xl()
write a .csv or excel (.xlsx) file
per workbook -- combining all spreadsheets. xlff_to_list
outputs a list
where each dataframes combines all spreadsheeets of a workbook.
This is a rigid function with a very specific goal: To process data from FastField forms. Specifically, this is what this function does:
Reads each spreadsheet from each workbook and map it to a dataframe.
Lowercases and links the names of each dataframe.
Adds any missing key-sheets:
For first census: (1) "root", (2) "multi_stems", (3) "secondary_stems", and (4) "single_stems".
For recensus: (1) "root", (2) "original_stems", (3) "new_secondary_stems", and (4) "recruits"
Dates the data by submission_id
(date
comes from the spreadsheet
root
).
Lowercases and links the names of each dataframe-variable.
Drops fake stems.
Output a common data structure of your choice.
Sabrina Russo helped to make these functions useful with first censuses.
David Orwig helped to fix a debug.
library(fs) library(readr) library(readxl) # NOT A FIRST CENSUS # Path to the folder I want to read excel files from dir_in <- dirname(misc_example("two_files/new_stem_1.xlsx")) dir_in#> [1] "/home/travis/R/Library/fgeo.misc/extdata/two_files"#> [1] "new_stem_1.xlsx" "new_stem_2.xlsx"# Path to the folder I want to write .csv files to dir_out <- tempdir() # Output a csv file xlff_to_csv(dir_in, dir_out) # Confirm path_file(dir_ls(dir_out, regexp = "new_stem.*csv$"))#> [1] "new_stem_1.csv" "new_stem_2.csv"# Also possible to output excel and a list of dataframe. See next section. # FIRST CENSUS dir_in <- dirname(misc_example("first_census/census.xlsx")) # As a reminder you will get a warning of missing sheets # Output list of dataframes (one per input workbook -- here only one) lst <- xlff_to_list(dir_in, first_census = TRUE) str(lst, give.attr = FALSE)#> List of 1 #> $ census.xlsx:Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 54 obs. of 17 variables: #> ..$ submission_id : chr [1:54] "c01b37d5-2c30-4086-b4a1-78327dfa0a9a" "efd8e991-2e34-40c8-b503-44def6e542d9" "7997596a-c94e-47b6-ab28-4b0e6a94e9df" "a4e952e6-dcad-4410-aba9-36918fa1fa62" ... #> ..$ start_form_time_stamp: chr [1:54] "2018.05.14 11:55:32 AM" "2018.05.14 01:10:33 PM" "2018.05.14 04:14:23 PM" "2018.05.14 04:38:45 PM" ... #> ..$ end_form_time_stamp : chr [1:54] "2018.05.14 06:03:12 PM" "2018.05.14 06:03:11 PM" "2018.05.14 06:03:11 PM" "2018.05.14 06:03:10 PM" ... #> ..$ sheet : chr [1:54] "form_meta_data" "form_meta_data" "form_meta_data" "form_meta_data" ... #> ..$ quadrat : chr [1:54] NA NA NA NA ... #> ..$ tag : chr [1:54] NA NA NA NA ... #> ..$ stem_tag : chr [1:54] NA NA NA NA ... #> ..$ species : chr [1:54] NA NA NA NA ... #> ..$ species_code : chr [1:54] NA NA NA NA ... #> ..$ dbh : chr [1:54] NA NA NA NA ... #> ..$ status : chr [1:54] NA NA NA NA ... #> ..$ codes : chr [1:54] NA NA NA NA ... #> ..$ notes : chr [1:54] NA NA NA NA ... #> ..$ pom : chr [1:54] NA NA NA NA ... #> ..$ section_id : chr [1:54] NA NA NA NA ... #> ..$ unique_stem : chr [1:54] "NA_NA" "NA_NA" "NA_NA" "NA_NA" ... #> ..$ date : chr [1:54] "2018.05.14" "2018.05.14" "2018.05.14" NA ...# Output excel xlff_to_xl(dir_in, dir_out, first_census = TRUE) # Read back filename <- path(dir_out, "census.xlsx") out <- read_excel(filename) str(out, give.attr = FALSE)#> Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 108 obs. of 17 variables: #> $ submission_id : chr "c01b37d5-2c30-4086-b4a1-78327dfa0a9a" "efd8e991-2e34-40c8-b503-44def6e542d9" "7997596a-c94e-47b6-ab28-4b0e6a94e9df" "a4e952e6-dcad-4410-aba9-36918fa1fa62" ... #> $ start_form_time_stamp: chr "2018.05.14 11:55:32 AM" "2018.05.14 01:10:33 PM" "2018.05.14 04:14:23 PM" "2018.05.14 04:38:45 PM" ... #> $ end_form_time_stamp : chr "2018.05.14 06:03:12 PM" "2018.05.14 06:03:11 PM" "2018.05.14 06:03:11 PM" "2018.05.14 06:03:10 PM" ... #> $ sheet : chr "form_meta_data" "form_meta_data" "form_meta_data" "form_meta_data" ... #> $ quadrat : chr NA NA NA NA ... #> $ tag : chr NA NA NA NA ... #> $ stem_tag : chr NA NA NA NA ... #> $ species : chr NA NA NA NA ... #> $ species_code : chr NA NA NA NA ... #> $ dbh : chr NA NA NA NA ... #> $ status : chr NA NA NA NA ... #> $ codes : chr NA NA NA NA ... #> $ notes : chr NA NA NA NA ... #> $ pom : chr NA NA NA NA ... #> $ section_id : chr NA NA NA NA ... #> $ unique_stem : chr "NA_NA" "NA_NA" "NA_NA" "NA_NA" ... #> $ date : chr "2018.05.14" "2018.05.14" "2018.05.14" NA ...