These functions read and wrangle excel workbooks produced by the FastField app and output a list of dataframes (xlff_to_list()), .csv files (xlff_to_csv()) or .xlsx files (xlff_to_xl()). Each dataframe or file combines all spreadsheets from a single excel workbook in the input directory. If the input directory has multiple workbooks, the output will be multiple dataframes, multiple .csv files, or multiple .excel files.

xlff_to_csv(dir_in, dir_out = "./", first_census = FALSE)

xlff_to_xl(dir_in, dir_out = "./", first_census = FALSE)

xlff_to_list(dir_in, first_census = FALSE, root_columns = NULL)

Arguments

dir_in

String giving the directory containing the excel workbooks to read from.

dir_out

String giving the directory where to write .csv files to.

first_census

Use TRUE if this is your first census. Use FALSE (default) if this is not your first census but a recensus.

root_columns

String. Lowercase name of column(s) in the root sheet ( e.g. c("date", "team")). This is useful when you data has non-standard columns.

Value

xlff_to_csv() and xlff_to_xl() write a .csv or excel (.xlsx) file per workbook -- combining all spreadsheets. xlff_to_list outputs a list where each dataframes combines all spreadsheeets of a workbook.

Details

This is a rigid function with a very specific goal: To process data from FastField forms. Specifically, this is what this function does:

  • Reads each spreadsheet from each workbook and map it to a dataframe.

  • Lowercases and links the names of each dataframe.

  • Adds any missing key-sheets:

    • For first census: (1) "root", (2) "multi_stems", (3) "secondary_stems", and (4) "single_stems".

    • For recensus: (1) "root", (2) "original_stems", (3) "new_secondary_stems", and (4) "recruits"

  • Dates the data by submission_id (date comes from the spreadsheet root).

  • Lowercases and links the names of each dataframe-variable.

  • Drops fake stems.

  • Output a common data structure of your choice.

Acknowledgment

  • Sabrina Russo helped to make these functions useful with first censuses.

  • David Orwig helped to fix a debug.

Examples

library(fs) library(readr) library(readxl) # NOT A FIRST CENSUS # Path to the folder I want to read excel files from dir_in <- dirname(misc_example("two_files/new_stem_1.xlsx")) dir_in
#> [1] "/home/travis/R/Library/fgeo.misc/extdata/two_files"
# Files I want to read dir(dir_in, pattern = "xlsx")
#> [1] "new_stem_1.xlsx" "new_stem_2.xlsx"
# Path to the folder I want to write .csv files to dir_out <- tempdir() # Output a csv file xlff_to_csv(dir_in, dir_out) # Confirm path_file(dir_ls(dir_out, regexp = "new_stem.*csv$"))
#> [1] "new_stem_1.csv" "new_stem_2.csv"
# Also possible to output excel and a list of dataframe. See next section. # FIRST CENSUS dir_in <- dirname(misc_example("first_census/census.xlsx")) # As a reminder you will get a warning of missing sheets # Output list of dataframes (one per input workbook -- here only one) lst <- xlff_to_list(dir_in, first_census = TRUE) str(lst, give.attr = FALSE)
#> List of 1 #> $ census.xlsx:Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 54 obs. of 17 variables: #> ..$ submission_id : chr [1:54] "c01b37d5-2c30-4086-b4a1-78327dfa0a9a" "efd8e991-2e34-40c8-b503-44def6e542d9" "7997596a-c94e-47b6-ab28-4b0e6a94e9df" "a4e952e6-dcad-4410-aba9-36918fa1fa62" ... #> ..$ start_form_time_stamp: chr [1:54] "2018.05.14 11:55:32 AM" "2018.05.14 01:10:33 PM" "2018.05.14 04:14:23 PM" "2018.05.14 04:38:45 PM" ... #> ..$ end_form_time_stamp : chr [1:54] "2018.05.14 06:03:12 PM" "2018.05.14 06:03:11 PM" "2018.05.14 06:03:11 PM" "2018.05.14 06:03:10 PM" ... #> ..$ sheet : chr [1:54] "form_meta_data" "form_meta_data" "form_meta_data" "form_meta_data" ... #> ..$ quadrat : chr [1:54] NA NA NA NA ... #> ..$ tag : chr [1:54] NA NA NA NA ... #> ..$ stem_tag : chr [1:54] NA NA NA NA ... #> ..$ species : chr [1:54] NA NA NA NA ... #> ..$ species_code : chr [1:54] NA NA NA NA ... #> ..$ dbh : chr [1:54] NA NA NA NA ... #> ..$ status : chr [1:54] NA NA NA NA ... #> ..$ codes : chr [1:54] NA NA NA NA ... #> ..$ notes : chr [1:54] NA NA NA NA ... #> ..$ pom : chr [1:54] NA NA NA NA ... #> ..$ section_id : chr [1:54] NA NA NA NA ... #> ..$ unique_stem : chr [1:54] "NA_NA" "NA_NA" "NA_NA" "NA_NA" ... #> ..$ date : chr [1:54] "2018.05.14" "2018.05.14" "2018.05.14" NA ...
# Output excel xlff_to_xl(dir_in, dir_out, first_census = TRUE) # Read back filename <- path(dir_out, "census.xlsx") out <- read_excel(filename) str(out, give.attr = FALSE)
#> Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 108 obs. of 17 variables: #> $ submission_id : chr "c01b37d5-2c30-4086-b4a1-78327dfa0a9a" "efd8e991-2e34-40c8-b503-44def6e542d9" "7997596a-c94e-47b6-ab28-4b0e6a94e9df" "a4e952e6-dcad-4410-aba9-36918fa1fa62" ... #> $ start_form_time_stamp: chr "2018.05.14 11:55:32 AM" "2018.05.14 01:10:33 PM" "2018.05.14 04:14:23 PM" "2018.05.14 04:38:45 PM" ... #> $ end_form_time_stamp : chr "2018.05.14 06:03:12 PM" "2018.05.14 06:03:11 PM" "2018.05.14 06:03:11 PM" "2018.05.14 06:03:10 PM" ... #> $ sheet : chr "form_meta_data" "form_meta_data" "form_meta_data" "form_meta_data" ... #> $ quadrat : chr NA NA NA NA ... #> $ tag : chr NA NA NA NA ... #> $ stem_tag : chr NA NA NA NA ... #> $ species : chr NA NA NA NA ... #> $ species_code : chr NA NA NA NA ... #> $ dbh : chr NA NA NA NA ... #> $ status : chr NA NA NA NA ... #> $ codes : chr NA NA NA NA ... #> $ notes : chr NA NA NA NA ... #> $ pom : chr NA NA NA NA ... #> $ section_id : chr NA NA NA NA ... #> $ unique_stem : chr "NA_NA" "NA_NA" "NA_NA" "NA_NA" ... #> $ date : chr "2018.05.14" "2018.05.14" "2018.05.14" NA ...