Read and wrangle excel-FastField and output dataframes or .csv/.xlsx files.

These functions read and wrangle excel workbooks produced by the FastField app and output a list of dataframes (xlff_to_list()), .csv files (xlff_to_csv()) or .xlsx files (xlff_to_xl()). Each dataframe or file combines all spreadsheets from a single excel workbook in the input directory. If the input directory has multiple workbooks, the output will be multiple dataframes, multiple .csv files, or multiple .excel files.

xlff_to_csv(dir_in, dir_out = "./", first_census = FALSE)

xlff_to_xl(dir_in, dir_out = "./", first_census = FALSE)

xlff_to_list(dir_in, first_census = FALSE, root_columns = NULL)

Arguments

dir_in	String giving the directory containing the excel workbooks to read from.
dir_out	String giving the directory where to write .csv files to.
first_census	Use `TRUE` if this is your first census. Use `FALSE` (default) if this is not your first census but a recensus.
root_columns	String. Lowercase name of column(s) in the root sheet ( e.g. c("date", "team")). This is useful when you data has non-standard columns.

Value

xlff_to_csv() and xlff_to_xl() write a .csv or excel (.xlsx) file per workbook -- combining all spreadsheets. xlff_to_list outputs a list where each dataframes combines all spreadsheeets of a workbook.

Details

This is a rigid function with a very specific goal: To process data from FastField forms. Specifically, this is what this function does:

Reads each spreadsheet from each workbook and map it to a dataframe.
Lowercases and links the names of each dataframe.
Adds any missing key-sheets:
- For first census: (1) "root", (2) "multi_stems", (3) "secondary_stems", and (4) "single_stems".
- For recensus: (1) "root", (2) "original_stems", (3) "new_secondary_stems", and (4) "recruits"
Dates the data by submission_id (date comes from the spreadsheet root).
Lowercases and links the names of each dataframe-variable.
Drops fake stems.
Output a common data structure of your choice.

Acknowledgment

Sabrina Russo helped to make these functions useful with first censuses.
David Orwig helped to fix a debug.

Examples

library(fs)
library(readr)
library(readxl)

# NOT A FIRST CENSUS
# Path to the folder I want to read excel files from
dir_in <- dirname(misc_example("two_files/new_stem_1.xlsx"))
dir_in
#> [1] "/home/travis/R/Library/fgeo.misc/extdata/two_files"

# Files I want to read
dir(dir_in, pattern = "xlsx")
#> [1] "new_stem_1.xlsx" "new_stem_2.xlsx"

# Path to the folder I want to write .csv files to
dir_out <- tempdir()

# Output a csv file
xlff_to_csv(dir_in, dir_out)

# Confirm
path_file(dir_ls(dir_out, regexp = "new_stem.*csv$"))
#> [1] "new_stem_1.csv" "new_stem_2.csv"

# Also possible to output excel and a list of dataframe. See next section.

# FIRST CENSUS
dir_in <- dirname(misc_example("first_census/census.xlsx"))
# As a reminder you will get a warning of missing sheets
# Output list of dataframes (one per input workbook -- here only one)
lst <- xlff_to_list(dir_in, first_census = TRUE)
str(lst, give.attr = FALSE)
#> List of 1
#>  $ census.xlsx:Classes ‘tbl_df’, ‘tbl’ and 'data.frame':	54 obs. of  17 variables:
#>   ..$ submission_id        : chr [1:54] "c01b37d5-2c30-4086-b4a1-78327dfa0a9a" "efd8e991-2e34-40c8-b503-44def6e542d9" "7997596a-c94e-47b6-ab28-4b0e6a94e9df" "a4e952e6-dcad-4410-aba9-36918fa1fa62" ...
#>   ..$ start_form_time_stamp: chr [1:54] "2018.05.14 11:55:32 AM" "2018.05.14 01:10:33 PM" "2018.05.14 04:14:23 PM" "2018.05.14 04:38:45 PM" ...
#>   ..$ end_form_time_stamp  : chr [1:54] "2018.05.14 06:03:12 PM" "2018.05.14 06:03:11 PM" "2018.05.14 06:03:11 PM" "2018.05.14 06:03:10 PM" ...
#>   ..$ sheet                : chr [1:54] "form_meta_data" "form_meta_data" "form_meta_data" "form_meta_data" ...
#>   ..$ quadrat              : chr [1:54] NA NA NA NA ...
#>   ..$ tag                  : chr [1:54] NA NA NA NA ...
#>   ..$ stem_tag             : chr [1:54] NA NA NA NA ...
#>   ..$ species              : chr [1:54] NA NA NA NA ...
#>   ..$ species_code         : chr [1:54] NA NA NA NA ...
#>   ..$ dbh                  : chr [1:54] NA NA NA NA ...
#>   ..$ status               : chr [1:54] NA NA NA NA ...
#>   ..$ codes                : chr [1:54] NA NA NA NA ...
#>   ..$ notes                : chr [1:54] NA NA NA NA ...
#>   ..$ pom                  : chr [1:54] NA NA NA NA ...
#>   ..$ section_id           : chr [1:54] NA NA NA NA ...
#>   ..$ unique_stem          : chr [1:54] "NA_NA" "NA_NA" "NA_NA" "NA_NA" ...
#>   ..$ date                 : chr [1:54] "2018.05.14" "2018.05.14" "2018.05.14" NA ...

# Output excel
xlff_to_xl(dir_in, dir_out, first_census = TRUE)
# Read back
filename <- path(dir_out, "census.xlsx")
out <- read_excel(filename)
str(out, give.attr = FALSE)
#> Classes ‘tbl_df’, ‘tbl’ and 'data.frame':	108 obs. of  17 variables:
#>  $ submission_id        : chr  "c01b37d5-2c30-4086-b4a1-78327dfa0a9a" "efd8e991-2e34-40c8-b503-44def6e542d9" "7997596a-c94e-47b6-ab28-4b0e6a94e9df" "a4e952e6-dcad-4410-aba9-36918fa1fa62" ...
#>  $ start_form_time_stamp: chr  "2018.05.14 11:55:32 AM" "2018.05.14 01:10:33 PM" "2018.05.14 04:14:23 PM" "2018.05.14 04:38:45 PM" ...
#>  $ end_form_time_stamp  : chr  "2018.05.14 06:03:12 PM" "2018.05.14 06:03:11 PM" "2018.05.14 06:03:11 PM" "2018.05.14 06:03:10 PM" ...
#>  $ sheet                : chr  "form_meta_data" "form_meta_data" "form_meta_data" "form_meta_data" ...
#>  $ quadrat              : chr  NA NA NA NA ...
#>  $ tag                  : chr  NA NA NA NA ...
#>  $ stem_tag             : chr  NA NA NA NA ...
#>  $ species              : chr  NA NA NA NA ...
#>  $ species_code         : chr  NA NA NA NA ...
#>  $ dbh                  : chr  NA NA NA NA ...
#>  $ status               : chr  NA NA NA NA ...
#>  $ codes                : chr  NA NA NA NA ...
#>  $ notes                : chr  NA NA NA NA ...
#>  $ pom                  : chr  NA NA NA NA ...
#>  $ section_id           : chr  NA NA NA NA ...
#>  $ unique_stem          : chr  "NA_NA" "NA_NA" "NA_NA" "NA_NA" ...
#>  $ date                 : chr  "2018.05.14" "2018.05.14" "2018.05.14" NA ...