These functions fix common problems of ViewFullTable and ViewTaxonomy data:
Ensure that each column has the correct type.
Ensure that missing values are represented with NAs, not with the literal string "NULL".
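As a rough illustration of the two fixes listed above, the following base-R sketch does something similar on toy data. The helper `sanitize_sketch()` and the toy values are hypothetical; this is not the package's implementation, which passes arguments on to readr::type_convert(). One known difference: utils::type.convert() turns an all-missing column into logical rather than character.

```r
# Toy data with both problems: every column is character, and missing
# values are stored as the literal string "NULL" (hypothetical values).
dfm <- data.frame(
  PlotName = c("NULL", "NULL", "NULL"),
  DBH = c("30.8", "74", "NULL"),
  stringsAsFactors = FALSE
)

# Hypothetical helper, for illustration only: replace "NULL" with NA,
# then let base R re-guess the type of each column.
sanitize_sketch <- function(x) {
  x[] <- lapply(x, function(col) {
    col[col %in% "NULL"] <- NA
    utils::type.convert(col, as.is = TRUE)
  })
  x
}

sane <- sanitize_sketch(dfm)
str(sane)
```

After the call, `DBH` is numeric with one missing value and `PlotName` holds only missing values.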
A dataframe; either a ForestGEO ViewFullTable (sanitize_vft()) or ViewTaxonomy (sanitize_taxa()).
Character vector of strings to interpret as missing values. Set this option to character() to indicate no missing values.
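The effect of this vector can be pictured with plain base R. The sketch below is only an analogy under the assumption that matching is done on exact strings; the variable names are hypothetical and nothing here calls the package itself.

```r
x <- c("NULL", "30.8", "")

# Strings listed in `na_strings` (a hypothetical name) become NA.
na_strings <- c("", "NA", "NULL")
with_na <- replace(x, x %in% na_strings, NA)

# An empty character vector matches nothing, so no value becomes NA.
without_na <- replace(x, x %in% character(), NA)

print(with_na)
print(without_na)
```

With an empty vector the input comes back unchanged, which is the sense in which character() indicates "no missing values".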
Arguments passed to readr::type_convert().
A dataframe.
Thanks to Shameema Jafferjee Esufali for motivating these functions.
assert_is_installed("fgeo.x")
vft <- fgeo.x::vft_4quad
# Introduce problems to show how to fix them
# Bad column types
vft[] <- lapply(vft, as.character)
# Bad representation of missing values
vft$PlotName <- "NULL"
# "NULL" should be replaced by `NA` and `DBH` should be numeric
str(vft[c("PlotName", "DBH")])
#> tibble [500 × 2] (S3: tbl_df/tbl/data.frame)
#> $ PlotName: chr [1:500] "NULL" "NULL" "NULL" "NULL" ...
#> $ DBH : chr [1:500] "30.8" "74" "22.3" NA ...
# Fix
vft_sane <- sanitize_vft(vft)
str(vft_sane[c("PlotName", "DBH")])
#> tibble [500 × 2] (S3: tbl_df/tbl/data.frame)
#> $ PlotName: chr [1:500] NA NA NA NA ...
#> $ DBH : num [1:500] 30.8 74 22.3 NA 33.8 NA NA 16.5 NA 44.6 ...
taxa <- read.csv(fgeo.x::example_path("taxa.csv"))
# E.g. inserting bad column types
taxa[] <- lapply(taxa, as.character)
# E.g. inserting bad representation of missing values
taxa$SubspeciesID <- "NULL"
# "NULL" should be replaced by `NA` and `ViewID` should be integer
str(taxa[c("SubspeciesID", "ViewID")])
#> 'data.frame': 163 obs. of 2 variables:
#> $ SubspeciesID: chr "NULL" "NULL" "NULL" "NULL" ...
#> $ ViewID : chr "1" "2" "3" "4" ...
# Fix
taxa_sane <- sanitize_taxa(taxa)
str(taxa_sane[c("SubspeciesID", "ViewID")])
#> 'data.frame': 163 obs. of 2 variables:
#> $ SubspeciesID: chr NA NA NA NA ...
#> $ ViewID : int 1 2 3 4 5 6 7 8 9 10 ...