These functions fix common problems in ViewFullTable and ViewTaxonomy data:

  • Ensure that each column has the correct type.

  • Ensure that missing values are represented with NAs -- not with the literal string "NULL".

sanitize_vft(.data, na = c("", "NA", "NULL"), ...)

sanitize_taxa(.data, na = c("", "NA", "NULL"), ...)

Arguments

.data

A dataframe; either a ForestGEO ViewFullTable (sanitize_vft()) or ViewTaxonomy (sanitize_taxa()).

na

Character vector of strings to interpret as missing values. Set this option to character() to indicate no missing values.
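
As a rough illustration of what na controls, the sketch below calls readr::type_convert() directly on a toy data frame. It assumes the readr package is installed; the data and the object names (raw, types, with_na, no_na) are made up for this example, and the code is not the internal implementation of sanitize_vft() or sanitize_taxa().

```r
library(readr)

# Hypothetical toy data: every column is character, and missing values
# are stored as the literal string "NULL".
raw <- data.frame(
  PlotName = c("NULL", "luq"),
  DBH = c("30.8", "NULL"),
  stringsAsFactors = FALSE
)

# Explicit column types keep the example deterministic.
types <- cols(PlotName = col_character(), DBH = col_double())

# Treat "", "NA" and "NULL" as missing (the default value of `na` above).
with_na <- type_convert(raw, col_types = types, na = c("", "NA", "NULL"))

# Treat nothing as missing: the string "NULL" is kept as ordinary data.
no_na <- type_convert(
  raw,
  col_types = cols(.default = col_character()),
  na = character()
)

str(with_na)
str(no_na)
```

With the default na, "NULL" becomes NA in both columns; with na = character(), it stays in the data as a string.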

...

Arguments passed to readr::type_convert().
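
To give an idea of the kind of argument that could be forwarded this way, the sketch below uses locale, one of the arguments of readr::type_convert(), on a toy data frame. It assumes readr is installed; the data and object names are hypothetical, and the call goes to type_convert() directly rather than through sanitize_vft(), so it only suggests how a forwarded argument would behave.

```r
library(readr)

# Hypothetical toy data: diameters written with a comma as the decimal mark.
raw <- data.frame(DBH = c("30,8", "74", "NULL"), stringsAsFactors = FALSE)

# `locale` tells readr how to read numbers; here the decimal mark is ",".
converted <- type_convert(
  raw,
  col_types = cols(DBH = col_double()),
  na = c("", "NA", "NULL"),
  locale = locale(decimal_mark = ",")
)

str(converted)
```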

Value

A dataframe.

Acknowledgments

Thanks to Shameema Jafferjee Esufali for motivating these functions.

Examples

assert_is_installed("fgeo.x")

vft <- fgeo.x::vft_4quad

# Introduce problems to show how to fix them
# Bad column types
vft[] <- lapply(vft, as.character)
# Bad representation of missing values
vft$PlotName <- "NULL"

# "NULL" should be replaced by `NA` and `DBH` should be numeric
str(vft[c("PlotName", "DBH")])
#> tibble [500 × 2] (S3: tbl_df/tbl/data.frame)
#>  $ PlotName: chr [1:500] "NULL" "NULL" "NULL" "NULL" ...
#>  $ DBH     : chr [1:500] "30.8" "74" "22.3" NA ...

# Fix
vft_sane <- sanitize_vft(vft)
str(vft_sane[c("PlotName", "DBH")])
#> tibble [500 × 2] (S3: tbl_df/tbl/data.frame)
#>  $ PlotName: chr [1:500] NA NA NA NA ...
#>  $ DBH     : num [1:500] 30.8 74 22.3 NA 33.8 NA NA 16.5 NA 44.6 ...

taxa <- read.csv(fgeo.x::example_path("taxa.csv"))
# Introduce problems to show how to fix them
# Bad column types
taxa[] <- lapply(taxa, as.character)
# Bad representation of missing values
taxa$SubspeciesID <- "NULL"

# "NULL" should be replaced by `NA` and `ViewID` should be integer
str(taxa[c("SubspeciesID", "ViewID")])
#> 'data.frame':	163 obs. of  2 variables:
#>  $ SubspeciesID: chr  "NULL" "NULL" "NULL" "NULL" ...
#>  $ ViewID      : chr  "1" "2" "3" "4" ...

# Fix
taxa_sane <- sanitize_taxa(taxa)
str(taxa_sane[c("SubspeciesID", "ViewID")])
#> 'data.frame':	163 obs. of  2 variables:
#>  $ SubspeciesID: chr  NA NA NA NA ...
#>  $ ViewID      : int  1 2 3 4 5 6 7 8 9 10 ...