Source: R/pick_main_stem.R, pick_main_stem.Rd
pick_main_stem() picks a unique row for each treeID per census.
pick_main_stemid() picks a unique row for each stemID per census. It is only useful when a single stem was measured twice in the same census, which is sometimes done to correct for the effect of large buttresses.
pick_main_stem(data)
pick_main_stemid(data)
data: A ForestGEO-like dataframe: a ViewFullTable, a tree table, or a stem table.
A dataframe with a single plotname and one row per treeid per censusid.
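To confirm that a result has this shape, a base-R check along these lines may help. has_one_row_per_tree() is a hypothetical helper, not part of the package, and it assumes the relevant columns are named treeid, censusid and (optionally) plotname in any letter case.

```r
# Hypothetical helper (not part of the package): TRUE if `result` has at most
# one plot and no repeated treeid-censusid pair.
# Column names are matched case-insensitively, e.g. `treeID` or `treeid`.
has_one_row_per_tree <- function(result) {
  nms <- tolower(names(result))
  tree <- result[[which(nms == "treeid")]]
  census <- result[[which(nms == "censusid")]]
  single_plot <- TRUE
  if ("plotname" %in% nms) {
    single_plot <- length(unique(result[[which(nms == "plotname")]])) <= 1
  }
  single_plot && !any(duplicated(data.frame(tree, census)))
}

has_one_row_per_tree(data.frame(treeID = c("1", "2"), CensusID = c(1, 1)))
#> [1] TRUE
```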
pick_main_stem() picks the main stem of each tree in each census. It collapses data of multi-stem trees by picking a single stem per treeid per censusid. Within each group, it picks the stem at the top of a list sorted first by descending order of hom and then by descending order of dbh. This corrects the effect of buttresses and picks the main stem. It ignores any groups in grouped data and rejects data with multiple plots.
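The selection rule described above can be sketched in base R. pick_main_stem_sketch() is hypothetical and is not the package's implementation: it assumes columns named exactly treeID, CensusID, hom, dbh and (optionally) PlotName, puts missing hom or dbh values last, and does not restore the original row order.

```r
# Hypothetical sketch of the rule, not the package's code.
pick_main_stem_sketch <- function(data) {
  # Reject data with more than one plot (assumes a `PlotName` column).
  if ("PlotName" %in% names(data) && length(unique(data$PlotName)) > 1) {
    stop("`data` must contain a single plot.", call. = FALSE)
  }
  # Sort by descending hom, then descending dbh; NAs go last.
  sorted <- data[order(-data$hom, -data$dbh, na.last = TRUE), , drop = FALSE]
  # Keep the top-ranked row of each treeID-CensusID group.
  keep <- !duplicated(sorted[c("treeID", "CensusID")])
  out <- sorted[keep, , drop = FALSE]
  rownames(out) <- NULL
  out
}

census <- data.frame(
  sp = "sp1", treeID = "1", stemID = c("1.1", "1.1", "1.2"),
  hom = c(140, 130, 130), dbh = c(40, 60, 55), CensusID = 1
)
pick_main_stem_sketch(census)
```

On this data the sketch keeps the single row with stemID 1.1, hom 140 and dbh 40, which agrees with the pick_main_stem() example output.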
pick_main_stemid() does one step less than pick_main_stem(). It only picks the main stemid(s) of each tree in each census and keeps all stems per treeid. This is useful when calculating the total basal area of a tree, because you need to sum the basal area of every individual stem while counting only one of the potentially multiple measurements of each buttressed stem per census.
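A minimal base-R sketch of that basal-area workflow follows. pick_main_stemid_sketch() is hypothetical, not the package's implementation; it keeps the row with the largest hom per stemID per census and, as an assumption, breaks ties by the largest dbh. Basal area is computed as the area of a circle with diameter dbh, so its units are the square of whatever units dbh uses.

```r
# Hypothetical sketch: one row per stemID per census, then total basal area
# per tree. Assumes columns `treeID`, `stemID`, `CensusID`, `hom`, `dbh`.
pick_main_stemid_sketch <- function(data) {
  sorted <- data[order(-data$hom, -data$dbh, na.last = TRUE), , drop = FALSE]
  keep <- !duplicated(sorted[c("treeID", "stemID", "CensusID")])
  out <- sorted[keep, , drop = FALSE]
  rownames(out) <- NULL
  out
}

census <- data.frame(
  sp = "sp1", treeID = "1", stemID = c("1.1", "1.1", "1.2"),
  hom = c(140, 130, 130), dbh = c(40, 60, 55), CensusID = 1
)
stems <- pick_main_stemid_sketch(census)
# Basal area of each stem: pi * (dbh / 2)^2, in squared `dbh` units.
stems$basal_area <- pi * (stems$dbh / 2)^2
# Total basal area per tree per census.
aggregate(basal_area ~ treeID + CensusID, data = stems, FUN = sum)
```

Here the buttressed stem 1.1 contributes only its measurement taken at hom 140, so the extra measurement (dbh 60) is not double-counted.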
These functions can be considerably slow. They are fastest if the data already has a single stem per treeid. They are slower with data containing multiple stems per treeid (per censusid), which is the main reason for using these functions. The slowest scenario is when the data also contains duplicated values of stemid per treeid (per censusid). This may happen if trees have buttresses, in which case these functions check every stem for potential duplicates and pick the one with the largest hom value.
For example, on a Windows computer with 32 GB of RAM, a dataset of 2 million rows with multiple stems and buttresses took about 3 minutes to run, whereas a dataset of 2 million rows made up entirely of main stems took about 10 seconds.
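Before running these functions on a large dataset, it may help to check which of the scenarios above applies. stem_profile() is a hypothetical base-R helper, not part of the package, and it assumes columns named exactly treeID, stemID and CensusID.

```r
# Hypothetical helper (not part of the package): reports whether the data has
# multiple distinct stems per tree and duplicated stemIDs within a census.
stem_profile <- function(data) {
  stems <- data[c("treeID", "stemID", "CensusID")]
  distinct_stems <- unique(stems)
  c(
    # More than one distinct stemID within some treeID-CensusID group.
    multiple_stems = any(duplicated(distinct_stems[c("treeID", "CensusID")])),
    # The same stemID appears more than once in a census (e.g. buttresses).
    duplicated_stemids = any(duplicated(stems))
  )
}

census <- data.frame(
  treeID = "1", stemID = c("1.1", "1.1", "1.2"), CensusID = 1
)
stem_profile(census)
```

If both values are FALSE, the data already has a single stem per treeid per censusid, which is the fastest case described above.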
Other functions to pick or drop rows of a ForestGEO dataframe: pick_drop
library(fgeo.tool)
library(tibble) # provides tribble()

# One `treeID` with multiple stems.
# `stemID == 1.1` has two measurements (due to buttresses).
# `stemID == 1.2` has a single measurement.
# styler: off
census <- tribble(
~sp, ~treeID, ~stemID, ~hom, ~dbh, ~CensusID,
"sp1", "1", "1.1", 140, 40, 1, # main stemID (max `hom`)
"sp1", "1", "1.1", 130, 60, 1,
"sp1", "1", "1.2", 130, 55, 1 # main stemID (only one)
)
# styler: on
# Picks a unique row per unique `treeID`
pick_main_stem(census)
#> # A tibble: 1 × 6
#> sp treeID stemID hom dbh CensusID
#> <chr> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 sp1 1 1.1 140 40 1
# Picks a unique row per unique `stemID`
pick_main_stemid(census)
#> # A tibble: 2 × 6
#> sp treeID stemID hom dbh CensusID
#> <chr> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 sp1 1 1.1 140 40 1
#> 2 sp1 1 1.2 130 55 1