• pick_main_stem() picks a unique row for each treeID per census.

  • pick_main_stemid() picks a unique row for each stemID per census. It is only useful when a single stem was measured twice in the same census, which sometimes happens to correct for the effect of large buttresses.





A ForestGEO-like dataframe: A ViewFullTable, tree or stem table.


A dataframe with a single plotname and one row per treeid per censusid.


  • pick_main_stem() picks the main stem of each tree in each census. It collapses the data of multi-stem trees by picking a single stem per treeid per censusid: within each such group, it sorts the rows first by descending hom and then by descending dbh, and picks the row at the top. This corrects for the effect of buttresses and picks the main stem. It ignores any existing groups in grouped data and rejects data with multiple plots.

  • pick_main_stemid() does one step less than pick_main_stem(). It picks a single row per stemid per census and keeps every stem of each treeid. This is useful when calculating the total basal area of a tree, because you need to sum the basal area of every individual stem while counting only one of the potentially multiple measurements of each buttressed stem per census.
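The picking rule described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: `pick_top_row()` is a hypothetical helper, and the column names (`treeID`, `stemID`, `hom`, `dbh`, `CensusID`) are assumed to match those used in the example data of this page.

```r
# Hypothetical helper (not part of the package): keep, for each group,
# the row with the largest `hom`, breaking ties by the largest `dbh`.
pick_top_row <- function(data, group_cols) {
  # Sort by descending `hom`, then by descending `dbh`
  ordered <- data[order(-data$hom, -data$dbh), , drop = FALSE]
  # Build one key per row from the grouping columns
  key <- do.call(paste, c(unname(as.list(ordered[group_cols])), sep = "|"))
  # Keep the first (top) row of each group
  picked <- ordered[!duplicated(key), , drop = FALSE]
  rownames(picked) <- NULL
  picked
}

census <- data.frame(
  treeID = c("1", "1", "1"),
  stemID = c("1.1", "1.1", "1.2"),
  hom = c(140, 130, 130),
  dbh = c(40, 60, 55),
  CensusID = c(1, 1, 1),
  stringsAsFactors = FALSE
)

# One row per tree per census (the idea behind pick_main_stem())
main_stem <- pick_top_row(census, c("CensusID", "treeID"))

# One row per stem per census (the idea behind pick_main_stemid())
main_stemid <- pick_top_row(census, c("CensusID", "treeID", "stemID"))

# Total basal area of the tree, in squared units of `dbh`:
# every stem is counted once, using a single measurement per stem
total_basal_area <- sum(pi * (main_stemid$dbh / 2)^2)
```

Grouping by tree keeps only the stem measured at the highest `hom`, whereas grouping by stem keeps both stems, which is what a per-tree sum of basal area requires.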


These functions may be quite slow. They are fastest if the data already has a single stem per treeid. They are slower with data containing multiple stems per treeid (per censusid), which is the main reason for using these functions. The slowest scenario is when the data also contains duplicated values of stemid per treeid (per censusid). This may happen if trees have buttresses, in which case these functions check every stem for potential duplicates and pick the one with the largest hom value.

For example, on a Windows computer with 32 GB of RAM, a dataset of 2 million rows with multiple stems and buttresses took about 3 minutes to run, whereas a dataset of 2 million rows made up entirely of main stems took about 10 seconds.
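Because run time depends on whether the data contains more than one row per treeid per censusid, it may help to check this before running these functions on a large dataset. The following is a minimal sketch in base R; `has_multiple_rows_per_tree()` is a hypothetical helper, and the column names `CensusID` and `treeID` are assumed to match the example data of this page.

```r
# Hypothetical helper (not part of the package): TRUE if any tree has
# more than one row within the same census.
has_multiple_rows_per_tree <- function(data) {
  anyDuplicated(data[c("CensusID", "treeID")]) > 0
}

# Every tree appears once per census: the fast scenario
single <- data.frame(CensusID = c(1, 1), treeID = c("1", "2"))

# One tree appears twice in the same census: a slower scenario
multi <- data.frame(CensusID = c(1, 1), treeID = c("1", "1"))

single_has_duplicates <- has_multiple_rows_per_tree(single)
multi_has_duplicates <- has_multiple_rows_per_tree(multi)
```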

See also

Other functions to pick or drop rows of a ForestGEO dataframe: pick_drop


# One `treeID` with multiple stems.
# `stemID == 1.1` has two measurements (due to buttresses).
# `stemID == 1.2` has a single measurement.
library(tibble)  # provides tribble()

# styler: off
census <- tribble(
    ~sp, ~treeID, ~stemID,  ~hom, ~dbh, ~CensusID,
  "sp1",     "1",   "1.1",   140,   40,         1,  # main stemID (max `hom`)
  "sp1",     "1",   "1.1",   130,   60,         1,
  "sp1",     "1",   "1.2",   130,   55,         1   # main stemID (only one)
)
# styler: on

# Picks a unique row per unique `treeID`
pick_main_stem(census)
#> # A tibble: 1 × 6
#>   sp    treeID stemID   hom   dbh CensusID
#>   <chr> <chr>  <chr>  <dbl> <dbl>    <dbl>
#> 1 sp1   1      1.1      140    40        1

# Picks a unique row per unique `stemID`
pick_main_stemid(census)
#> # A tibble: 2 × 6
#>   sp    treeID stemID   hom   dbh CensusID
#>   <chr> <chr>  <chr>  <dbl> <dbl>    <dbl>
#> 1 sp1   1      1.1      140    40        1
#> 2 sp1   1      1.2      130    55        1