Documenting datasets

This section sets conventions to document datasets in the fgeo.data package. This makes it easier to understand the existing datasets and provides some structure to facilitate the expansion of fgeo.data.

Documenting generic datasets

Generic datasets (e.g. ViewFullTable) should have these components:

  • A short nickname for the generic table (e.g. vft). If the name is already short (e.g. stem), use the full name (e.g. stem). These are some conventional generic names and nicknames:
    • ViewFullTable: vft.
    • ViewTaxonomy: taxa.
    • tree or full: tree.
    • stem: stem.
    • tree and stem, collectively: census.
  • A help file generic_description with the following (e.g. ?vft_description):
    • A generic description that applies to any similar dataset from any ForestGEO site.
    • A section @seealso, linking to specific datasets and to the data_dictionary.

Documenting specific datasets

Datasets from a specific site should be named with the format site_generic_info:

  • site should be a single string uniquely identifying your site among all other sites in the ForestGEO network; it may be a well known nickname (e.g. luquillo).
  • generic should be one of the same as in generic_description.

  • info should be a single short string indicating the most prominent characteristic of the specific dataset – commonly referring to the criteria used to subset a larger dataset (e.g. luquillo_stem6_random).

For example, if your data comes from Luquillo, and it is a ViewFullTable containing a random subset of a few trees, a good name for it would be luquillo_vft_random. This format works well with auto-completion in RStudio.

The documentation of site_generic_info should include the following (e.g. ?luquillo_vft_random):

  • A specific description that applies to this particular dataset.
  • A section @seealso, linking to generic_description and to the data_dictionary.
  • A section @examples showing the structure of site_generic_info.