We welcome contributions to the prioritizr R package. These contributions could be simple typo fixes, sentence rewrites to improve clarity, enhancing the vignettes to provide a greater understanding this package, or completely new functions to customize conservation planning problems. No contribution is too small.

The aim of this document is to provide a comprehensive guide for making contributions to the prioritizr R package. It also serves to document the internal organization and rationale behind the design of this package. Please note that it is beyond the scope of this document to teach R programming or package development practices. To learn more about these topics, we strongly recommend reading R for Data Science, Advanced R, and R Packages. Additionally, a substantial proportion of the code in this package is written in the C++ programming language using the Rcpp R package. Therefore, individuals wishing to contribute new functionality (e.g. a function to add a new set of constraints to a conservation planning problem) may need to learn C++ programming and the Rcpp R package. To learn more about C++ programming and/or the Rcpp R package, we recommend reading Rcpp for everyone. All of these are books are free, online, and absolutely fantastic resources.

If you have any questions about contributing to prioritizr that are not covered in this document, please post an issue on the GitHub repository.

Style guide

This package uses American spelling for all English words. All code should use snake-case (i.e. under-scores and lower-case spelling; e.g. add_relative_targets), except for class names which use camel-case (i.e. no under-scores and upper-case letters; e.g. ConservationProblem). The <- operator should be used for object assignment. Spaces should surround all infix operators (e.g. +, -, etc.) and should be placed after commas. All indenting should use two-spaces. All lines should not exceed 80 characters in length, except for text in vignettes (and @aliases roxygen2 tags because they must be on a single line). All code should escape character objects using double quotes (i.e. " symbols). Where possible, functions from other packages should not be imported and instead should be referred to using the :: function (e.g. Matrix::sparseMatrix to access the sparseMatrix function from the Matrix package). All functions should use the assertthat package to validate arguments. This package uses the testthat package for unit testing. All functions should have at least one unit test that verifies that it works and ideally several tests that verify that invalidate arguments return an error. All private functions that are not exported via the NAMESPACE should be located in the R/internal.R file. Although this package exports the pipe operator (%>%) for ease of use, for clarity, this operator should not be used anywhere in this package’s source code (excepting documentation and unit tests).

This package uses the roxygen2 package for generating documentation. Thus help files should be generated using roxygen2 comments. Software and R package names should be written in italics in markdown documents and enclosed in \pkg tags in Latex files (e.g. ). When referring to parameter names in the documentation, they should be encapsulated by \code tags (e.g. \code{x} in help for problem). When referring to the names of columns in a data.frame object, please use the format \code{"column_name"}, where column_name is the name of the column (e.g. \code{"mpg"} to refer to a column in the mtcars object). All help files for functions should contain the following sections at a minimum: Description, Value, and Examples.

Prototype class

The main class used to represent objects in prioritizr is the pproto class (defined in R/pproto.R). This class is inspired by the ggproto system used in the ggplot2 package and uses environments to manage data (based on the proto class defined in the proto package). It has several desirable properties: pproto objects can be modified in place, specific elements in a pproto object can be modified without R needing to copy all of the data stored in the pproto object (unlike list and S3 objects in R), a pproto object can be copied without physically needing to copy all of raw data stored within it (specifically, when we copy a pproto object R just copies pointers to the data in the original pproto object). These properties are especially desirable when working with large data sets. However, they come at a cost. Notably, these “features” may be unnecessary for small data sets and so pproto objects may incur unnecessary overheads that mean manipulating them may take slightly longer than otherwise expected. Additionally, the raw syntax for working with pproto objects is object-oriented—though perhaps this may be argued as a feature and not a cost—and so some extra experience may be required to understand how to manipulate them.

Conservation problems

The ConservationProblem class (defined in R/ConservationProblem-proto.R) is used to represent conservation planning objects. They are created by the problem function and should contain all the data and parameters needed to generate a mathematical formulation of the problem. They should also contain all the information needed to instantiate an interactive web application to manipulate them (using the shiny package, though the functionality to achieve this is still being developed).

Conservation modifiers

The ConservationModifier class (defined in R/ConservationModifier-proto.R) is used to instantiate objects that systematically alter the mathematical formulation of a conservation planning problem (i.e. a ConservationProblem object). All of the functions used for specifying objectives, constraints, penalties, decisions, solvers, and portfolios involve creating a new ConservationModifier object and appending it to a ConservationProblem object. To help standardize the behavior of different ConservationModifier objects, we have created specializations of the ConservationModifier class: Objective (defined in R/Objective-proto.R), Constraint (defined in R/Constraint-proto.R), Penalty (defined in R/Penalty-proto.R), Decision (defined in R/Decision-proto.R), Solver (defined in R/Solver-proto.R), and Portfolio (defined in R/Portfolio-proto.R). Every single ConservationModifier object contains an apply method that is used to modify an OptimizationProblem object to “apply” the mathematical formulation described by the object.

Optimization problems

The OptimizationProblem class (defined in R/OptimizationProblem-proto.R) represents the precise mathematical formulation of a conservation planning problem. When a ConservationProblem object is “solved” (using the solve function), the ConservationProblem object is first “compiled” into an OptimizationProblem object (using the compile function) and the OptimizationProblem object is then solved (using the solve function). This distinction is important because if a contributor wants to add a new objective, constraint, penalty, or decision function to prioritizr they will need to write code that adds the mathematical formulation of the function to a OptimizationProblem object. Alas, this is not entirely trivial.

All OptimizationProblem objects are simply a wrapper to an external pointer pointer (XPtr) that points to a C++ OPTIMIZATIONPROBLEM class object (defined in src/optimization_problem.h). We decided to implement the OptimizationProblem class in this manner because (1) storing objects as external pointers reduces memory consumption and (2) the process of adding in constraints requires iteration and loops which are slow in R. We elected not to use the ompr package because (1) benchmarks at the time indicated that this package would take too long to compile problems with many constraints and (2) it uses the R Optimization Infrastructure (ROI) which as yet solve problems using Gurobi (discounting repositories on GitHub which do not appear to be actively maintained). As a consequence, Rcpp code is needed to modify OptimizationProblem objects and users will need to write Rcpp functions to add new objectives, constraints, penalties to prioritizr.

The C++ OPTIMIZATIONPROBLEM class contains the standard data needed to formulate a mixed integer linear programming problem (i.e. model sense, problem objective , constraint matrix, constraint senses, right-hand-side values, variable decision types, and variable bounds). This class also contains additional data that pertain to conservation problems (i.e. number of features, planning units, and zones). If a potential contributor is unfamiliar with the standard representation of a mixed integer linear programming problem, we encourage them to read the documentation for the gurobi R package. The fields A_i, A_j, and A_x correspond the row, column, and cells values for the constraint matrix of the optimization problem (respectively). The other fields follow standard conventions. Note that the constraint matrix is, ultimately, constructed as a Matrix::sparseMatrix so row and column indices do not need to be sequential. Additionally, all Rcpp constraint and penalty functions should be independent. In other words, though it may be computationally efficient to reuse constraints and variables encoded in other functions, each Rcpp constraint and penalty function define its own constraints and variables. All conservation planning problems are defined following one of two standard mathematical formulations: the compressed and the expanded formulation.

The compressed formulation defines a problem which assumes that all instances of a conserved feature are used to achieve the target or count towards the benefit of a solution. Although the expanded formulation can provide identical solutions to the compressed formulation, the compressed formulation is provided because it is simpler and can be used with much larger sized problems. Currently, all constraints use the compressed formulation except for the add_feature_contiguity_constraints function. Under this formulation, the first set of decision variables (the first number of planning units × number of zones) always pertain to the state of the planning units. Thus in a problem containing 3 planning units and 2 zones, the first six variables indicate the allocation of: planning unit 1 in zone 1, planning unit 2 in zone 1, planning unit 3 in zone 1, planning unit 1 in zone 2, planning unit 2 in zone 2, planning unit 3 in zone 2. The first set of constraints (rows) always correspond to each target (noting that some objectives use “fake” targets to initialize the feature by planning unit data, referred to as rij data; see R/compile.R). These rows, which each correspond to a single target, contain the amount of each feature in each zone for which the target pertains. Thus rows for targets which pertain to a single zone will only contain feature abundance data for planning units (columns) in a single zone, and rows for targets which pertain to a single feature in multiple zones will contain feature abundance data for planning units (columns) in multiple zones.

To help illustrate the compressed formulation, consider the following problem:

# simulate data
pu <- data.frame(id = 1:3, cost_1 = 4:6, cost_2 = 7:9)
zone <- data.frame(id = 1:2, name = c("z1", "z2"))
feature <- data.frame(id = 1:2, name = c("f1", "f2"))
rij <- expand.grid(pu = 1:3, species = 1:2, zone = 1:2)
rij$amount <- seq_len(nrow(rij)) + nrow(rij)
targets <- matrix(1, nrow = 2, ncol = 2)

# create problem
p <- problem(pu, feature, rij, c("cost_1", "cost_2"), zone) %>%
     add_absolute_targets(targets)

# print problem
print(p)
## Conservation Problem
##   zones:          z1, z2 (2 zones)
##   planning units: data.frame (3 units)
##   cost:           min: 4, max: 9
##   features:       f1, f2 (2 features)
##   objective:      none
##   targets:        Absolute targets [targets (min: 1, max: 1)]
##   decisions:      default
##   constraints:    <none>
##   penalties:      <none>
##   portfolio:      default
##   solver:         default

The compressed formulation expresses the planning unit and feature data using the following constraint matrix. Here, each variable (column) corresponds to a different planning unit and a different zone allocation, each constraint (row) corresponds to a different target, and each cell corresponds to the amount of a each feature in each planning unit given a different zone (based on the rij data and the targets).

## 4 x 6 sparse Matrix of class "dgCMatrix"
##              pu1_z1 pu2_z1 pu3_z1 pu1_z2 pu2_z2 pu3_z2
## target_f1_z1     13     14     15      .      .      .
## target_f2_z1     16     17     18      .      .      .
## target_f1_z2      .      .      .     19     20     21
## target_f2_z2      .      .      .     22     23     24

The expanded formulation, on the other hand, defines a problem which can allow for some instances of conserved features to not be used for achieving the targets or maximizing the conservation benefit. This formulation is a generalized version of the compressed formulation. It contains additional variables (columns) and constraints (rows) for each combination of feature, planning unit, and zone that indicate if a given planning unit allocated to a specific zone is also allocated to conserve a given feature.

Given the previous problem, the expanded formulation expresses the planning unit and feature data in the constraint matrix as:

# generate targets
targets2 <- p$targets$output()

# create matrix
m <- matrix(NA, ncol = (p$number_of_zones() * p$number_of_planning_units()) +
                       (p$number_of_zones() * p$number_of_planning_units() *
                        p$number_of_features()),
            nrow = (p$number_of_zones() * p$number_of_planning_units() *
                    p$number_of_features()) +
                    (p$number_of_features() * p$number_of_zones()))

# add row names
rownames(m) <- c(paste0("pu", rep(seq_len(p$number_of_planning_units()),
                                            p$number_of_zones() *
                                            p$number_of_features()),
                          "_", rep(rep(p$feature_names(),
                                       each = p$number_of_planning_units()),
                                   p$number_of_zones()),
                          "_", rep(p$zone_names(),
                                   each = p$number_of_planning_units() *
                                          p$number_of_features())),
                 paste0("target_", rep(p$feature_names(), p$number_of_zones()),
                        "_", rep(p$zone_names(),
                                 each = p$number_of_features())))

# add column names
colnames(m) <- c(paste0("pu", rep(seq_len(p$number_of_planning_units()),
                                          p$number_of_zones()),
                        "_", rep(p$zone_names(),
                                 each = p$number_of_planning_units())),
                 paste0("pu", rep(seq_len(p$number_of_planning_units()),
                                          p$number_of_zones() *
                                          p$number_of_features()),
                        "_", rep(rep(p$feature_names(),
                                     each = p$number_of_planning_units()),
                                 p$number_of_zones()),
                        "_", rep(p$zone_names(),
                                 each = p$number_of_planning_units() *
                                        p$number_of_features())))

# add in indicator variables and constraints
curr_row <- 0
for (z in seq_len(p$number_of_zones())) {
  for (i in seq_len(p$number_of_features())) {
    for (j in seq_len(p$number_of_planning_units())) {
      curr_row <- curr_row + 1
      curr_col1 <- ((z - 1) * p$number_of_planning_units()) + j
      curr_col2 <- (p$number_of_planning_units() * p$number_of_zones()) +
                   ((z - 1) * p$number_of_features() *
                              p$number_of_planning_units()) +
                   ((i - 1) * p$number_of_planning_units()) + j
      m[curr_row, curr_col1] <- -1
      m[curr_row, curr_col2] <- 1
    }
  }
}

# add in targets
for (i in seq_len(nrow(targets2))) {
  # extract indices
  curr_row <- curr_row + 1
  curr_feature <- targets2$feature[i]
  curr_zone <- targets2$zone[i][[1]]
  curr_cols <- (p$number_of_planning_units() * p$number_of_zones()) +
               ((curr_zone - 1) * p$number_of_features() *
                                 p$number_of_planning_units()) +
               ((curr_feature - 1) * p$number_of_planning_units()) +
               seq_len(p$number_of_planning_units())
  curr_amount <- rij$amount[unlist(rij$zone) == curr_zone &
                            rij$species == curr_feature]
  # set matrix values
  m[curr_row, curr_cols] <- curr_amount
}

# convert to sparse matrix
m[is.na(m)] <- 0
m <- as(m, "sparseMatrix")

# print matrix
print(m)
## 16 x 18 sparse Matrix of class "dgCMatrix"

##    [[ suppressing 18 column names 'pu1_z1', 'pu2_z1', 'pu3_z1' ... ]]

##                                                                   
## pu1_f1_z1    -1  .  .  .  .  .  1  .  .  .  .  .  .  .  .  .  .  .
## pu2_f1_z1     . -1  .  .  .  .  .  1  .  .  .  .  .  .  .  .  .  .
## pu3_f1_z1     .  . -1  .  .  .  .  .  1  .  .  .  .  .  .  .  .  .
## pu1_f2_z1    -1  .  .  .  .  .  .  .  .  1  .  .  .  .  .  .  .  .
## pu2_f2_z1     . -1  .  .  .  .  .  .  .  .  1  .  .  .  .  .  .  .
## pu3_f2_z1     .  . -1  .  .  .  .  .  .  .  .  1  .  .  .  .  .  .
## pu1_f1_z2     .  .  . -1  .  .  .  .  .  .  .  .  1  .  .  .  .  .
## pu2_f1_z2     .  .  .  . -1  .  .  .  .  .  .  .  .  1  .  .  .  .
## pu3_f1_z2     .  .  .  .  . -1  .  .  .  .  .  .  .  .  1  .  .  .
## pu1_f2_z2     .  .  . -1  .  .  .  .  .  .  .  .  .  .  .  1  .  .
## pu2_f2_z2     .  .  .  . -1  .  .  .  .  .  .  .  .  .  .  .  1  .
## pu3_f2_z2     .  .  .  .  . -1  .  .  .  .  .  .  .  .  .  .  .  1
## target_f1_z1  .  .  .  .  .  . 13 14 15  .  .  .  .  .  .  .  .  .
## target_f2_z1  .  .  .  .  .  .  .  .  . 16 17 18  .  .  .  .  .  .
## target_f1_z2  .  .  .  .  .  .  .  .  .  .  .  . 19 20 21  .  .  .
## target_f2_z2  .  .  .  .  .  .  .  .  .  .  .  .  .  .  . 22 23 24