Contributed Paper

Determining Decision Thresholds and Evaluating Indicators when Conservation Status is Measured as a Continuum

B. M. CONNORS∗† AND A. B. COOPER∗

∗School of Resource and Environmental Management, Simon Fraser University, Burnaby, BC, Canada, email [email protected]
†ESSA Technologies, Vancouver, BC, Canada

Abstract: Categorization of the status of populations, species, and ecosystems underpins most conservation activities. Status is often based on how a system’s current indicator value (e.g., change in abundance) relates to some threshold of conservation concern. Receiver operating characteristic (ROC) curves can be used to quantify the statistical reliability of indicators of conservation status and evaluate trade-offs between correct (true positive) and incorrect (false positive) classifications across a range of decision thresholds. However, ROC curves assume a discrete, binary relationship between an indicator and the conservation status it is meant to track, which is a simplification of the more realistic continuum of conservation status, and may limit the applicability of ROC curves in conservation science. We describe a modified ROC curve that treats conservation status as a continuum rather than a discrete state. We explored the influence of this continuum and typical sources of variation in abundance that can lead to classification errors (i.e., random variation and measurement error) on the true and false positive rates corresponding to varying decision thresholds and the reliability of change in abundance as an indicator of conservation status, respectively. We applied our modified ROC approach to an indicator of endangerment in Pacific salmon (Oncorhynchus nerka) (i.e., percent decline in geometric mean abundance) and an indicator of marine ecosystem structure and function (i.e., detritivore biomass). Failure to treat conservation status as a continuum when choosing thresholds for indicators resulted in the misidentification of trade-offs between true and false positive rates and the overestimation of an indicator’s reliability. We argue for treating conservation status as a continuum when ROC curves are used to evaluate decision thresholds in indicators for the assessment of conservation status.

Keywords: classification error, conservation status, false negative, false positive, indicator, receiver operating characteristic (ROC), signal detection theory, threat classification, uncertainty

Determinación de Umbrales de Decisiones y Evaluación de los Indicadores cuando se Mide el Estado de Conservación como un Continuo

Resumen: La categorización de los estados de poblaciones, especies y ecosistemas sirve de apoyo para la mayoría de las actividades de conservación. El estado comúnmente se basa en cómo el valor indicador actual de un sistema (p. ej.: cambio en la abundancia) se relaciona con algún umbral de preocupación de conservación. Las curvas de característica operante del receptor (ROC) pueden usarse para cuantificar la veracidad estadística de los indicadores del estado de conservación y para evaluar los pros y contras entre clasificaciones correctas (positivo verdadero) e incorrectas (falso positivo) a lo largo de una gama de umbrales de decisión. Sin embargo, las curvas de ROC suponen una relación binaria discreta entre el indicador y el estado de conservación que debe rastrear, lo cual es una simplificación del continuo más realista del estado de conservación, y muchos limitan la habilidad de aplicar las curvas de ROC en la ciencia de la conservación. Describimos una curva de ROC modificada que trata al estado de conservación como un continuo en lugar

Paper submitted January 22, 2013; revised manuscript accepted April 3, 2014.

Conservation Biology, Volume 28, No. 6, 1626–1635 © 2014 Society for Conservation Biology DOI: 10.1111/cobi.12364


de un estado discreto. Exploramos la influencia de este continuo y de fuentes típicas de variación en la abundancia que puedan llevar a errores en la clasificación (p. ej.: variación al azar y error de medida) de las tasas de positivos falsos y negativos correspondientes a los umbrales variantes de decisión y en la veracidad del cambio en la abundancia como indicador del estado de conservación, respectivamente. Aplicamos nuestra estrategia modificada de ROC a un indicador de peligro de extinción del salmón del Pacífico (Oncorhynchus nerka) (p. ej.: porcentaje de declinación en el promedio geométrico de abundancia) y a un indicador de la función y estructura de un ecosistema marino (p. ej.: biomasa detritívora). Equivocarse al momento de tratar al estado de conservación como un continuo cuando se escogen umbrales para indicadores resultó en una mala identificación de los pros y contras entre las tasas de positivos falsos y verdaderos y la sobreestimación de la veracidad de un indicador. Argumentamos que se debe tratar al estado de conservación como un continuo cuando las curvas de ROC se usan para evaluar los umbrales de decisión en los indicadores para la evaluación del estado de conservación.

Palabras Clave: característica operante del receptor (ROC), clasificación de amenazas, error de clasificación, estado de conservación, falso negativo, falso positivo, incertidumbre, indicador, teoría de detección de señales

Introduction

Classification of the status of populations, species, and ecosystems is the foundation upon which many management and conservation decisions are made (e.g., Rodrigues et al. 2006; Hoffmann et al. 2010; Davies & Baum 2012). Status (e.g., level of threat or conservation concern) is typically quantified based on how the value of a system’s indicator compares with some threshold in metrics such as abundance (IUCN 2008), spatial distribution (Peacock & Holt 2012), and mortality rates (Hisano et al. 2011). Indicators, which are meant to condense vital information into a reliable signal for management, can be used to track the status of a system so that decision makers can make management decisions, like the establishment of a rebuilding plan, based on whether or not the system exceeds a threshold in one or more indicators (e.g., Rosenberg et al. 2006). Though the reliability of indicators is often assumed, the rigorous evaluation of any classification method should be a prerequisite to its widespread use (e.g., Dulvy et al. 2006; Rice & Legace 2007; Porszt et al. 2012).

The evaluation of indicators for decision making involves three separate components: the indicator (estimated rate of decline, population size, range size, etc.), the true state of the system (actual rate of decline, population size or range size), and the conservation status of the system (degree of threat to or concern for the system based on the true state). Uncertainty in how well an indicator represents the state of a system can arise from a poor response of the indicator to changes in the system’s state, observation error in the indicator (i.e., the difference between the true and measured indicator values, also referred to as measurement error), process noise in the system (i.e., random variation), or a mismatch in the spatial or temporal scales of the indicator compared with the dynamics of the system.
Indicators in a conservation context should ideally reflect the underlying conservation status of a system, change when the system’s state changes, and have reliable decision thresholds that can trigger management actions. As our reliance upon indicators for conservation and management decisions grows, the need to quantitatively assess their performance, identify decision thresholds that optimally trade off errors (acting when not necessary or not acting when one should have), and compare proposed indicators with alternative ones has become increasingly important. Managers are often required to make dichotomous decisions (e.g., to list or not list, to close a hunting or fishing season or not, to institute a rebuilding plan or not). Such binary decisions have necessarily emerged from the need for pre-agreed decision rules given the challenges of making decisions in their absence. Many such decisions are based on one or more indicators passing a specific threshold (e.g., population decrease by a certain percentage, population size above or below a certain level); the idea is that when the indicator is above or below the threshold, the state of the system truly warrants conservation action or it does not. There are 4 possible outcomes of such resource management or conservation decisions

Figure 1. Four possible outcomes of a binary classification scheme based on an indicator of conservation status (e.g., trends in abundance). The terms α, β, and 1 − β are equivalent to type I error, type II error, and power in traditional hypothesis testing.


Figure 2. A receiver operating characteristic (ROC) curve based on observed trends in abundance as an indicator of conservation status (i.e., degree of conservation concern). In this situation high true positive rates are achieved with relatively minimal increases in the false positive rate. The point on the curve is the optimal decision threshold (true positive rate is maximized and false positive rate is minimized) under the assumption that true and false positives are weighted equally. The high value for area under the curve (AUC) indicates this indicator has high discriminatory ability.

(Fig. 1). For example, when the conservation need of a population is classified based on whether an indicator threshold (e.g., for trend in abundance) has been exceeded, a true positive occurs when conservation action is truly warranted and the indicator suggests the need for conservation action. A true negative occurs when a population does not require conservation intervention and the indicator agrees. A false positive occurs when a population does not require conservation intervention but the indicator suggests it does, and a false negative occurs when a population is truly in need of conservation action but the indicator suggests that conservation action is not needed. The ideal indicator will have few or no false negatives and positives across a range of possible threshold values in the indicator that trigger a conservation action (i.e., decision thresholds).

The outcomes of a classification scheme can be summarized in a receiver operating characteristic (ROC) curve, which combines the probabilities of true and false positives and true and false negatives for an indicator across a range of decision thresholds (Vida 1993) (Fig. 2). Details of the derivation of an ROC curve are in the Supporting Information. The points on an ROC curve represent different decision thresholds for the indicator (e.g., 10%, 20%, or 30% decline in observed abundance over 10 years) and their corresponding true and false positive rates for classifying the status of a system (e.g., the probability of being of conservation concern) (Fig. 2). By illustrating the trade-offs between true and false positive rates for a given indicator, ROC curves can be used to identify the decision threshold that optimally trades off correctly classifying a system as of concern (true positives) versus incorrectly classifying it as of concern (false positives). Here we define the optimal threshold as the point on the ROC curve that minimizes these errors, which can be visualized as the point on the curve closest to the upper left-hand corner. In addition to quantifying and summarizing the performance of an indicator across a range of decision thresholds, the area under the ROC curve (hereafter AUC) can be used to provide a single measure that summarizes the reliability of an indicator across all possible decision thresholds (Vida 1993). Indicators with an AUC close to 1 have very high discriminatory ability, whereas indicators with AUC close to 0.5 are no better than chance at classifying a system correctly. The ROC approach has the advantage of providing more insight into the reliability of an indicator than does the classification outcome for a single decision threshold or a limited range of decision thresholds (e.g., Dulvy et al. 2005; Rice & Legace 2007; Porszt et al. 2012). The limitation of the traditional ROC approach for conservation indicators is that it assumes a perfect relationship between the true state of the system and level of conservation concern about that system, both of which are treated as discrete and binary. We illustrate how this can be conceptually generalized with an example in which we used estimated change in abundance as an indicator of the conservation status of a population.
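The construction of a traditional ROC curve described above can be sketched in a few lines of Python. This is a minimal illustration under the binary assumption, not the authors' implementation: the data are invented, the function names (`roc_points`, `closest_to_corner`, `auc_trapezoid`) are hypothetical, and the indicator is assumed to be observed proportional change in abundance, with a population flagged as of concern when that change is at or below the decision threshold.

```python
import numpy as np

def roc_points(indicator, of_concern, thresholds):
    """True and false positive rates at each decision threshold (binary status)."""
    indicator = np.asarray(indicator, dtype=float)
    of_concern = np.asarray(of_concern, dtype=bool)
    tpr, fpr = [], []
    for thr in thresholds:
        flagged = indicator <= thr  # assumption: larger declines trigger action
        tpr.append(flagged[of_concern].mean())
        fpr.append(flagged[~of_concern].mean())
    return np.array(fpr), np.array(tpr)

def closest_to_corner(fpr, tpr):
    """Index of the ROC point nearest the upper left-hand corner (0, 1)."""
    return int(np.argmin(np.hypot(fpr, 1.0 - tpr)))

def auc_trapezoid(fpr, tpr):
    """Area under the ROC curve; points must be ordered by increasing fpr."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

# Invented example: observed proportional change in abundance over 10 years
# and whether each population is truly of concern (binary status).
observed_change = [-0.62, -0.47, -0.41, -0.33, -0.22, -0.12, 0.03, 0.11, 0.27, 0.52]
truly_of_concern = [True, True, False, True, False, False, False, False, False, False]

thresholds = np.linspace(-1.0, 1.0, 41)  # 100% decline to 100% increase, 5% steps
fpr, tpr = roc_points(observed_change, truly_of_concern, thresholds)
best = closest_to_corner(fpr, tpr)
auc = auc_trapezoid(fpr, tpr)
print(f"optimal threshold: {thresholds[best]:+.2f}, "
      f"TPR = {tpr[best]:.2f}, FPR = {fpr[best]:.2f}, AUC = {auc:.3f}")
```

Measuring distance to the corner weights true and false positives equally; a different weighting would select a different point on the same curve.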
The traditional ROC approach can determine whether an estimated change in abundance is a reliable indicator for a true change in abundance, but one must define a priori a single threshold level of true decline above which a system is of conservation concern and below which it is not of concern in order to assess the true positive and false positive rates (Fig. 3a). The indicator imperfectly tracks the true state of the system, but the binary state maps perfectly onto a binary definition of status (is concern warranted or not). Most people would generally agree that a population is of conservation concern if it experiences a 99% decline in abundance over, say, 3 generations. Likewise, if a population is rapidly increasing in abundance, there is a low probability of its being of conservation concern. At these two extremes one may be reasonably certain of the conservation status of a population; the uncertainty lies between these extremes and in what magnitude of change in true abundance is ecologically significant. What is the probability that a population is of conservation concern if it has truly decreased by 10% or 25% or 50%? The shape of the relationship between conservation status (degree of concern) and state (here, the true, unobserved amount of decline in abundance) may take various forms depending on the population, species, or ecosystem in question and the indicator being used (e.g., Fig. 3a–d).

The consequences of conservation status lying along a continuum rather than consisting of discrete categories when ROC curves are used to choose and assess indicator thresholds have not been evaluated previously. Here we describe a modified ROC curve based on true and false positive rates that integrates the reliability of an indicator to track the state of a system (e.g., truly declining by a certain amount) and the continuous relationship between the state of the system and its conservation status (degree of concern). We explored the sensitivity of a system’s conservation status to its state (what we refer to as status-state sensitivity), as well as to other typical sources of variability that can lead to classification errors (i.e., process noise and observation error) in the context of identifying decision thresholds that optimize the trade-off between true and false positive rates and the reliability of change in abundance as an indicator of conservation status in an ROC framework. Finally, we empirically applied our approach to 2 example cases to evaluate decision thresholds and indicator reliability across a range of potential status-state sensitivity.

[Figure 3: four panels (a–d); x-axis, percent change in abundance, −1.0 to 1.0; y-axis, Pr (conservation concern), 0.0 to 1.0]

Figure 3. Four hypothetical relationships between true percent change in abundance (the state of the system being tracked by the indicator) and the conservation status of a population (probability [Pr] of being of conservation concern; status). The relationships illustrate different levels of sensitivity of conservation status to the system’s state as shown by the equation in (a), where c describes how the probability of being of conservation concern changes with each unit change in the true state of the system, μ is the inflection point in the relationship, and x is the true percent change in abundance in the population: (a) c = 1500, (b) c = 15, (c) c = 5, and (d) c = 2.

Methods

Incorporating Status-State Sensitivity in ROC Curves

The relationship between the state of the system being tracked by the indicator and the conservation status of the system can be modeled by a logistic function:

\[ y = \frac{1}{1 + e^{-c(\mu - x)}}, \tag{1} \]

where y is the probability of being of conservation concern, x is the true state of the system, μ is the inflection point in the relationship (where there is a 50% chance of being of conservation concern), and c describes how dramatically the probability of being of conservation concern changes with each unit change in the true state of the system (x); higher values of c result in larger changes. We refer to c as the sensitivity of the conservation status to the state of the system; as the value of c (i.e., sensitivity) increases, the definition of conservation status becomes increasingly dichotomous. We used this functional form because a typical ROC curve is a special case of this relationship, where c is very large and, for example, the relationship between percent change in abundance and conservation status is binary (Fig. 3a). Other functional forms may be equally applicable. As c becomes smaller, the relationship changes such that one is increasingly less certain that a population is of conservation concern at any given change in abundance (Fig. 3a–d). We discuss below how to choose the functional form and values for c and μ. Assuming a logistic function to describe the relationships between status and state is similar to modeling the relationships between observed change and true change as a bivariate normal distribution (Hand et al. 1998).

We incorporated conservation status as a continuum in an ROC curve by modifying the probability of each classification outcome (i.e., true and false positives and negatives), which are typically treated as zeros and ones. For example, in Fig. 4, if a population has truly declined by 35%, then there can be a 100%, 76%, or 54% chance the population is of conservation concern depending on the relationship between true change in abundance and the probability the population is of conservation concern (Fig. 4a–c). If a decision threshold is set at a 25% decline in the indicator (observed change in abundance) and the observed decline is 30%, then the indicator would classify the population as being of conservation concern, but the probability of actually being of concern depends on the shape of the curve and is not necessarily equal to one or zero as in the typical ROC approach (Fig. 4b & 4c vs. Fig. 4a). The resulting modified probability of being of conservation concern now explicitly incorporates the sensitivity of conservation status to the true state of the system by allowing the conservation status to fall along a continuum based on the true state of the system. These modified probabilities can then be used to estimate true and false positive rates across a range of decision thresholds.

Determining the functional form relating the true state of the system to the conservation status of the system could be accomplished through solicitation of expert opinion (e.g., Murawski 2000), use of empirically derived relationships and comparative analyses (e.g., Fogarty & Murawski 1998), or use of model simulations (e.g., Fulton et al. 2005; Jennings 2005). In many instances, the shape of the relationship will be unknown. So, in the complete absence of an a priori basis for the relationship, one could perform a sensitivity analysis by using multiple functional forms to assess the degree to which the choice of functional form affects one’s choice of a decision threshold for a given indicator or of a given indicator from a suite of potential indicators.

Figure 4. Relationships between the state of the system (true percent change in abundance) and probability of conservation concern (conservation status): (a) conservation status highly sensitive to state of system, (b) conservation status moderately sensitive to state of system, and (c) conservation status highly insensitive to state of system (dashed line, true state of the system [truly declined by 30%]). Panels on the right show corresponding true and false positive probabilities when a >25% decline of a population is observed over 10 years with a decision threshold of 25% decline in abundance over 10 years to classify the system as of conservation concern (i.e., a true positive classification).

Simulations of the Influence of Status-State Sensitivity on ROC Curves

We used simulations to explore how the functional form relating a system’s state to its conservation status influences estimated true and false positive rates across a range of decision thresholds and the reliability of estimated change in abundance as an indicator of conservation status. We generated time series of abundance according to a stochastic Gompertz model:

\[
\begin{aligned}
\log_e(X_{t+1}) &= \lambda + b\,\log_e(X_t) + \varepsilon_t,\\
\log_e(Y_t) &= \log_e(X_t) + \delta_t,
\end{aligned}
\tag{2}
\]

where X_t is the number of individuals at time t, λ is the change in abundance per time step, b is the influence of density dependence (i.e., when b is equal to 1 dynamics are density independent), ε_t is process noise, Y_t is the observed number of individuals at time t, and δ_t is observation error. Both process noise and observation error are assumed to be independently and identically normally distributed with means of zero and standard deviations that are specified below. We generated 10,000 time series of abundance, each of which was 10 time steps long, according to Eq. 2. Starting abundance was 1000 for each of 144 combinations of the standard deviation of ε (0 to 0.5 in increments of 0.1), the standard deviation of δ (0 to 0.5 in increments of 0.1), c (1500, 15, 5, and 2), and μ (50% decline over 10 years); c and μ together define the relationship between the true state and the conservation status (Fig. 3). Each time series was density independent (i.e., b = 1) with a true change in loge abundance (λ) randomly drawn from between −0.1 and 0.1 (which produced changes in abundance over the 10 time steps between a 100% increase in abundance and a 100% decrease in abundance). Each of the resulting 1.44 million time series was fit using linear regression to estimate the observed trend in abundance (i.e., slope), which could be compared to the true underlying time trend (i.e., λ in Eq. 2) to classify the status of the population. To generate an ROC curve we explored decision thresholds for an observed change in abundance that ranged from 100% decline to 100% increase in abundance over 10 years in 5% increments. For each decision threshold and each time series, we calculated the probability of a false positive and false negative and true positive and true negative based on the given values for c and μ (Eq. 1). We summed these probabilities across the 10,000 iterations to calculate the true and false positive rates. These rates were then plotted against each other across the range of decision thresholds to generate an ROC curve for a given combination of ε, δ, c, and μ. Finally, we used the trapezoidal method to calculate the AUC of each ROC curve (Vida 1993).
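As a rough sketch of the simulation procedure (not the authors' code), the steps can be written in Python with NumPy. Several details are assumptions made for illustration: the function names are hypothetical; proportional change over a 10-step series is computed as exp(9 × trend) − 1; a population is flagged when its observed change is at or below the decision threshold; a flagged series adds its Eq. 1 probability to the true positive tally and the complement to the false positive tally; and an infinite threshold is appended so that each curve ends at (1, 1). Only one combination of process-noise and observation-error standard deviations (0.1 and 0.1) is run, with fewer series than in the text.

```python
import numpy as np

def concern_probability(x, c, mu):
    """Eq. 1: probability of conservation concern given the true state x."""
    z = np.clip(c * (mu - x), -700.0, 700.0)  # clip only to avoid overflow in exp
    return 1.0 / (1.0 + np.exp(-z))

def simulate_changes(n_series, n_steps, sd_process, sd_obs, rng):
    """True and observed proportional change from Eq. 2 with b = 1."""
    lam = rng.uniform(-0.1, 0.1, size=n_series)  # true trend in log abundance
    eps = rng.normal(0.0, sd_process, size=(n_series, n_steps - 1))
    steps = np.cumsum(lam[:, None] + eps, axis=1)  # log X_t - log X_0 for t >= 1
    log_x = np.log(1000.0) + np.hstack([np.zeros((n_series, 1)), steps])
    log_y = log_x + rng.normal(0.0, sd_obs, size=log_x.shape)
    t_centered = np.arange(n_steps) - (n_steps - 1) / 2.0
    slope = (log_y @ t_centered) / (t_centered @ t_centered)  # least-squares trend
    span = n_steps - 1
    return np.expm1(lam * span), np.expm1(slope * span)

def modified_roc(true_change, obs_change, thresholds, c, mu):
    """ROC points with status treated as a probability rather than 0 or 1."""
    p = concern_probability(true_change, c, mu)
    fpr, tpr = [], []
    for thr in np.append(np.sort(thresholds), np.inf):
        flagged = obs_change <= thr
        tpr.append(p[flagged].sum() / p.sum())
        fpr.append((1.0 - p)[flagged].sum() / (1.0 - p).sum())
    return np.array(fpr), np.array(tpr)

def auc_trapezoid(fpr, tpr):
    """Trapezoidal area under the curve; points ordered by increasing fpr."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

rng = np.random.default_rng(1)
true_change, obs_change = simulate_changes(5000, 10, 0.1, 0.1, rng)
thresholds = np.linspace(-1.0, 1.0, 41)  # 100% decline to 100% increase, 5% steps
auc_by_c = {}
for c in (1500, 15, 5, 2):
    fpr, tpr = modified_roc(true_change, obs_change, thresholds, c, mu=-0.5)
    auc_by_c[c] = auc_trapezoid(fpr, tpr)
    print(f"c = {c:>4}: AUC = {auc_by_c[c]:.3f}")
```

With a very large c the probabilities from Eq. 1 are effectively 0 or 1, so the same functions reproduce a traditional ROC curve as a special case.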

Examples of the Influence of Status-State Sensitivity on ROC Curves

To further illustrate the application of our approach, we quantified the influence of status-state sensitivity on two previously published evaluations of indicator performance in conservation science. We used ROC curves to quantify the reliability of an indicator of the conservation status of Fraser River sockeye salmon (Oncorhynchus nerka) (Porszt et al. 2012). The original analyses quantified the reliability of 20 possible indicators based on the area under the ROC curve assuming high status-state sensitivity (i.e., c = 1500). We focused on the top indicator (i.e., largest AUC) from these analyses: percent decline between the geometric mean adult abundance in the first 4-year generation and the geometric mean abundance in a current status-assessment generation derived from loge-transformed abundances smoothed with a 4-year moving average. In this example, we based true conservation status on whether the percent decline in generations subsequent to the calculation of the indicator value was greater than or less than 30% (i.e., μ in Eq. 1 equaled a 30% decline).

For our second example, we examined 1 of the 27 indicators of marine ecosystem structure and function whose performance was quantified by Samhouri et al. (2009). The original analyses quantified indicator performance by simulating perturbations to marine ecosystem models and then tracking the correlation (Spearman rank) between candidate ecosystem indicators and 22 potential ecosystem attributes. We focused on detritivore biomass (tons per square kilometer; a candidate marine ecosystem indicator) as an indicator of mean trophic level (a commonly used marine ecosystem attribute) in the Northern California Current ecosystem. We based conservation status on the observed mean trophic level relative to the baseline mean trophic level in the absence of fishing. True conservation status was therefore based on whether the observed mean trophic level was above or below the baseline value of 0.88 t/km² (i.e., μ in Eq. 1 equaled 0.88). For both candidate indicators, we generated ROC curves across a range of assumptions about status-state sensitivity (Eq. 1) to explore how changes in the functional form relating a system’s state to its conservation status influenced the identification of the optimal decision thresholds and the overall reliability of each indicator.

Figure 5. Receiver operating characteristic curves under different degrees of conservation status-state sensitivity. Status-state sensitivity decreases from black to light grey lines. Shaded points correspond to lines and are the decision thresholds that optimally trade off correct classification of a system as of concern (true positives) versus incorrectly classifying it as of concern (false positives). The black point on each line is the optimal decision threshold, assuming a dichotomous relationship between the system’s conservation status and its state (i.e., c = 1500).

Results

Simulations

As conservation status became less sensitive to the system’s state (i.e., lower values of c), for a given decision threshold (i.e., point on an ROC curve such as a 95% decline) the probability of correctly classifying a population as of conservation concern (true positive) declined and the probability of incorrectly classifying a population as of conservation concern (false positive) increased (Fig. 5). For a given decision threshold the decline in the true positive rate was greater than the increase in the false positive rate (Fig. 5), which highlights that it is the ability of an indicator to correctly classify a population as of conservation concern that is most affected by the sensitivity of the conservation status to the system’s state. The decision threshold that optimizes the trade-off between true and false positive rates depended on the assumed functional form that related conservation status to the state of the system (Fig. 5, gray dots). An important consequence of this is that if decision thresholds for an indicator are chosen assuming a dichotomous conservation status when a continuum is more appropriate, then managers will take action less often in situations when they actually should have (i.e., reduced true positive rate) and more often in situations when they should not have (i.e., increased false positive rate; illustrated by increasingly suboptimal decision thresholds in Fig. 5). In addition, the reliability of using change in abundance as an indicator of population health (reflected by the AUC value for the ROC curve) declined as the sensitivity of conservation status to the actual rate of decline decreased (different slopes exemplified by Fig. 3). The influence of decreasing sensitivity of the conservation status to the system’s state on the true and false positive rates was greatest when other sources of variability (i.e., process noise and observation error) were low and became weaker as these sources of variability increased (Supporting Information). The same patterns occurred with AUC; as the sensitivity of conservation status decreased, AUC declined by as much as approximately 20% at low process noise and observation error and by approximately 10% when process noise and observation error were high (Supporting Information).

Empirical Examples

The ROC curves of the two example indicators we examined varied considerably in their response to status-state sensitivity (Fig. 6). For Fraser River salmon, the indicator’s reliability (AUC of 0.75–0.67 across the range of status-state sensitivity) and decision thresholds (Fig. 6a) were relatively robust to uncertainty in the degree of status-state sensitivity. As status-state sensitivity declined from 1500 to 5, the optimal decision threshold did not change and the corresponding true and false positive rates changed very little (shaded points in Fig. 6a). However, at the extreme end of the status-state sensitivity spectrum (c = 2), the performance of the decision threshold chosen under the assumption of a binary status-state relationship (i.e., the black point on the c = 2 line in Fig. 6) declined by approximately 20% (from 0.58 to 0.46 false positive rate). From the perspective of a sensitivity analysis, the choice of a decision threshold of a 0% decline was therefore quite robust to status-state sensitivity for all but extremely low values of c.
The reliability of detritivore biomass as an indicator of marine ecosystem structure and function was considerably more responsive to uncertainty in status-state sensitivity (AUC of 0.96–0.73 across the range of status-state sensitivity) as were the decision thresholds that were based on the assumption there was high status-state sensitivity (Fig. 6b). In this case even small changes in status-state sensitivity (i.e., from c = 1500 to 15) resulted in a considerable reduction in the performance of the decision threshold chosen under the assumption of a binary status-state relationship: the true positive rate fell by approximately 23% (from 0.82 to 0.63). At larger changes in status-state sensitivity, the decline in the performance of this decision threshold was even greater (approximately 50% from c = 1500 to 2). Therefore, from the perspective of a sensitivity analysis, the choice of an optimal decision threshold depended greatly on the status-state relationship. Though the optimal thresholds differed by only 4% at the extremes (i.e., 56.6 t/km² vs. 58.9 t/km²), the range of observed detritivore biomasses was only 51.5–63.5 t/km², so it could be expected that the performance of the indicator would be quite sensitive to apparently small changes in the decision threshold. Under a decision threshold based on a minimum true positive rate (e.g., a minimum of 80%), rather than a decision threshold based on minimizing classification errors, the set of acceptable decision thresholds would be greatly reduced (some of the previously optimal decisions would no longer be acceptable) and the false positive rate would be 0.2–0.7 as status-state sensitivity decreased from 1500 to 2.

There were 3 key findings from our empirical sensitivity analyses. First, not only did the performance of the originally chosen decision threshold change with status-state sensitivity, but the actual value of the decision threshold that optimized the error rates sometimes changed. Second, depending on the indicator and the influence of status-state sensitivity on its performance, the indicator sometimes had a much lower true positive rate than expected assuming a traditional binary relationship between status and state. Third, increased true positive rates were accompanied by an increased false positive rate, and the magnitude of this increase in the false positive rate depended on the responsiveness of the indicator to status-state sensitivity.
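One way to read the constraint-based rule described above is as a floor on the true positive rate, with the false positive rate minimized among the thresholds that satisfy it. The sketch below illustrates that reading with invented ROC points; the function name `threshold_with_min_tpr` and all numbers are hypothetical and are not taken from the analyses.

```python
import numpy as np

def threshold_with_min_tpr(thresholds, fpr, tpr, min_tpr=0.8):
    """Return (threshold, fpr, tpr) with the lowest fpr among points with tpr >= min_tpr."""
    thresholds, fpr, tpr = (np.asarray(a, dtype=float) for a in (thresholds, fpr, tpr))
    ok = np.flatnonzero(tpr >= min_tpr)  # thresholds that satisfy the constraint
    if ok.size == 0:
        return None  # no threshold meets the constraint
    best = ok[np.argmin(fpr[ok])]
    return thresholds[best], fpr[best], tpr[best]

# Invented ROC points for one level of status-state sensitivity.
thresholds = [-1.0, -0.5, -0.3, 0.0, 1.0]
fpr = [0.0, 0.10, 0.20, 0.50, 1.0]
tpr = [0.0, 0.60, 0.85, 0.95, 1.0]
choice = threshold_with_min_tpr(thresholds, fpr, tpr, min_tpr=0.8)
print(choice)
```

Repeating the selection for the ROC curve obtained under each value of c would show how the false positive rate paid for a fixed true positive rate changes with status-state sensitivity.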

Discussion

The success of conservation rests in large part on the reliability of the information used to make decisions, yet this reliability is not often quantified. Receiver operating characteristic curves, which integrate information on the statistical reliability of an indicator across a range of decision thresholds, offer a promising addition to the manager’s toolbox for identifying decision thresholds that optimally trade off the probability of taking action when it is warranted (true positive) and when it is not warranted (false positive) and for quantifying the reliability of indicators. A limitation of the ROC approach, which perhaps has precluded its more widespread use in conservation science, is that it assumes a dichotomous relationship between the state of the system and its conservation status. We found that treating conservation status as dichotomous when a continuum is more appropriate can result in the identification of suboptimal decision thresholds, which leads to taking action less often when it is needed and more often when it is not needed, and in the overestimation of the reliability of an indicator when ROC curves and AUC are used. Our modified ROC approach helps overcome this limitation by explicitly incorporating a flexible functional relationship between the system’s state and its conservation status into the calculation of the true and false positive rates that make up an ROC curve, thereby broadening the possible application of ROC curves to the evaluation and interpretation of indicators of conservation status.

Figure 6. Receiver operating characteristic curves for 2 candidate indicators of conservation status across a range of status-state sensitivity: (a) conservation status (or degree of endangerment) of Fraser River sockeye salmon conservation units based on the change in the geometric mean abundance from the beginning of the time series to the current status assessment (Porszt et al. 2012) and (b) conservation status of a marine ecosystem (as indexed by mean trophic level) based on detritivore biomass in the ecosystem (Samhouri et al. 2009). The lighter the line shading, the less status-state sensitivity. The black point on each line is the optimal decision threshold, assuming a dichotomous relationship between the system’s conservation status and its state (i.e., c = 1500, thick black line). The lighter grey points correspond to the optimal decision thresholds under each level of status-state sensitivity. Indicator values corresponding to the optimal decision threshold are indicated in the key in each plot.

Our empirical evaluations of decision thresholds across a range of status-state sensitivity illustrate that there is no objectively best answer to the question of what decision threshold to use. However, exploring these sensitivities will force managers faced with decisions about the use of indicators and decision thresholds to think about the ways their decision process might go wrong and the classification errors that might be incurred, and to explicitly explore those possibilities. Therefore, evaluations like the ones we describe with the empirical examples should be considered a priori as part of evaluating indicators and choosing decision thresholds prior to their use. We recognize that the definition of optimal is inherently subjective, as is balancing errors associated with a given decision threshold.
Exploring the uncertainty in the relationship as we did will allow decision makers greater insight into how those classification errors may arise. In real-world situations, where the true shape of the status-state relationship (Eq. 1) is unknown, decision makers could use the results of, for example, Fig. 5 to explore the sensitivity of a given decision threshold value to uncertainty in the status-state relationship. If the goal were to minimize classification errors, then the decision threshold would be set somewhere between a 95% and 65% observed decline in abundance. However, managers may have other goals such as keeping the true positive rate above a certain value or keeping the false positive rate below a certain value. If the goal were, say, to keep the true positive rate above 75% and the false positive rate below 20%, then choosing a decision threshold of 95% would be satisfactory under all but the most extreme low value for c, even though it would not be optimal under the goal of minimizing errors.

Our simulation results indicated that treating conservation status as dichotomous when it may fall along a continuum in analyses of the performance of indicators based on ROC curves is likely to result in the identification of suboptimal decision thresholds and in taking action less often when one should and more often when one should not. Failure to treat conservation status as falling along a continuum can also lead to the overestimation of indicator reliability. Importantly, the degree to which this occurs depends in part on the magnitude of other sources of uncertainty, which influence the accuracy of the measurement of the indicator being used.

Although we have advanced the applicability of ROC curves to conservation contexts, some additional limitations of ROC curves highlight further opportunities to refine their application. First, in identifying optimal thresholds and using AUC to compare indicators in our examples, we assumed that omission and commission errors are equally important (Lobo et al. 2008). We recognize that, depending on the context in which decisions are made, the costs of different classification errors may not be equal. For example, the ecological cost of not closing a fishery when it should be closed (a false negative) may (or may not) be greater than the economic cost of closing a fishery when it should be left open (a false positive). In addition, risk tolerances to different types of error may differ among stakeholders in a decision-making process. These considerations will need to be taken into account when applying our approach to cases where classification errors are not considered equal. One way to account for this is to modify the axes of the ROC curve, but one must explicitly state the weighting scheme to be used. This would not change our general conclusions about the effect of treating conservation status as lying along a continuum. Second, we summarized indicator performance over regions of ROC space in which one might not typically operate (Lobo et al. 2008), such as looking at decision thresholds for declines in abundance greater than zero. Here again, this can be modified by explicitly delineating the region of concern, which would not alter our general conclusions. Finally, the use of area under the ROC curve as a measure of indicator performance has been questioned in some fields (e.g., pattern recognition and medicine) because the AUC of one indicator may not be directly comparable to another (Hand & Anagnostopoulos 2013). Therefore, future analyses should explore alternative approaches to summarizing the performance of ROC curves when contrasting the performance of indicators in a conservation context.
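The first two limitations above (unequal error costs and a restricted region of ROC space) can be illustrated with a short sketch. It rests on assumptions that are ours, not the paper's: rather than rescaling the ROC axes, it applies explicit, hypothetical cost weights to the two error rates, and it restricts the area calculation to false positive rates below a chosen cutoff. The ROC points and function names are likewise hypothetical.

```python
import numpy as np

# Hypothetical ROC points, ordered by increasing false positive rate.
fpr = np.array([0.0, 0.1, 0.2, 0.5, 1.0])
tpr = np.array([0.0, 0.5, 0.7, 0.9, 1.0])

def weighted_cost(fpr, tpr, cost_fn=1.0, cost_fp=1.0):
    """Explicitly weighted sum of the false negative and false positive rates.
    The weights are an assumed, user-stated weighting scheme."""
    return cost_fn * (1.0 - tpr) + cost_fp * fpr

def partial_auc(fpr, tpr, max_fpr):
    """Area under the ROC curve for false positive rates in [0, max_fpr],
    using the trapezoid rule with linear interpolation at the cutoff."""
    order = np.argsort(fpr)
    x, y = fpr[order], tpr[order]
    y_cut = np.interp(max_fpr, x, y)
    keep = x < max_fpr
    x = np.concatenate((x[keep], [max_fpr]))
    y = np.concatenate((y[keep], [y_cut]))
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))

# Index of the best ROC point when both errors cost the same ...
best_equal = int(np.argmin(weighted_cost(fpr, tpr)))
# ... and when a false negative is assumed to cost 3 times a false positive.
best_fn_heavy = int(np.argmin(weighted_cost(fpr, tpr, cost_fn=3.0)))

auc_full = partial_auc(fpr, tpr, max_fpr=1.0)
auc_low_fpr = partial_auc(fpr, tpr, max_fpr=0.2)
```

In this hypothetical case, weighting false negatives more heavily moves the preferred operating point toward a higher true positive rate at the price of a higher false positive rate, so the weighting scheme needs to be stated alongside any reported threshold.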
The use of the modified ROC curve that we describe, and the ROC framework in general, holds considerable promise as a tool to identify and set decision thresholds for conservation action and to quantify the reliability of indicators before such indicators are used to determine the need for conservation action. By contrasting a suite of indicators that are meant to track the same underlying state of a system with ROC curves, one can determine which indicators are most likely to lead to correctly taking action when necessary based on the indicator with the highest AUC. This approach may prove particularly valuable for contrasting indicators of ecosystem status for ecosystem-based management when many different indicators have been proposed (e.g., Link et al. 2002) but when there is still much ongoing research into which indicators best track the state of the ecosystem from which they are derived (but see Fulton et al. 2005; Jennings 2005; Samhouri et al. 2009).
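As a rough illustration of the general idea, the sketch below computes true and false positive rates when each observation carries a degree of conservation concern between 0 and 1 instead of a binary label. It is not the implementation used in this paper: the logistic form of the status-state relationship, the parameter values, the function names, and the simulated data are all assumptions made for the example, and Eq. 1 remains the authoritative definition of the relationship.

```python
import numpy as np

def status_weight(state, benchmark, c):
    """ASSUMED logistic status-state relationship: degree (0 to 1) to which a
    system in this state is of conservation concern. Large c approximates a
    dichotomous status; small c gives a gradual continuum."""
    z = np.clip(c * (np.asarray(state, dtype=float) - benchmark), -500, 500)
    return 1.0 / (1.0 + np.exp(-z))

def soft_roc(indicator, weight, thresholds):
    """Weighted false and true positive rates for each decision threshold.
    Action is triggered when the indicator is at or above the threshold."""
    indicator = np.asarray(indicator, dtype=float)
    weight = np.asarray(weight, dtype=float)
    fpr, tpr = [], []
    for t in thresholds:
        alarm = indicator >= t
        tpr.append(np.sum(weight[alarm]) / np.sum(weight))
        fpr.append(np.sum(1.0 - weight[alarm]) / np.sum(1.0 - weight))
    return np.array(fpr), np.array(tpr)

def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoid rule."""
    order = np.lexsort((tpr, fpr))  # sort by fpr, ties broken by tpr
    x = np.concatenate(([0.0], fpr[order], [1.0]))
    y = np.concatenate(([0.0], tpr[order], [1.0]))
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))

# Simulated example: true proportional decline (state) observed with error.
rng = np.random.default_rng(1)
decline = rng.uniform(0.0, 1.0, 2000)
indicator = decline + rng.normal(0.0, 0.1, 2000)
thresholds = np.linspace(-0.5, 1.5, 201)

# Near-dichotomous status (c = 1500) versus a gradual continuum (c = 5).
auc_binary = auc(*soft_roc(indicator, status_weight(decline, 0.5, 1500), thresholds))
auc_soft = auc(*soft_roc(indicator, status_weight(decline, 0.5, 5), thresholds))
```

Under these assumptions the same indicator yields a lower AUC when status is treated as a continuum than when it is treated as dichotomous, which is consistent in direction with the overestimation of reliability discussed above; the magnitudes depend entirely on the assumed functional form and noise level.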

Acknowledgments

We thank D. Braun, D. Noon, R. Peterman, J. Rice, E. Porszt, and 2 anonymous reviewers for constructive comments on previous versions of this manuscript and E. Porszt and J. Samhouri for generously agreeing to share their data with us.

Supporting Information

Further details on the analysis of receiver operating characteristic curves (Appendix S1) and on generating ROC curves based on trends in abundance as an indicator of conservation status (Appendix S2) and figures of receiver operating characteristic curves under decreasing status-state sensitivity and increasing process noise and observation error as well as across a range of process noise, observation error, and status-state sensitivity (Appendix S3) are available online. The authors are solely responsible for the content and functionality of these materials. Queries (other than absence of the material) should be directed to the corresponding author.

Literature Cited

Davies, T. D., and J. K. Baum. 2012. Extinction risk and overfishing: reconciling conservation and fisheries perspectives on the status of marine fishes. Nature Scientific Reports 2:561.
Dulvy, N. K., S. Jennings, N. B. Goodwin, A. Grant, and J. D. Reynolds. 2005. Comparison of threat and exploitation status in North-East Atlantic marine populations. Journal of Applied Ecology 42:883–891.
Dulvy, N. K., S. Jennings, S. I. Rogers, and D. L. Maxwell. 2006. Threat and decline in fishes: an indicator of marine biodiversity. Canadian Journal of Fisheries and Aquatic Sciences 63:1267–1275.
Fogarty, M. J., and S. A. Murawski. 1998. Large-scale disturbance and the structure of marine systems: fishery impacts on the Georges Bank. Ecological Applications 8:S6–S22.
Fulton, E., A. Smith, and A. Punt. 2005. Which ecological indicators can robustly detect effects of fishing? ICES Journal of Marine Science 62:540–551.
Hand, D., J. Oliver, and A. D. Lunn. 1998. Discriminant analysis when the classes arise from a continuum. Pattern Recognition Letters 31:641–650.
Hand, D., and C. Anagnostopoulos. 2013. When is the area under the receiver operating characteristic curve an appropriate measure of classifier performance? Pattern Recognition Letters 34:492–495.
Hisano, M., S. R. Connolly, and W. D. Robbins. 2011. Population growth rates of reef sharks with and without fishing on the Great Barrier Reef: robust estimation with multiple models. PLoS ONE 6, doi:10.1371/journal.pone.0025028.
Hoffmann, M., et al. 2010. The impact of conservation on the status of the world’s vertebrates. Science 330:1503–1509.
International Union for the Conservation of Nature (IUCN). 2008. IUCN Red List categories and criteria. Version 3.1. Species Survival Commission, IUCN, Gland, Switzerland.
Jennings, S. 2005. Indicators to support an ecosystem approach to fisheries. Fish and Fisheries 6:212–232.
Link, J. S., J. K. T. Brodziak, S. F. Edwards, W. J. Overholtz, D. Mountain, J. W. Jossi, T. D. Smith, and M. J. Fogarty. 2002. Marine ecosystem assessment in a fisheries management context. Canadian Journal of Fisheries and Aquatic Sciences 59:1429–1440.
Lobo, J. M., A. Jimenez-Valverde, and R. Real. 2008. AUC: a misleading measure of the performance of predictive distribution models. Global Ecology and Biogeography 17:145–151.
Murawski, S. 2000. Definitions of overfishing from an ecosystem perspective. ICES Journal of Marine Science 57:649–658.
Peacock, S. J., and C. A. Holt. 2012. Metrics and sampling designs for detecting trends in the distribution of spawning Pacific salmon (Oncorhynchus spp.). Canadian Journal of Fisheries and Aquatic Sciences 69:681–694.
Porszt, E., R. M. Peterman, N. K. Dulvy, A. B. Cooper, and J. R. Irvine. 2012. Reliability of indicators of decline in abundance. Conservation Biology 26:894–904.
Rice, J. C., and E. Legace. 2007. When control rules collide: a comparison of fisheries management reference points and IUCN criteria for assessing risk of extinction. ICES Journal of Marine Science 64:718–722.
Rodrigues, A. S. L., J. D. Pilgrim, J. F. Lamoreux, M. Hoffmann, and T. M. Brooks. 2006. The value of the IUCN Red List for conservation. Trends in Ecology & Evolution 21:71–76.
Rosenberg, A. A., J. H. Swasey, and M. Bowman. 2006. Rebuilding US fisheries: progress and problems. Frontiers in Ecology and the Environment 4:303–308.
Samhouri, J. F., P. S. Levin, and C. J. Harvey. 2009. Quantitative evaluation of marine ecosystem indicator performance using food web models. Ecosystems 12:1283–1298.
Vida, S. 1993. A computer program for non-parametric receiver operating characteristic analysis. Computer Methods and Programs in Biomedicine 40:95–101.
