The general aim of an index of biotic integrity (IBI) is to provide policy makers, managers and stakeholders with an overall appreciation of the ecosystem condition of a site in a single synthetic measure. This is achieved by evaluating the species composition at the community level. The ecological rationale is that anthropogenic changes in environmental conditions and ecosystem resources ultimately result in a shift in species composition. By quantifying selected attributes of the species composition in a test variable, human impacts can be monitored. An IBI is conceived according to the principles of the Reference Condition Approach (RCA): the index assesses the ecosystem condition of a test site by comparing the composition of its biological community with the configuration expected under reference conditions. If the difference is substantial relative to the intrinsic natural variability, it is concluded that the test site is impacted by an anthropogenic source. Ideally, the reference sites are pristine or nearly so. However, IBIs can also be developed with respect to any well-motivated baseline condition that is accepted as a societal goal. This thesis discusses the IBI concept from a statistical and methodological perspective. A first focus was to obtain a better understanding of the underlying rationale. Several papers describe how to construct IBIs; however, to our knowledge, none of them makes the underlying statistical model explicit. They merely give a narrative description of the calculation steps. We derived a simple but flexible format that phrases the four transformation steps of an IBI in a formal statistical language. (i) The model equations start by deriving metrics from the community data. The metrics reflect ecologically relevant features of the species distribution and are sensitive to anthropogenic alterations of the ecosystem.
(ii) Subsequently, the metrics are scored, expressing how (dis)similar the observed metric values are to type-specific or site-specific reference conditions, taking into account the site typology and the intrinsic natural variability. (iii) In the third step, the individual scores are aggregated (traditionally by simply summing or averaging) into the ecological quality measure (EQM), intended as an overall impact assessment. On its own, the EQM is hard to interpret, because its value does not directly indicate how impaired the ecosystem is or whether restoration is warranted. (iv) Hence, the fourth and final step compares the EQM with decision thresholds, resulting in the ecological quality class (EQC), an ordinal variable expressing the degree of biotic integrity. When developing an IBI, selecting an optimal metric set from a candidate list remains one of the main challenges. In many instances, a coherent and transparent strategy is lacking, leaving much room for subjective decisions and personal preferences. Important unresolved questions include how many metrics to select (the model dimension) and how to choose among them. In our opinion, two important factors contribute to suboptimal metric choice. First, optimisation criteria are seldom made explicit and are not always appropriate. Only occasionally is the crucial distinction made between false positive (FP) and false negative (FN) errors. A high false negative fraction (FNF) implies that many degraded sites are not restored, so that the full potential recovery of ecosystem functions, goods and services is not realised. Conversely, a high false positive fraction (FPF) results in unnecessary restoration of many unimpaired sites, diverting resources and possibly harming pristine sites. By analogy with diagnostic models in medicine, we propose the Receiver Operating Characteristic (ROC) curve to optimise the diagnostic accuracy of the index.
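The four transformation steps (i) to (iv) can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the thesis' model: the two metrics (`richness`, `dominance`), their response directions, the 5/3/1 scoring rule based on standardised deviations from the reference sites, the reference values and the EQC thresholds are all hypothetical choices made for the example.

```python
from statistics import mean, stdev

# Hypothetical metrics (illustration only): each maps a site's species
# abundances {species: count} to a single number.
def richness(counts):
    """Number of species with non-zero abundance."""
    return sum(1 for c in counts.values() if c > 0)

def dominance(counts):
    """Share of the most abundant species in the total abundance."""
    total = sum(counts.values())
    return max(counts.values()) / total if total else 0.0

METRICS = {"richness": richness, "dominance": dominance}
# Assumed response direction: +1 if higher values indicate better condition.
DIRECTION = {"richness": +1, "dominance": -1}

def derive_metrics(counts):
    """Step (i): community data -> metric values."""
    return {name: f(counts) for name, f in METRICS.items()}

def score_metrics(values, reference):
    """Step (ii): score each metric 5/3/1 by its standardised deviation
    from the reference sites (an assumed scoring rule)."""
    scores = {}
    for name, v in values.items():
        ref = reference[name]
        z = DIRECTION[name] * (v - mean(ref)) / stdev(ref)
        scores[name] = 5 if z >= -1 else 3 if z >= -2 else 1
    return scores

def eqm(scores):
    """Step (iii): aggregate the scores into the EQM by averaging."""
    return mean(list(scores.values()))

def eqc(value, thresholds=(1.8, 2.6, 3.4, 4.2),
        labels=("bad", "poor", "moderate", "good", "high")):
    """Step (iv): compare the EQM with ascending (hypothetical) thresholds."""
    for t, label in zip(thresholds, labels):
        if value < t:
            return label
    return labels[-1]

# Hypothetical metric values observed at four reference sites.
REFERENCE = {"richness": [10, 12, 11, 13], "dominance": [0.30, 0.25, 0.35, 0.30]}

degraded = {"a": 40, "b": 5, "c": 3, "d": 2}
near_pristine = {f"sp{i}": 3 for i in range(12)}
for site in (degraded, near_pristine):
    print(eqc(eqm(score_metrics(derive_metrics(site), REFERENCE))))  # bad, then high
```

Keeping each step as a separate function mirrors the four-step structure and makes it easy to swap, for instance, the averaging in step (iii) for a fitted regression model.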
ROC curves plot the true positive fraction (TPF) against the false positive fraction (FPF). To gain a deeper understanding of how the shape of the ROC curve matters, we introduced utility curves, which plot the cost implications of index-guided decisions as a function of the FPF or TPF of the index.
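As an illustration of how an empirical ROC curve is obtained from an index, the sketch below sweeps a decision threshold over the observed EQM values and records (FPF, TPF) at each step; the FNF is simply 1 − TPF. It assumes, hypothetically, that a site is flagged as impaired when its EQM is at or below the threshold, and the five sites with their true status are invented for the example.

```python
def roc_points(eqm_values, impaired):
    """(FPF, TPF) pairs for every candidate threshold t.

    Assumption for this sketch: a site is flagged as impaired when EQM <= t,
    i.e. low EQM values indicate poor condition.
    """
    n_pos = sum(impaired)                 # truly impaired sites
    n_neg = len(impaired) - n_pos         # truly unimpaired sites
    points = [(0.0, 0.0)]                 # threshold below every EQM: nothing flagged
    for t in sorted(set(eqm_values)):
        tp = sum(1 for e, y in zip(eqm_values, impaired) if y and e <= t)
        fp = sum(1 for e, y in zip(eqm_values, impaired) if not y and e <= t)
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    pts = sorted(points)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))

# Hypothetical data: EQM values and true impairment status of five sites.
eqm_values = [1.0, 2.0, 3.0, 4.0, 5.0]
impaired = [True, True, False, True, False]
pts = roc_points(eqm_values, impaired)
for fpf, tpf in pts:
    print(f"FPF={fpf:.2f}  TPF={tpf:.2f}  FNF={1 - tpf:.2f}")
print(f"AUC={auc(pts):.3f}")  # AUC=0.833
```

A higher curve (larger area) means that, for any tolerated FPF, the index detects a larger share of the truly impaired sites.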
Utility curves link the strength of the index, as characterised by the height of its ROC curve, with its practical usefulness. We inferred that the main factor determining the usefulness of the index is its capacity to achieve a high TPF while keeping the FPF small. More specifically, we demonstrate how a strong index is capable of achieving a high true restoration fraction (TRF) and a high overall restoration benefit (ORB) at a low average restoration cost (ARC).
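To see why the height of the ROC curve drives usefulness, one can evaluate a utility along each curve. The sketch below uses a deliberately simple, hypothetical cost model (every flagged site is restored at a fixed `cost`, restoring a truly degraded site yields a fixed `benefit`); it is not the thesis' definition of TRF, ORB or ARC, and the prevalence, costs and ROC points are invented for the example.

```python
def expected_net_benefit(fpf, tpf, prevalence, benefit, cost):
    """Expected net benefit per assessed site under a HYPOTHETICAL cost model:
    every site flagged by the index is restored at `cost`, and restoring a truly
    degraded site yields `benefit`. Missed degraded sites yield nothing."""
    restored_degraded = prevalence * tpf        # true positives per site
    restored_intact = (1 - prevalence) * fpf    # false positives per site
    return benefit * restored_degraded - cost * (restored_degraded + restored_intact)

def best_operating_point(roc, prevalence, benefit, cost):
    """ROC point (FPF, TPF) that maximises the expected net benefit."""
    return max(roc, key=lambda p: expected_net_benefit(p[0], p[1], prevalence, benefit, cost))

# Hypothetical ROC curves (FPF, TPF) of a strong and a weak index.
strong = [(0.0, 0.0), (0.05, 0.80), (0.20, 0.95), (1.0, 1.0)]
weak = [(0.0, 0.0), (0.30, 0.50), (0.60, 0.80), (1.0, 1.0)]
for name, roc in (("strong", strong), ("weak", weak)):
    fpf, tpf = best_operating_point(roc, prevalence=0.4, benefit=10.0, cost=3.0)
    nb = expected_net_benefit(fpf, tpf, 0.4, 10.0, 3.0)
    print(f"{name}: FPF={fpf:.2f} TPF={tpf:.2f} net benefit per site={nb:.2f}")
```

Under these assumptions the strong index reaches a net benefit of 2.30 per site at a low FPF, whereas the weak index peaks at 1.16 and only by accepting a much higher FPF.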
The second factor leading to suboptimal metric choice is insufficient recognition that an IBI is, in essence, a regression model. Traditionally, an IBI is simply an average (or sum) of scored metrics. This average score model (AVG) is an ordinal logistic regression (OLR) model in disguise, one in which the regression coefficients are fixed rather than estimated. Consequently, we can borrow concepts, strategies and techniques from statistical model building to search for the optimal suite of metrics. In this context, an important issue is overfitting, i.e. selecting too many variables relative to the available data, which results in lower diagnostic accuracy than a simpler, more parsimonious model would achieve. Another point to consider is that the optimisation criterion is itself a random variable. Hence, the model that optimises the criterion on a given data set is not necessarily the best one. To cope with these and other problems, we propose a modelling strategy that first ascertains the optimal number of metrics and then explores competing models in the vicinity of that optimum.
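One way to picture the claim that AVG is an OLR model in disguise: both start from the same linear predictor of the metric scores, but AVG fixes every coefficient at 1/p and applies hard class thresholds, whereas a proportional odds model turns the predictor into class probabilities via logit P(EQC ≤ k) = θ_k − η and would normally estimate the coefficients. The sketch below fits nothing; the scores, the thresholds θ_k and the `scale` parameter are hypothetical, and the "large scale" limit is our own illustration rather than the thesis' derivation. As the scale grows, the probabilities collapse onto the class AVG assigns with the same thresholds.

```python
import math

def logistic(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def linear_predictor(scores, coefs):
    """eta = sum_j beta_j * s_j; AVG fixes beta_j = 1/p instead of estimating it."""
    return sum(b * s for b, s in zip(coefs, scores))

def class_probs(eta, thetas, scale=1.0):
    """Class probabilities under a proportional odds model with
    logit P(EQC <= k) = scale * (theta_k - eta), thetas ascending.
    Classes run from worst (k = 1) to best (k = K)."""
    cum = [logistic(scale * (t - eta)) for t in thetas] + [1.0]
    return [cum[0]] + [cum[k] - cum[k - 1] for k in range(1, len(cum))]

scores = [5, 3, 1]                           # hypothetical scores of p = 3 metrics
avg_coefs = [1 / len(scores)] * len(scores)  # AVG: fixed, equal coefficients
eta = linear_predictor(scores, avg_coefs)    # the AVG EQM, 3.0 up to rounding
thetas = (1.8, 2.6, 3.4, 4.2)                # hypothetical class boundaries
labels = ("bad", "poor", "moderate", "good", "high")
for scale in (1.0, 50.0):
    print(scale, [f"{p:.2f}" for p in class_probs(eta, thetas, scale)])
```

With `scale=1.0` the probabilities are spread over all classes; with `scale=50.0` almost all mass falls on "moderate", the class obtained by thresholding the EQM directly.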
We illustrate our approach by revising the Estuarine Biotic Index (EBI) for the mesohaline part of the Zeeschelde Estuary in Flanders, Belgium. Statistical modelling techniques, such as best subset regression and bootstrapping, combined with optimisation criteria derived from the ROC curve, are forged into a powerful and transparent strategy for selecting the optimal set of metrics. We also demonstrate that the proportional odds logistic regression model yields a model very similar to AVG. This extension to generalised linear models (GLMs) opens the perspective of formulating more flexible models that are better adapted to the sampling design and able to incorporate background variables, adjusting for differences between sites.
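The combination of best subset search, an ROC-based criterion and bootstrapping can be sketched as follows. This is a simplified illustration, not the procedure used for the EBI: the candidate metrics `m1` to `m3` and their scores are hypothetical, the model is the AVG average over the selected metrics, the criterion is the apparent AUC (computed as a Mann-Whitney statistic), and the bootstrap is used only to count how often each subset wins, which exposes the randomness of the criterion. A fuller strategy would also correct the AUC for optimism.

```python
import random
from itertools import combinations
from statistics import mean

def auc(eqm, impaired):
    """AUC via the Mann-Whitney statistic: probability that an impaired site
    has a lower EQM than an unimpaired one (ties count 1/2)."""
    pos = [e for e, y in zip(eqm, impaired) if y]
    neg = [e for e, y in zip(eqm, impaired) if not y]
    wins = sum(1.0 if p < n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def avg_eqm(sites, subset):
    """AVG model: EQM = mean score over the selected metrics."""
    return [mean(site[m] for m in subset) for site in sites]

def best_subset(sites, impaired, metrics, k):
    """Exhaustive search over all k-metric subsets, maximising the AUC."""
    return max(combinations(metrics, k),
               key=lambda s: auc(avg_eqm(sites, s), impaired))

def selection_frequencies(sites, impaired, metrics, k, n_boot=200, seed=1):
    """Repeat the best-subset search on bootstrap resamples and count
    how often each subset is selected."""
    rng = random.Random(seed)
    counts = {}
    n = len(sites)
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [impaired[i] for i in idx]
        if all(ys) or not any(ys):
            continue  # AUC undefined without both impaired and unimpaired sites
        chosen = best_subset([sites[i] for i in idx], ys, metrics, k)
        counts[chosen] = counts.get(chosen, 0) + 1
    return counts

# Hypothetical scored metrics (1/3/5) for six impaired and six unimpaired sites.
M1 = [1, 1, 1, 3, 1, 3, 5, 5, 3, 5, 5, 5]
M2 = [3, 1, 5, 1, 3, 3, 3, 5, 5, 1, 5, 3]
M3 = [5, 1, 3, 5, 1, 3, 1, 5, 3, 3, 5, 1]   # uninformative by construction
sites = [{"m1": a, "m2": b, "m3": c} for a, b, c in zip(M1, M2, M3)]
impaired = [True] * 6 + [False] * 6
metrics = ("m1", "m2", "m3")

for k in range(1, len(metrics) + 1):
    s = best_subset(sites, impaired, metrics, k)
    print(k, s, round(auc(avg_eqm(sites, s), impaired), 3))
print(selection_frequencies(sites, impaired, metrics, k=1))
```

In this toy data set the apparent AUC is 0.972 for the best one- and two-metric models but drops to 0.833 once the uninformative metric is added, illustrating why the number of metrics should be chosen before comparing competing subsets of that size.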
A second illustration is the evaluation of the European Fish Index (EFI), developed by the EU-funded FAME project (Development, evaluation and implementation of a standardised fish-based assessment method for the ecological status of European rivers). An important requirement for meeting obligations under the European Water Framework Directive (WFD) is a fish-based index that can predict the ecological status of surface waters and, in particular, distinguish between (nearly) pristine and disturbed conditions. For the EFI, the overall FPF was 22 % and the FNF 19 %. Comparison of the EFI with existing national or regional fish-based assessment methods revealed major discrepancies, making intercalibration between them unfeasible. In our opinion, a representative sampling scheme covering the full spectrum of human impacts in a region is a prerequisite for identifying a responsive set of metrics. To conclude, we therefore present suggestions to improve data collection for the calibration of IBIs.