
Standardization and digitization of methods for impact evaluation – why not now?

Updated: Mar 20, 2023

Many risk assessment methods have been standardized for decades (e.g. ISO 31000 Risk Management, US Army Safety Risk Management) and typically use quantitative risk matrices, like the one presented in Figure 1. The same holds for Life Cycle Assessment – LCA (ISO 14044) and ESG (GRI, CDP, etc.), which rely even more heavily on numbers and numerical ratings, obtained through somewhat different approaches.

Figure 1. Classical 5x5 risk assessment matrix, with numerical values and 6 ranks
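As a minimal sketch of the quantitative matrix idea, the ranking logic behind a 5x5 matrix can be expressed in a few lines. The rank boundaries below are illustrative assumptions, not taken from ISO 31000 or any other standard:

```python
# Sketch of a classical 5x5 risk matrix: likelihood and severity (1-5)
# multiply into a cell value of 1-25, which maps onto six ranks.
# The rank boundaries are hypothetical, chosen only for illustration.
def risk_rank(likelihood: int, severity: int) -> str:
    if not (1 <= likelihood <= 5 and 1 <= severity <= 5):
        raise ValueError("likelihood and severity must be in 1..5")
    score = likelihood * severity
    for upper, rank in [(2, "negligible"), (4, "low"), (9, "moderate"),
                        (14, "high"), (19, "very high"), (25, "extreme")]:
        if score <= upper:
            return rank

# e.g. risk_rank(3, 3) -> "moderate"; risk_rank(5, 5) -> "extreme"
```

The point is that the whole evaluation is reproducible: the same inputs always yield the same rank, and the numeric cell values are available for charts and statistics.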

Having standards for evaluating effects provides obvious benefits: experts are better guided and less second-guessed, bias is minimized, transparency and understanding are improved, and reproducibility is fully enabled.

Numerical values (i.e. digital scores, ranks of results) are essential for digitalization. Numbers are required for visualization and presentation, e.g. effective comparative graphs and charts, compact tables, maps, etc. Such visual elements can significantly aid analysis, support judgments, and improve communication and engagement. Furthermore, numerical scores allow statistical analysis and quantitative comparisons (e.g. between alternatives, receptor groups, or VECs), which are, unfortunately, rarely seen in ESIA.

Despite all these apparent advantages and the established practice in similar fields, EIA methods for predicting impact significance are still not standardized and not digitized, at least not widely. Why?

The EIA scientific community is still debating the general approach that should be taken in impact evaluation. Over the decades, numerous approaches have been proposed and can be found in the literature (a good overview is given in Methods of Environmental and Social Impact Assessment, eds. Therivel, R. and Wood, G., Routledge, 2018). They range from more descriptive and less quantitative (checklists with scores, thresholds, expert judgments), through more deterministic (cause-pathway-effect diagrams, weighting matrices), to fully quantitative (mass balancing and physical modeling, overlay maps, networks; an illustrative example is shown in Figure 2). The latter, although more rigorous, are usually too complex and demanding for practical use in real-life EIA, even with the support of software tools. A further challenge is how such methods can be effectively generalized and applied across diverse situations. Moreover, they usually provide a quantitative result that is not yet a final evaluation of significance (e.g. a physical modeling simulation with a map vs. the biological or social effect it causes). Therefore, heuristic approaches that combine simple mathematics with expert judgments and result in a ranking of impact significance are more practical and achievable (analogous to many risk assessments).

Figure 2. Recent example publication on a quantitative method for EIA with a causal pathway network and overlay maps. Source: Peeters, L.J.M., Holland, K.L., Huddlestone-Holmes, C., Boulton, A.J. Science of The Total Environment, 802, 2022, copyright Elsevier.

However, many specialists are reluctant to use any quantitative methodology. First, a number of them argue that different project types and diverse environments, societal groups, and phenomena cannot be placed under a single scoring framework. However, standardization does not mean a single model fits all. Multiple approaches, methods, and models are to be expected, and they can be standardized for groups/categories of projects, receptors, etc., as has been done in other fields (e.g. ESG sector or topic standards).

Even across diverse methods for impact significance, a “common denominator” can be defined. In fact, many ESIA studies use the same criteria, e.g. magnitude, sensitivity, duration, distribution, etc., with very similar choices within each (e.g. small, medium, high). Generalization can be achieved at an even higher level, across different project types, sources, and receptor categories. Specifically, a method with the same structure can be used, while maintaining differences in criteria options, their descriptions/meanings, and weighting factors, where needed.
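A hypothetical sketch of such a common denominator: the scoring structure and formula are shared, while the criteria options, their meanings, and the weights can differ per receptor category. All names and numbers below are invented for illustration:

```python
# One shared method structure (a weighted sum over criteria), with
# category-specific options and weighting factors. Criteria names,
# option labels and weights here are illustrative assumptions.
CRITERIA = {
    "biodiversity": {
        "magnitude":   {"options": {"small": 1, "medium": 2, "high": 3}, "weight": 2.0},
        "sensitivity": {"options": {"low": 1, "medium": 2, "high": 3},   "weight": 1.5},
        "duration":    {"options": {"short": 1, "medium": 2, "long": 3}, "weight": 1.0},
    },
    # other receptor categories would reuse the same structure with
    # their own option descriptions and weights
}

def impact_score(category: str, choices: dict) -> float:
    """Apply the one shared formula to a category's criteria choices."""
    crits = CRITERIA[category]
    return sum(c["weight"] * c["options"][choices[name]]
               for name, c in crits.items())
```

The formula and data shape never change between categories; only the vocabulary and weights do, which is exactly what makes the approach standardizable.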

Second, numerous practitioners are hesitant to select discrete choices (e.g. between low, medium and high), arguing that intermediate judgments and cases of unreliability are common. However, what they offer instead is a descriptive approach, which in many cases is fuzzier and less precise, as well as non-reproducible and usually too long.

Third, this skepticism is especially strong regarding the derivation of a final rank from a multi-criteria model. Undoubtedly, the reliability and certainty of the weighting factors and formulas are essential for obtaining meaningful final scores and a proper scale. This is a true challenge for standardization and quantification. Nevertheless, proper mathematical and statistical analysis, combined with sound domain expertise, should be able to derive widely acceptable models.

In fact, a number of methods circulating in practice do provide a final rank. What is missing there?

The methods in use commonly have several shortcomings: a) they are derived purely from experts’ experience, without more rigorous mathematical analysis; b) the result distinguishes only a few ranks (e.g. minor, moderate and major); c) the ranks are not numerical values but descriptive terms.

Having 4, 5 or more criteria in a scoring model, each offering 2, 3 or more options, and then returning only 3 or 4 ranks is neither coherent nor sensible. This can easily be confirmed mathematically, as such a structure yields hundreds or thousands of input combinations. Having only 3 or 4 final ranks is restrictive, as far too many combinations are grouped into a single answer.
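The arithmetic is straightforward to check. For example, 5 criteria with 3 options each already yield 3^5 = 243 combinations, so a 3-rank output forces roughly 81 distinct input situations into each single answer:

```python
from math import prod

# Count the input combinations of a multi-criteria scoring model.
def n_combinations(options_per_criterion):
    return prod(options_per_criterion)

combos = n_combinations([3] * 5)   # 5 criteria, 3 options each -> 243
per_rank = combos // 3             # ~81 combinations collapse into each of 3 ranks
```

With 4 options per criterion the count grows to 4^5 = 1024, making the loss of resolution even more severe.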

Further, the usual approach with only a few final ranks limits practical usability. For instance, too narrow a scale cannot be used for a genuine comparative analysis of project alternatives, of VECs in cumulative assessment, or for mapping more sensitive receptor categories. Figure 3 illustrates quantitatively how mitigation measures reduce the number of more adverse impacts relative to the pre-mitigation case, while the alternative project location is less favorable, having more negative impacts (scored/ranked -4 and -3). This simple example demonstrates how a wider impact score scale (ranks -6 to -1) can be used in comparative analysis. Such fine distinctions between cases cannot be attained with a 3-rank scale.

Figure 3. Quantitative comparison between project alternatives. Impact evaluation scores are on the x-axis; the number of impacts is on the y-axis.
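The kind of comparison shown in Figure 3 becomes trivial once scores are numeric. A minimal sketch, with invented pre- and post-mitigation scores on a -6 to -1 scale:

```python
from collections import Counter

# Hypothetical impact scores for one project before and after mitigation
# (numbers invented for illustration; a -6..-1 adverse scale is assumed).
pre_mitigation  = [-5, -5, -4, -4, -3, -3, -3, -2, -2, -1]
post_mitigation = [-3, -3, -2, -2, -2, -1, -1, -1, -1, -1]

def count_severe(scores, threshold=-4):
    """Count impacts at or below the threshold (i.e. more adverse)."""
    return sum(1 for s in scores if s <= threshold)

severe_before = count_severe(pre_mitigation)   # severe impacts remaining: 4
severe_after  = count_severe(post_mitigation)  # severe impacts remaining: 0
histogram = Counter(post_mitigation)           # per-score counts for a bar chart
```

With only three descriptive ranks, neither the threshold query nor the histogram would carry enough resolution to distinguish the alternatives.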

Finally, there is general resistance to adopting numerical rank values, even though this brings the many digital advantages listed in the introduction. The arguments against are again related to (un)certainty, but also to the perceived weight that comes with a definitive number. Seemingly, project proponents are more keen to communicate a “major adverse” impact than a -4, although both are derived with exactly the same methodology and carry the analogous meaning. They also prefer a narrow rank scale.

Figure 4 presents recent research on the number of potential, residual and significant impacts reported in EISs in several countries. The number of significant impacts is remarkably low, and the authors attribute this to bias rather than to sustainable design and mitigation. As a conclusion, they recommend adopting more rigorous assessment methodologies, which would be enforced by regulators.

Figure 4. The number of potential impacts, residual impacts and significant impacts reported in environmental impact statements (EISs). Bars represent the bootstrap 95% confidence interval of the medians, and the red lines represent the global medians. Source: Singh, G.G., et al. People and Nature, 2, 2020

Nowadays, the derivation of impact significance methods can be aided by artificial intelligence (AI).

Namely, Natural Language Processing (NLP) can be used to gather the necessary data from the abundance of publicly available EIA studies. NLP could extract numerous combinations of criteria, the selections within them, and the resulting scores, along with key contextual information (project type, location, and environment). After harmonization and noise filtering of the data, the next phase would be to analyze the gathered material to define a proper score range, using statistical sensitivity analysis.
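A deliberately simplified stand-in for the extraction step: pulling criterion/choice pairs such as "magnitude: medium" out of EIS text with a pattern match. A real NLP pipeline would need entity recognition and context handling, and the criterion/option vocabularies below are assumptions:

```python
import re

# Toy extraction of criterion/choice pairs from free EIS text.
# The criterion and option vocabularies are invented for illustration.
PATTERN = re.compile(
    r"\b(magnitude|sensitivity|duration)\s*[:\-]?\s*"
    r"(small|low|medium|high|short|long)\b",
    re.IGNORECASE,
)

def extract_criteria(text):
    """Return {criterion: chosen option} found in the text."""
    return {crit.lower(): opt.lower() for crit, opt in PATTERN.findall(text)}

sample = "Impact magnitude: medium; receptor sensitivity: high; duration: long."
```

Run over thousands of public studies, even a crude extractor like this would yield the raw (criteria, choice, score) tuples needed for the statistical phase.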

Afterwards, the formula should be defined, which is a delicate step, as a number of forms can be considered: from linear (summation) to non-linear (multiplication, power law), multiple consecutive expressions, matrices, etc. In ESIA there are more dimensions (criteria) than the two of a risk matrix, so the relations are more complex; for example, some criteria might be multiplied (stronger relationship and influence), while others can be added (weaker influence).
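Two of those candidate forms can be sketched side by side, with invented, uncalibrated weights: a pure weighted sum, and a mixed form where magnitude and sensitivity multiply (strong coupling) while duration is merely added (weak influence):

```python
# Candidate formula forms for an impact score; weights are illustrative
# assumptions, not calibrated values.
def score_additive(magnitude, sensitivity, duration, w=(2.0, 1.5, 1.0)):
    """Linear form: a plain weighted sum of criterion scores."""
    return w[0] * magnitude + w[1] * sensitivity + w[2] * duration

def score_mixed(magnitude, sensitivity, duration, w_duration=0.5):
    """Non-linear form: strongly coupled criteria multiply, weak ones add."""
    return magnitude * sensitivity + w_duration * duration
```

The choice between such forms is exactly what the statistical analysis of collected data should inform.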

In the following phase, the expression parameters (weighting factors) would be obtained using numerical optimization algorithms (gradient-based or stochastic) to find the best fit to the collected data. Subsequent testing and validity analysis should indicate uncertainty levels, which can be incorporated into the method and formula. If needed, fuzzy logic or robust optimization approaches can be used to obtain more reliable models.
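A minimal sketch of that fitting phase, assuming a linear scoring formula and synthetic data: plain stochastic gradient descent on squared error recovers the weights. Real work would add robust optimizers, validation splits, and uncertainty analysis:

```python
# Fit weighting factors of a linear scoring model to (criteria, score)
# pairs via stochastic gradient descent on squared error.
def fit_weights(X, y, lr=0.01, epochs=2000):
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
    return w

# Synthetic scores generated from "true" weights (2, 1), for demonstration
X = [[1, 1], [2, 1], [3, 2], [1, 3], [2, 2]]
y = [2 * a + 1 * b for a, b in X]
w_fit = fit_weights(X, y)   # converges toward [2.0, 1.0]
```

With real extracted data the targets would be noisy expert scores rather than a clean generating formula, which is where the robustness and uncertainty steps come in.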

As an alternative to selecting model expressions in advance, approaches with artificial neural networks (ANNs) can be used. Machine learning, or more likely deep learning, algorithms could be employed to train and derive an impact significance model (based on the data collected via NLP). This approach could be more robust, predictive, and open to updates. However, like all ANNs, it is more of a “black box” and thus less transparent than a deterministic, expression-based approach. It is also more demanding to implement in EIA digital systems or general software applications, which can limit availability and wider use.

Thus, defining relevant methods is a tedious task, but once they are defined, widely agreed upon, and imposed (by standards or regulations), their implementation and use should not be difficult.

Overall, the absence of standardized methods is probably one of the reasons why EIA is advancing only in incremental steps and is increasingly considered ineffective by many environmentalists and decision-makers, who are turning more frequently to ESG and other sustainability analyses.

The lack of numerical results is one of the reasons why EIA is sluggish in digital transformation and lags behind many other engineering and consultancy services.

Apparently, there is not enough agreement and determination within the impact assessment community and among stakeholders. Full freedom in selecting evaluation approaches is preferred, and it undoubtedly has some justification, related to specialists’ integrity and the complexity of EIA. However, it is becoming more evident that the heterogeneity and “noise” this brings outweigh the benefits.

Governments, international financial institutions and other decision-makers are rather passive: no platforms or umbrella initiatives for standardizing methods have been launched. The general impression is that the stagnation is driven by the preferences of project investors, while consultants and practitioners remain in their shadow.

Although impact evaluation is complex and diverse, with an interplay of many variables, there is an abundance of accumulated EIA theory and practice that could bring forth a compromise: standards for number-based impact evaluation methods that are usable and reliable. This effort could be led by international IA associations, e.g. IAIA or the Equator Principles, by other renowned institutions (EPA, IEMA), or through an even wider collaborative scheme (e.g. a UN platform). The necessary mathematical knowledge, reinforced with AI, is also out there.

Why not act now, for the sake of more objective and transparent impact assessment? What are we waiting for?


Founder of Eon+ and the principal co-author of Envigo


