1 Introduction

When a public administration wishes to implement policies, it first needs to compare the different options available in order to assess their social attractiveness. A fair policy assessment process should consider the ethical obligation of taking a plurality of social values, perspectives and interests into account. For example, the European Commission's (EC) current practice on Impact Assessment (IA) considers three main objectives, i.e. efficiency, effectiveness (including proportionality) and coherence, and is based on the assessment of various broad impacts such as economic, environmental and social ones (including distributional consequences on social actors) (see e.g. Sieber and Pérez Domínguez 2011). There is no doubt that IA is multidimensional in nature (Bäcklund 2009; Bozeman and Pandey 2004), and as a consequence, Multiple Criteria Decision Analysis (MCDA) can be a very useful methodological and operational framework (Bell et al. 2003; Munda 2004). In this framework, mathematical models play a very important role, namely that of guaranteeing consistency between the assumptions used and the results obtained.

MCDA proceeds on the basis of the following main concepts: dimensions, objectives, criteria, weights, criterion scores, impact matrix and compromise solution (see e.g. Figueira et al. 2016; Ishizaka and Nemery 2013; Roy 1996; Vincke 1992). A dimension is the highest hierarchical level of analysis and indicates the scope of objectives, criteria and criterion scores. In IA studies, the general categories of economic, social and environmental impacts are dimensions. Objectives indicate the direction of change desired, e.g. growth has to be maximised, social exclusion has to be minimised, carbon dioxide emissions have to be reduced. A criterion is a function that associates alternative actions with a variable indicating their desirability. Weights are often used to represent the relative importance attached to dimensions, objectives and criteria. The idea behind this practice is very intuitive and simple: place the greatest number in the position corresponding to the most important factor. A criterion score is an assessment of the impact, consistent with a given criterion, with reference to a policy option. Criterion scores can be either qualitative or quantitative (Hinloopen and Nijkamp 1990). The impact matrix presents, in a structured way, the information on the various criterion scores, i.e. each element of the matrix represents the performance of each option according to each criterion.

A “discrete multi-criterion problem” can be formally described as follows: A is a finite set of N feasible actions; M is the number of different points of view, or evaluation criteria, gm, that are considered relevant to a specific policy problem. When action a is evaluated as better than action b (both belonging to the set A) by the m-th point of view, then gm(a) > gm(b). In this way a decision problem may be represented in an M by N matrix P, called an evaluation or impact matrix, whose typical element pmn (m = 1, 2, …, M; n = 1, 2, …, N) represents the evaluation of the n-th alternative by means of the m-th criterion; in other words, each criterion score represents the performance of each alternative according to each criterion (see Table 1).

Table 1 Example of an Impact Matrix

In general in a multi-criterion problem, there is no solution optimising all the criteria at the same time (ideal or utopia solution), and therefore “compromise solutions” have to be found.
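The formal setting above can be sketched in a few lines of Python. The impact matrix below is purely hypothetical (three criteria, four options with invented scores), and serves only to show how criterion scores and the direction of each objective fit together:

```python
import numpy as np

# Hypothetical impact matrix P: M = 3 criteria (rows) x N = 4 options (columns).
# Entry P[m, n] is the score of option n on criterion m; all values are invented.
criteria = ["growth (max)", "social exclusion (min)", "CO2 emissions (min)"]
options = ["a", "b", "c", "d"]
P = np.array([
    [2.1, 1.4, 3.0, 2.5],   # growth, to be maximised
    [0.3, 0.5, 0.2, 0.4],   # social exclusion, to be minimised
    [120, 90, 150, 110],    # CO2 emissions, to be minimised
])

def better(m, a, b):
    """True if option a beats option b on criterion m, direction-aware."""
    maximise = "max" in criteria[m]
    return P[m, a] > P[m, b] if maximise else P[m, a] < P[m, b]

# Example: option a (index 0) beats option b (index 1) on the growth criterion,
# since 2.1 > 1.4 and growth is to be maximised.
print(better(0, 0, 1))  # True
```

Note that no common measurement unit is imposed on the rows; each criterion keeps its own scale, which is precisely why an aggregation rule is needed.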

The importance of mathematical approaches in MCDA lies in their ability to allow a consistent aggregation of diverse information; otherwise, even if everybody agreed on the multidimensional nature of an IA study, its implementation in a real-world assessment exercise would be impossible. The standard objection is that the aggregation of apples and oranges is impossible. Multi-criteria mathematics answers this objection in a definitive way.

Just to give an example of the typical difficulties we may encounter when solving multi-criterion problems, the reader may look at the numerical example shown in Table 2, where 21 criteria rank four options (a, b, c, d). Criteria are grouped according to the ranking they support (i.e. 3 criteria are in favour of the ranking abcd, 7 are in favour of bdca, and so on).

Table 2 Numerical Example with 21 Criteria and 4 Options

A first possibility is to apply the so-called plurality rule, meaning that the option which is most often ranked in first place wins. Thus, in our case, option a would be chosen, since eight criteria put it in first position. However, a careful look at Table 2 reveals that option a also faces the strongest opposition, since 13 criteria put it in last position.
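The plurality count can be reproduced with a short script. Table 2 is only partially reported in the text (3 criteria for abcd and 7 for bdca), so the remaining two groups below are hypothetical, chosen only so that the totals match the stated facts: 8 criteria rank a first and 13 rank it last.

```python
from collections import Counter

# Profile of 21 criteria as (number of criteria, supported ranking) pairs.
# The groups (5, "adcb") and (6, "cdba") are assumptions for illustration.
profile = [(3, "abcd"), (5, "adcb"), (7, "bdca"), (6, "cdba")]

first = Counter()
last = Counter()
for count, ranking in profile:
    first[ranking[0]] += count   # criteria putting this option in first place
    last[ranking[-1]] += count   # criteria putting this option in last place

plurality_winner = max(first, key=first.get)
print(plurality_winner, first[plurality_winner])  # a 8  (a wins by plurality)
print(last[plurality_winner])                     # 13   (yet a is last for 13)
```

The paradox is visible in the two counters: the plurality winner is simultaneously the most strongly rejected option.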

From this plurality rule paradox two main lessons can be learned:

  1. Good ranking procedures should respect the entire ranking of options and not the first position only.

  2. It is important to consider not only what a majority of criteria prefer, but also what they reject.


Arrow and Raynaud (1986) analysed the formal analogies between the multi-criterion problem and the social choice one. They concluded that the main results of social choice also apply to MCDA, in particular Arrow’s impossibility theorem, which states that no perfect mathematical aggregation rule exists (Arrow 1963). Thus, unlike in other mathematical fields, neither approximation criteria (i.e., discovering pre-existing truths) nor convergence criteria (i.e., does the procedure automatically lead, in a finite number of steps, to the optimum a*?) can be used. Only “reasonable” mathematical procedures can be developed in this field. Reasonable here means that algorithms are evaluated not only according to the formal properties they present, but above all according to the empirical consequences implied by their use.

The present paper presents a multi-criterion framework where the various criterion scores can be both qualitative and quantitative, and the mathematical aggregation rule is as consistent and simple as possible. An illustrative example is developed and finally some conclusions are drawn.

2 A multi-criterion framework for ex-ante impact assessment

Here, I will try to isolate some properties that can be considered desirable for a discrete multi-criteria aggregation rule (often called multi-criteria method) in the framework of ex-ante IA. In synthesis, the information contained in the impact matrix useful for solving the so-called multi-criterion problem is:

  • Intensity of preference (when quantitative criterion scores are present).

  • Number of criteria in favour of a given alternative.

  • Weight attached to each single criterion.

  • Relationship of each single alternative with all the other alternatives.

Combinations of this information generate different aggregation conventions, i.e. rules for manipulating the available information so as to arrive at a preference structure. The aggregation of several criteria implies taking a position on the fundamental issue of compensability (see Podinovskii 1994; Roberts 1979; Vansnick 1990; Vincke 1992). Compensability is a very important concept when MCDA is applied to integrate various policy dimensions. For example, in evaluating a policy option that presents a very bad environmental impact and a very good economic impact, whether compensability is allowed, and to what degree, is clearly the key assumption.

To search for compromises implies that no dictator must exist. That is, all the criteria relevant to a policy problem have to be used simultaneously and not in a lexicographic order, since otherwise some criteria are given a much higher weight a priori. Thus, for example, a legislative system which foresees that a financial analysis of projects has to be done before the evaluation of their environmental impacts is indeed prioritising the economic dimension over the environmental one. Multi-criteria decision analysis for ex-ante impact assessment must therefore be based on models more general than the lexicographic one, allowing the use of different objectives and criteria at the same time.

Complete compensability is not desirable for the problem we are dealing with, since it implies that e.g. a good performance on efficiency would offset a very bad one on effectiveness or vice versa. On the other hand, complete non-compensability is not desirable either, since it would imply the use of a lexicographic model and the consequent choice of a “dictator”, e.g. efficiency. Consequently, the only option left is the use of partial compensatory methods, such as the "outranking methods", including e.g. ELECTRE (Roy 1996) and PROMETHEE (Brans et al. 1986). These methods, following the Condorcet tradition, entail aggregating the criteria into a partial binary relation aSb (an outranking relation) based on concordance and discordance indexes, and then "exploiting" this relationship. Each of these two steps may be treated in a number of ways according to the problem formulation and the particular case under consideration.

To illustrate this approach consider Parliamentary voting. The concordant coalition can be considered as the sum of the votes of the members in favour of a given option; according to a majority-voting rule, this option will be approved if it obtains more than 50% of the votes. According to the normative tradition in political philosophy, all coalitions, however small, should be given some fraction of the decision power. One measure of this power is the ability to veto certain subsets of outcomes. This explains the use of the condition of non-discordance.

In practice, the effect of the discordance test is that even if M − 1 criteria support the recommendation of choosing a over b, this recommendation must not be accepted if a single criterion opposes it with a strength greater than the veto threshold. This implies that even when all the other criteria support a policy option, this option cannot be accepted if one criterion is very strongly against it. Of course, this depends on how “very strongly” is defined, i.e. on the definition of the veto threshold.
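A minimal sketch of this concordance–discordance logic is given below. The weights, scores and veto thresholds are purely illustrative, and the functions are not the ELECTRE or PROMETHEE formulations, only the general idea of a majority test combined with a veto:

```python
# a outranks b (a S b) if the concordant coalition of criterion weights is
# large enough AND no single criterion opposes the proposition with a
# strength above its veto threshold.

def outranks(scores_a, scores_b, weights, concordance_threshold, vetoes):
    # Concordance: total weight of criteria on which a is at least as good as b.
    concordance = sum(w for sa, sb, w in zip(scores_a, scores_b, weights) if sa >= sb)
    # Discordance test: one strongly opposed criterion can veto the majority.
    vetoed = any(sb - sa > v for sa, sb, v in zip(scores_a, scores_b, vetoes))
    return concordance >= concordance_threshold and not vetoed

weights = [0.25, 0.25, 0.25, 0.25]           # four equally weighted criteria
a = [10, 9, 8, 1]
b = [5, 5, 5, 9]
vetoes = [6, 6, 6, 6]
# Three of four criteria support a over b (concordance 0.75), but the fourth
# opposes it with strength 8 > 6, so the veto blocks the recommendation:
print(outranks(a, b, weights, 0.5, vetoes))  # False
```

Raising the fourth veto threshold above 8 would let the majority coalition prevail, which is exactly the sensitivity to the definition of “very strongly” discussed above.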

Various authors have argued that the presence of qualitative information in evaluation problems concerning socio-economic issues is a rule, rather than an exception (see e.g., Nijkamp et al. 1990). Thus there is a clear need for methods that are able to take into account information of a "mixed" type (both qualitative and quantitative criterion scores). For simplicity, we refer to qualitative information as information measured on a nominal or ordinal scale, and to quantitative information as information measured on an interval or ratio scale.

Moreover, ideally, this information should be precise, certain, exhaustive and unequivocal. Nevertheless, in reality, it is often necessary to use information which does not have those characteristics so that one has to face the uncertainty of a stochastic and/or fuzzy nature present in the data. Therefore, multi-criteria methods able to tackle consistently the widest types of mixed information should be considered as desirable ones.

In the 1990s some outranking methods were especially designed to address public policy analysis, one of the most widespread being NAIADE (Munda 1995); it is a discrete multi-criteria method whose impact matrix may include crisp, stochastic or fuzzy measurements of the performance of an alternative with respect to an evaluation criterion. Thus, it is very flexible for real-world applications. From a mathematical point of view, NAIADE deals with two main issues:

  1. the problem of equivalence of the procedures used in order to standardize the mixed criterion scores;

  2. the problem of comparison of fuzzy numbers typical of all fuzzy multi-criteria methods.


These two issues are dealt with by means of a new semantic distance, which is useful in the case of continuous, convex membership functions and also allows a definite integration.

All outranking methods, together with other approaches based on pair-wise comparisons, suffer from the following three main problems. First, Arrow’s axiom of independence of irrelevant alternatives is not respected; thus, the phenomenon of rank reversal may appear (i.e. the preference between a and b can change depending on whether a third option c is considered or not). Second, the Condorcet paradox may appear, i.e. alternative a may be ranked better than b, b better than c and c better than a. Third, there is a problem specifically connected with the outranking approach: the necessity to establish a large number of “preference parameters”, i.e. indifference and preference thresholds, concordance and discordance thresholds and weights. This may cause a loss of transparency and consistency in the model. In the framework of public policy, outranking approaches are an interesting assessment framework, but to guarantee consistency with the social process behind the problem structuring, the mathematical aggregation rules need to be kept as simple as possible (see Munda 2008 for a deeper technical discussion of this issue).

An aggregation rule that is simple, non-compensatory and minimises the rank reversal phenomenon is the Kemeny rule (Kemeny 1959; Munda and Nardo 2009). Moreover, it was explicitly designed to solve the Condorcet paradox, so cycles are never present. Its basic idea is that the maximum likelihood ranking of policy options is the ranking supported by the maximum number of criteria (or criterion weights) for each pair-wise comparison, summed over all pairs of options considered. For example, let us assume that in comparing four options according to nine criteria, the following pair-wise comparisons are obtained:

 

     A    B    C    D
A    0    4    4    5
B    5    0    5    6
C    5    4    0    3
D    4    3    6    0

Then, the corresponding scores of all possible rankings are the following:

Ranking      Score      Ranking      Score
B A D C      31         C B D A      27
B D C A      31         D B A C      27
A B D C      30         D C B A      27
B D A C      30         A C B D      26
B C A D      29         A D C B      26
B A C D      28         D A B C      26
B C D A      28         D C A B      26
C B A D      28         D A C B      25
D B C A      28         C A D B      24
A B C D      27         C D B A      24
A D B C      27         A C D B      23
C A B D      27         C D A B      23
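The whole computation can be reproduced from the pairwise matrix above. The script below enumerates all 24 possible rankings, scores each one as the sum of pairwise support over all ordered pairs, and confirms that the maximum score of 31 is attained by the rankings B A D C and B D C A:

```python
from itertools import combinations, permutations

# Pairwise comparison matrix from the text: e[x][y] is the number of the nine
# criteria preferring option x to option y.
options = "ABCD"
e = {
    "A": {"A": 0, "B": 4, "C": 4, "D": 5},
    "B": {"A": 5, "B": 0, "C": 5, "D": 6},
    "C": {"A": 5, "B": 4, "C": 0, "D": 3},
    "D": {"A": 4, "B": 3, "C": 6, "D": 0},
}

def kemeny_score(ranking):
    # Sum, over every ordered pair (x before y), the criteria supporting x > y.
    return sum(e[x][y] for x, y in combinations(ranking, 2))

scores = {r: kemeny_score(r) for r in permutations(options)}
best = max(scores.values())
winners = sorted("".join(r) for r, s in scores.items() if s == best)
print(best, winners)  # 31 ['BADC', 'BDCA']
```

The exhaustive enumeration is only feasible because there are 4! = 24 rankings here; the combinatorial explosion for larger option sets is exactly the computational drawback discussed next.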

Moulin (1988, p. 312) clearly states that the Kemeny method is “the correct method” for ranking options, and that the “only drawback of this aggregation method is the difficulty in computing it when the number of candidates grows”. A numerical algorithm solving this computational drawback efficiently has recently been developed (Azzini and Munda 2020) and implemented in a software tool called SOCRATES (SOcial multi-CRiteria AssessmenT of European policieS). This software is based on the assumption that multi-criteria analysis requires the definition of relevant dimensions, objectives and criteria. It uses weights as importance coefficients and clarifies their role in the hierarchical structure. The impact matrix may include both quantitative (including stochastic and/or fuzzy uncertainty, as in NAIADE) and qualitative (ordinal and/or linguistic) measurements of the performance of an alternative with respect to an evaluation criterion. It supplies a complete ranking of the options considered; all methodological and mathematical details can be found in Azzini and Munda (2020), Munda (2004, 2012) and Munda and Nardo (2009).

3 An illustrative example

Let us consider, as an illustrative example, a recent European Commission IA on modernising VAT for cross-border B2C e-Commerce. The impacts of the various options considered are summarised in the impact matrix shown in Table 3.

Table 3 Impact Matrix of the Illustrative example

In the original study, this impact matrix is commented on qualitative grounds (without using any formal aggregation procedure), and it is concluded that option 5 “is considered to be the most positive as a business established in a Member State can make supplies to a customer in another Member State under broadly the same rules as a domestic transaction, the VAT rate applicable being the only exception. This option reduces overall compliance costs for business by 55% and evidence points to this option being the optimum one in terms of meeting the overall general and specific objectives of the proposal” (SWD(2016) 379 final, p. 48).

Let us now see if this conclusion is corroborated by using a mathematical aggregation rule. By applying SOCRATES, the ranking shown in Fig. 1 is obtained (under the assumption that all criteria have the same weight).

Fig. 1
figure 1

Multi-criteria ranking of all options shown in Table 3 under the equal criterion weights assumption

The ranking is very clear: option 5 is the best choice, followed by 6 and 4; the set of options 1, 3 and 2 is clearly the worst one. More information can be obtained by checking the pairwise comparisons, which allow one to be fully aware of the mutual weaknesses and strengths on each single evaluation criterion. This information is summarised graphically in Fig. 2, which illustrates the degrees of credibility that any option is preferred or indifferent with respect to another one on each single criterion. For example, it is possible to deduce that options 5, 6 and 4 are indeed very similar, although there is a weak preference towards option 5. In fact, looking at the performance on each single criterion, one can see immediately that only a few criteria are clearly in favour of option 5, while all the other criteria evaluate these three options as indifferent. On the contrary, when comparing one of these three top options with options 1, 2 and 3, the preference relation is very clear. Consequently, a first clear and easily defensible conclusion seems to be that the set of options 4, 5 and 6 is definitely to be preferred to the set composed of options 1, 2 and 3.

Fig. 2
figure 2

Pairwise comparison of all alternatives according to each criterion

The degrees of credibility presented in Fig. 2 tackle the issue that sometimes the difference between the criterion scores of two options is not sufficient to state that one is better than the other. A problem inherent in the use of precise indifference and preference thresholds is that they can create the strange situation where, e.g., up to a difference of 1.9999 one would conclude that the two options are indifferent, while from 2.0001 one would definitely state that the preference relation seems plausible. For this reason, in SOCRATES credibility degrees are measured on the y-axis, while the x-axis shows the difference between two options on a single criterion. In the case of indifference, a zero difference intensity (x-axis) makes the credibility equal to 1 (y-axis), and the greater the difference intensity, the smaller the credibility of the indifference relation. This credibility is greater than 0.5 up to the value of the indifference threshold and smaller than 0.5 beyond it; the credibility of an indifference relation must therefore be a monotonically decreasing function. In the case of preference, the reverse holds: at zero difference the credibility of preference is zero, and the greater the intensity, the more credible the preference relation. This credibility is greater than 0.5 once the preference threshold is exceeded (see Fig. 3 for an example dealing with the criterion VAT revenues); the credibility degree of a preference relation can therefore only be a monotonically increasing function. Thanks to this preference modelling based on credibility degrees, the issue of the significance of difference intensities is dealt with properly, and no abrupt transition from indifference to preference occurs.
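The shape of these credibility functions can be sketched as follows. The piecewise-linear forms and the threshold values are illustrative assumptions that merely satisfy the properties described above (monotonicity and crossing 0.5 at the thresholds); the actual functional forms used in SOCRATES/NAIADE are not reproduced here:

```python
# d = g(a) - g(b): difference between two options on one criterion.
# q: indifference threshold; p: preference threshold (both hypothetical).

def indifference_credibility(d, q):
    # 1 at d = 0, exactly 0.5 at |d| = q, then monotonically decreasing to 0.
    return max(0.0, 1.0 - 0.5 * abs(d) / q)

def preference_credibility(d, p):
    # 0 at d <= 0, exactly 0.5 at d = p, then monotonically increasing to 1.
    return min(1.0, 0.5 * max(d, 0.0) / p)

q, p = 2.0, 4.0                             # illustrative thresholds
print(indifference_credibility(0.0, q))     # 1.0
print(indifference_credibility(2.0, q))     # 0.5 at the indifference threshold
print(preference_credibility(4.0, p))       # 0.5 at the preference threshold
```

Note how any intermediate difference yields an intermediate credibility, so there is no abrupt jump from "indifferent" to "preferred" around a single cut-off value.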

Fig. 3
figure 3

Example of credibility degrees of fuzzy indifference and preference relations

To further clarify the preference structure, it is advisable to perform a sensitivity analysis (Saltelli et al. 2008). In the framework of SOCRATES, the objective of sensitivity analysis is to check whether the rankings provided are stable and to determine which of the input parameters influence the model output most. Local sensitivity analysis looks at the sensitivity of results to (a) the exclusion/inclusion of different criteria and dimensions, and (b) changes in dimension and criterion weights; all parameters are changed one at a time. Global sensitivity analysis focuses on all the possible combinations of criterion weights; all parameters are changed simultaneously. The information produced by local and global sensitivity analyses is synthesised in simple graphics.

Let us then first look at the influence of excluding the various criteria and dimensions one at a time, and at the effect of using only the subset of criteria belonging to a single dimension (i.e. first, one criterion at a time is eliminated and the corresponding ranking is obtained; then a whole dimension, e.g. economic and competitiveness with all its criteria, is eliminated and the effect on the final ranking is checked).
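The leave-one-out part of this exercise can be sketched generically. The three-criterion profile below is hypothetical, and the Kemeny re-ranking is a brute-force version; the sketch only shows the mechanics of dropping one criterion at a time and recording which option ends up first in each reduced problem:

```python
from itertools import combinations, permutations

options = "abc"
# Each criterion's own ordering of the options (hypothetical profile).
criterion_rankings = ["abc", "bca", "abc"]

def kemeny_ranking(rankings):
    # Brute-force Kemeny: pick the order maximising total pairwise support
    # (ties broken arbitrarily by enumeration order).
    def score(order):
        return sum(sum(1 for r in rankings if r.index(x) < r.index(y))
                   for x, y in combinations(order, 2))
    return max(permutations(options), key=score)

first_place = {o: 0 for o in options}
for i in range(len(criterion_rankings)):
    reduced = criterion_rankings[:i] + criterion_rankings[i + 1:]  # drop one
    first_place[kemeny_ranking(reduced)[0]] += 1
print(first_place)  # how many leave-one-out runs each option tops
```

Counting first places over all the reduced problems gives exactly the kind of rank-occupancy summary that Fig. 4 reports graphically.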

The results of this exercise are presented in Fig. 4, which indicates how many times each option appears in each rank position, and the percentage of times each rank position is occupied by each single option. In this way, it becomes increasingly clear that option 5 is the most desirable one; in fact, it occupies the first position in 79 per cent of all the rankings obtained.

Fig. 4
figure 4

Robustness analysis according to exclusion/inclusion of criteria and dimensions

Finally, the issue of the robustness of results with respect to weights is particularly relevant. Here we have made the assumption that all criteria are weighted equally. However, the fact that all criteria have the same weight does not at all guarantee that objectives and dimensions have the same weight. This would be guaranteed only if all the dimensions had the same number of criteria, which of course is quite unnatural and artificial. On the contrary, different criterion weights can guarantee that all the dimensions are considered of equal importance.

Since we have already computed the rankings under the equal criterion weighting assumption, let us now see what happens if we attach the same weight to each dimension and then split each weight proportionally among the objectives and criteria of that dimension. One should note that weights can be used in the way described here only if they have the meaning of importance coefficients, which requires that they be combined with non-compensatory mathematical aggregation rules. Figs. 5 and 6 illustrate these two weighting situations for our problem graphically. Fig. 7 presents the ranking obtained under the equal dimension weighting assumption.
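The difference between the two weighting schemes is easy to make concrete. The dimension structure below (how many criteria each dimension contains) is hypothetical, and is deliberately unbalanced to show why equal criterion weights do not imply equal dimension weights:

```python
# Hypothetical hierarchy: number of criteria per dimension.
dimensions = {"economic": 4, "social": 2, "environmental": 3}

n_criteria = sum(dimensions.values())  # 9 criteria in total

# Scheme 1: equal criterion weights (each criterion gets 1/9).
equal_criterion = {d: [1 / n_criteria] * k for d, k in dimensions.items()}

# Scheme 2: equal dimension weights (each dimension gets 1/3), with each
# dimension's share split evenly among its own criteria.
equal_dimension = {d: [1 / len(dimensions) / k] * k for d, k in dimensions.items()}

# Under equal criterion weights the economic dimension carries 4/9 of the
# total weight; under equal dimension weights it carries exactly 1/3.
print(sum(equal_criterion["economic"]))   # 0.444...
print(sum(equal_dimension["economic"]))   # 0.333...
```

With a non-compensatory rule these weights act as importance coefficients, so the choice between the two schemes is a substantive statement about how much each dimension should count.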

Fig. 5
figure 5

Equal criterion weights assumption

Fig. 6
figure 6

Equal dimension weights assumption

Fig. 7
figure 7

Multi-criteria ranking of all options shown in Table 3 under the equal dimension weights assumption

As one can see, overall the original statement that option 5 is a reasonable choice has been corroborated by the multi-criteria analysis. All the arguments making explicit the preference structure used to solve the impact matrix have been made clear, and the degree of transparency is no doubt much higher than with qualitative reasoning alone.

4 How can ex-ante impact assessment practice make use of this study?

The institutional practice on policy impact assessment can be continuously improved if existing guidelines and regulations are updated at the policy-design level, in view of new scientific findings on modelling methodologies and tools. This article has shown that the use of multi-criteria mathematical aggregation rules in ex-ante IA studies has at least three important justifications:

1. As Arrow’s theorem clearly proves, even when mathematical rules are used, the ranking of policy options cannot be straightforward. A fortiori, the risk of deriving somewhat wrong rankings is much higher when only qualitative reasoning is used.

2. Even when qualitative reasoning leads to correct conclusions, as in the illustrative example we have examined, the use of mathematical aggregation rules brings more information, and thus more transparency, into the analysis.

3. When mathematical rules are used, consistency between the problem structuring and the ranking of policy options is guaranteed; this makes the overall IA study much more defensible in a public arena.

In summary, we can conclude that multi-criteria mathematics does answer, in a definitive way, the standard objection that the aggregation of apples and oranges is impossible. However, it is important to remember that quantitative modelling of policy problems cannot provide exact answers, but it can help policy-makers and all the social actors involved by providing a scientifically sound framework for a systematic and transparent analysis. There is no doubt that ex-ante IA is multidimensional in nature, and as a consequence, beyond the application of mathematical aggregation rules, MCDA, and in particular social multi-criteria evaluation (SMCE), which has been explicitly designed for public policy, can be a very useful methodological and operational framework. The basic methodological foundation of SMCE is incommensurability, i.e. the notion that in comparing options, a plurality of technical dimensions and social perspectives is needed (Munda 2008, 2016).

Amartya Sen has raised the point that democracy, to be effective in practice, needs a shared language of the public sphere, and that this cannot be supplied by any specific single point of view. “… there is such a long tradition in parts of economics and political philosophy of treating one allegedly homogeneous feature (such as income or utility) as the sole ‘good thing’ that could be effortlessly maximized (the more the merrier), that there is some nervousness in facing a problem of valuation involving heterogeneous objects, … And yet any serious problem of social judgement can hardly escape accommodating pluralities of values, … We cannot reduce all the things we have reason to value into one homogeneous magnitude.” (Sen 2009, p. 239).

Given that a complex real-world problem cannot be completely described by a model, since descriptive completeness and formal consistency are incompatible, it is essential to remember that a model always depends on the assumptions introduced to make it relevant for a given policy problem, i.e. on which representation of reality is used. Fairness in the policy process can be seen as an ethical obligation to take a plurality of social values, perspectives and interests into account in a coherent and transparent manner. There is no doubt that considering social actors’ opinions is a key success factor for an IA (see e.g. Saltelli et al. 2013; Torriti 2010).

The main achievement of Social Multi-Criteria Evaluation is that the use of various evaluation criteria translates directly into a plurality of values and dimensions used in the assessment exercise. SMCE can accomplish the goals of being inter/multi-disciplinary with respect to the research team and participatory with respect to the local community. It is also transparent, because all criteria are presented in their original form, without any transformation into a common measurement rod (e.g., money, energy or whatever).