Assessment of fairness of AI-based future interactive technologies for “non-AI-specialists”

Enrico Frumento (Cefriel Politecnico di Milano, Milano), Carlo Dambra (Zenabyte, Genova)

Supervised ML algorithms are inherently discriminatory, in the sense that they use information embedded in the features of data to separate instances into distinct categories. In supervised machine learning, this “discrimination” is what helps to sort data into different categories within the data distribution. Directly or indirectly, however, it may result in discrimination or bias based on group affiliation. AI is afflicted with bias, and “unfairness” is one type of bias. An AI model is fair if its outputs are independent of sensitive parameters (e.g., race, sexual orientation, gender, religious faith, and disability) for a specific task affected by social discrimination. Today, many organisations are trying to find a way to implement fair AI services. However, many AI ethical frameworks cannot be implemented in practice without an increase in the complexity of the AI frameworks. Organisations need an implementation strategy that includes concrete metrics that people with many transversal competences (engineers, data scientists, and legal personnel) can measure and monitor. AMNESIA is a European research project whose aim was to explore and assess the technical feasibility and commercial potential of measuring algorithmic unfairness of fully automated AI solutions (“black boxes”).

Artificial Intelligence (AI) is nowadays an integral part of everyone’s daily life, in tasks ranging from social network recommendations to generating drugs to cure diseases or providing banks with risk scores for loan requests. AI is affecting our lives in more ways than we realise. But are the tasks it performs ethically sound? Are AI decisions based on non-discriminatory data and criteria? These questions are becoming crucial today, all the more so because AI decisions are often difficult for humans to understand. The problem becomes whether AIs are “fair”, how to measure their fairness and, last but not least, how an average citizen can know when they are not.

The so-called “unfairness” is a type of bias and AI is afflicted with bias just as humans are. Generally, a given AI model is fair if the outputs are independent of sensitive parameters (e.g., gender, race, sexual orientation, religious faith, disability, etc.) for a specific task that is already affected by social discrimination [1]. In this case, AI bias is caused by “the inherent prejudice in the data used to train the model, leading to social discrimination. This leads to a lack of equal opportunities” [1].
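One common way to make the independence requirement above measurable is demographic parity: the rate of favourable outputs should be the same for every group defined by the sensitive parameter. The following Python sketch is illustrative only. The text does not state which metrics AMNESIA uses, and the function names and toy data are assumptions made for this example.

```python
from collections import defaultdict


def selection_rates(predictions, groups):
    """Share of favourable predictions (1) for each sensitive group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += pred
    return {g: positives[g] / totals[g] for g in totals}


def demographic_parity_difference(predictions, groups):
    """Gap between the highest and lowest group selection rate.

    0.0 means the favourable outcome is equally frequent in every group;
    larger values mean the output depends more on the sensitive attribute.
    """
    rates = selection_rates(predictions, groups)
    return max(rates.values()) - min(rates.values())


# Toy data (invented): 1 = favourable decision, groups "a" and "b"
predictions = [1, 0, 1, 1, 0, 1, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

print(selection_rates(predictions, groups))
print(demographic_parity_difference(predictions, groups))
```

Here group "a" receives a favourable decision three times out of four and group "b" once out of four, so the model would not be considered fair under this particular metric.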

Biases of AI systems can be of different types [2]. For example, AIs can be trained on data in which specific communities are under-represented [3], or can produce faulty interpretations (e.g., a face recognition system not considering specific skin colours). Although data scientists cannot eliminate bias in data analysis, most researchers are asking how to measure and fix the problem.

The first step to understanding unfairness is to realise that AI algorithms may have problems: data might be inadequate and algorithms flawed. In its 2020 “AI hype cycle”, Gartner introduced the concept of Responsible AI [4], an umbrella term that encompasses the many aspects of making the right business and ethical choices when adopting AI that organisations often address independently, including trust, transparency, bias mitigation, explainability and fairness. The research community has perceived these concerns and has started to investigate how to ensure that ML models do not take unfair decisions based on sensitive attributes.

Among the objectives of the Italian strategy on artificial intelligence is the promotion of awareness and trust in AI among citizens, a goal that passes through the creation of AI that is fair to all groups of society (1). When attention turns to the IT industry, the fairness of AI models becomes even more relevant. Nowadays, there is a proliferation of AI in both B2B and B2C applications. In the financial sector, for example, as banks increasingly deploy AI tools to make credit decisions, they must revisit an unwelcome fact about the practice of lending, which has historically been biased against protected characteristics such as gender, race and sexual orientation [5]. Moreover, the concrete harms that AI technologies can cause have recently become clear, together with the related legal liability implications for the IT industry itself [6]. Many organisations are trying to balance fairness and accuracy in their AI models.

The exclusion of some portions of the algorithm, or the amendment of the dataset, or the AI model’s modification to accomplish fairness criteria may lead to less accuracy in some conditions. This is a research dilemma [7, 8]: “to maximise accuracy, models have to learn everything they can from the data — including the human biases embedded in them. But to maximise fairness, they must unlearn the human biases”. In general, the desire is to have high-level principles to ensure that AI makes ethical decisions and causes no harm.
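The dilemma can be illustrated with a toy example. In the Python sketch below, all scores, labels and group names are invented, and the parity gap is only one of several possible fairness measures. Lowering the decision threshold for the disadvantaged group closes the gap in favourable outcomes, but moves the predictions away from the (biased) historical labels.

```python
def accuracy(preds, labels):
    """Fraction of predictions that match the historical labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def parity_gap(preds, groups):
    """Difference between the highest and lowest favourable-outcome rate."""
    rates = []
    for g in set(groups):
        members = [p for p, gg in zip(preds, groups) if gg == g]
        rates.append(sum(members) / len(members))
    return max(rates) - min(rates)


# Invented data: the historical labels favour group "a" over group "b"
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.45, 0.4, 0.2]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a"] * 4 + ["b"] * 4


def predict(threshold_b, threshold_a=0.5):
    """Apply a separate decision threshold to each group."""
    return [
        int(s >= (threshold_a if g == "a" else threshold_b))
        for s, g in zip(scores, groups)
    ]


# Lowering the threshold for group "b" trades accuracy for fairness
for t in (0.5, 0.42, 0.35):
    preds = predict(t)
    print(t, accuracy(preds, labels), parity_gap(preds, groups))
```

With a threshold of 0.5 for both groups the predictions match every label but the parity gap is 0.5. At 0.35 for group "b" the gap disappears while accuracy against the historical labels drops to 0.75.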

Ultimately, organisations need an implementation strategy that includes concrete metrics that people with many transversal competences (engineers, data scientists, and legal personnel) can measure and monitor. As demonstrated in recent papers [9], many AI ethical frameworks cannot be implemented in practice without an increase in the specificity of existing AI frameworks, and few technical personnel can concretely convert such high-level guidance into practice. This means that AI ethics frameworks may often be suitable for marketing campaigns but fail to stop AIs from causing the harms described.

These problems are amplified for IT companies that do not possess a strong background in AI. These “non-AI-specialist” organisations are interested in applying fully automated AI tools in their decision-making processes and, at the same time, in automatically assessing the fairness of their results.

This means that the market should offer these users a tool and interface that are simple and intuitive enough to answer some basic questions quickly:

  • What is the fairest fully automated tool in a particular field of application?
  • What is the most accurate fully automated tool in a particular field of application (e.g., school or financial credit)?
  • How much accuracy must I give up to increase fairness in that field of application?
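Once accuracy and fairness scores are available for each tool, these questions reduce to simple lookups. A minimal Python sketch, assuming hypothetical tool names and invented scores in which a higher fairness value means a fairer tool:

```python
# Hypothetical results for one field of application (all values invented)
results = {
    "tool_a": {"accuracy": 0.91, "fairness": 0.70},
    "tool_b": {"accuracy": 0.86, "fairness": 0.88},
    "tool_c": {"accuracy": 0.80, "fairness": 0.85},
}

# Which tool scores best on each criterion?
fairest = max(results, key=lambda tool: results[tool]["fairness"])
most_accurate = max(results, key=lambda tool: results[tool]["accuracy"])

# Accuracy given up by choosing the fairest tool over the most accurate one
accuracy_cost = results[most_accurate]["accuracy"] - results[fairest]["accuracy"]

print("fairest:", fairest)
print("most accurate:", most_accurate)
print("accuracy given up:", round(accuracy_cost, 2))
```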

The AMNESIA project aims to fill the above-mentioned market gap for these IT industries by offering a methodology and a dashboard to help them identify the most appropriate fully automated AI tool, in terms of accuracy and fairness for a given application domain. It also suggests mitigation solutions to reduce the existing unfairness in the results.

Analysis of the existing literature underlines that there is still no single best way to measure AI transparency or fairness. However, this should not discourage organisations, which can draw from a combination of different research, legal precedents, and technical best practices. AMNESIA aims to enable as many organisations as possible to find solutions for the fairness of their services.

AMNESIA is a European research project whose aim was to explore and assess the technical feasibility and commercial potential of measuring algorithmic unfairness of fully automated AI solutions (“black boxes”). The project focuses on the AI-based fully automated tools and models made available by major IT players, on which many Future Interactive Technologies (FIT) solutions (e.g., chatbots, ranking tools, etc.) are based. Concentrating on fully automated AI solutions is crucial because of their increasing adoption by developers of AI solutions worldwide, and because unfairness in these services can cascade into others.

Fully automated AI solutions are becoming mainstream because of the involvement of major IT players, as mentioned before, and because they automate the application of machine-learning techniques to data. Typically, a data scientist would spend much of their time pre-processing data, selecting features, selecting and tuning models and then evaluating the results. Fully automated AI solutions can greatly simplify these tasks by providing a baseline that offers high-performing results for specific problems and insights for further exploration.

AMNESIA’s objective has been reached by:

  • Building a complete picture of the state-of-the-art of fairness metrics.
  • Generalising and improving the identified metrics and integrating them into a fit-for-purpose holistic solution.
  • Suggesting ways to mitigate the detected unfair behaviour through dashboards.
  • Developing a TRL3 Dashboard and validating it using available FIT tools and publicly available datasets covering different sectors (school, finance, crime, and health care).
  • Assessing the commercial viability of the proposed solutions.

The main result of the project is a dashboard that assesses the fairness and accuracy of the identified tools for a given application domain, reports the selected fairness and accuracy metrics and suggests possible mitigation solutions.

Figure 1 — AMNESIA dashboard example

The value of the AMNESIA Dashboard is demonstrated by the extreme variability of the results, and by their strict dependence on the chosen parameters, when fully automated AI solutions are used to solve a given problem. Typically, the goal of a “non-skilled” user is to identify which of the tools offered by the major IT players gives the best compromise between fairness and accuracy when implementing a given task in a selected application domain with fixed constraints (metrics and sensitive feature).

Let us consider a classification task with defined accuracy and fairness metrics, race as the sensitive feature, and three different application domains:

  • Education.
  • Finance.
  • Crime.

The distribution of the accuracy and fairness values obtained by applying the AMNESIA approach using all the available datasets and three different fully automated AI solutions is shown in Figure 2.

Figure 2 — Performance of the different AI tools (identified with a square, a circle, and a diamond, respectively) in fairness and accuracy.

As can be seen, the fairness and accuracy results of the different tools are relatively scattered in the graph. The choice of the “optimal” tool therefore depends heavily on several variables, including:

  • The selected application domain and the datasets available for that specific domain (the larger the number of datasets, the better the obtained result).
  • The selected fairness and accuracy metrics (the metrics’ choice has specific impacts on the tool’s function).
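One way to reason about an “optimal” tool when results are scattered like this is to keep only the tools that no other tool beats on both accuracy and fairness (the Pareto front) and to repeat the comparison per domain. The Python sketch below is an assumption made for illustration, not the method implemented in the AMNESIA dashboard. The scores are invented, and the tool names simply reuse the marker shapes of Figure 2.

```python
def pareto_front(scores):
    """Return the tools not dominated on (accuracy, fairness).

    A tool is dominated when another tool is at least as good on both
    criteria and strictly better on at least one of them.
    """
    front = []
    for name, (acc, fair) in scores.items():
        dominated = any(
            o_acc >= acc and o_fair >= fair and (o_acc > acc or o_fair > fair)
            for other, (o_acc, o_fair) in scores.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return sorted(front)


# Invented (accuracy, fairness) pairs; higher is better for both
by_domain = {
    "education": {"square": (0.90, 0.60), "circle": (0.85, 0.80), "diamond": (0.80, 0.75)},
    "finance": {"square": (0.88, 0.90), "circle": (0.82, 0.85), "diamond": (0.86, 0.70)},
}

for domain, scores in by_domain.items():
    print(domain, pareto_front(scores))
```

In this invented data the set of defensible choices changes with the domain: two tools remain candidates for education, while a single tool dominates in finance. Changing the fairness or accuracy metric would change the score pairs and may change the front as well.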

This example corroborates the initial assumptions about the need for the AMNESIA solution, especially for the identified target. AMNESIA targets IT solution developers without a solid background in AI who are interested in applying AI in their decision-making processes using fully automated tools and, at the same time, in automatically assessing the fairness of their results.

Even if more is needed to ensure that AI is not generating serious harms, having clear metrics is the first step to understanding when ethical failures occur in AI-based decision-making processes. Many companies have developed tools to assess AI fairness and are doing their best to fight AI bias; however, AMNESIA focuses on broader access to shared metrics for non-AI experts.

AMNESIA has received funding from the European Union’s Horizon 2020 research and innovation programme under the NGI_TRUST grant agreement no 825618.

[1] A. L., “AI is Flawed — Here’s Why,” 30 10 2020. [Online]. Available:

[2] G. Lawton, “8 types of bias in data analysis and how to avoid them,” TechTarget, 26 10 2020. [Online]. Available:

[3] S. K. Skelton, “Auditing for algorithmic discrimination,” ComputerWeekly, 07 07 2020. [Online]. Available:

[4] L. Columbus, “What’s New In Gartner’s Hype Cycle For AI, 2020,” Enterprise Irregulars, 25 10 2020. [Online]. Available:

[5] S. Townson, “AI Can Make Bank Loans More Fair,” 06 November 2020. [Online]. Available:

[6] A. Burt, “The Liabilities of Artificial Intelligence Are Increasing,” 15 June 2020. [Online]. Available:

[7] F. L. Cesista, “The Accuracy-Fairness Dilemma,” [Online]. Available:

[8] Oracle AI, “Unlocking Fairness: A Trade-Off Revisited,” 03 03 2020. [Online]. Available:

[9] AlgorithmWatch, “In the realm of paper tigers – exploring the failings of AI ethics guidelines,” 28 April 2020. [Online]. Available:

(1) “conduct analysis and evaluations of the socio-economic impact of development and widespread adoption of AI-based systems, along with proposals for tools to mitigate the encountered issues”,
