A repertoire of case studies aimed at involving final users as well
Case studies
In recent years, the development of AI systems has focused on opening up black-box models through a wide range of explainability methods, so that users better understand why the AI gives a particular suggestion.
The scientific community’s interest in eXplainable Artificial Intelligence (XAI) has produced a multitude of research on computational methods to make explainability possible. Nevertheless, the final user has received far less attention.
This line addresses two main aspects:
the user’s decision-making process with eXplainable AI systems used to support high-stakes decisions;
use cases to test the explanation methods developed within the XAI project.
Since the beginning of the XAI project, we have focused mainly on high-stakes decisions in healthcare. In this application domain, AI and human doctors will have complementary roles reflecting their respective strengths and weaknesses; it is therefore of pivotal importance to develop AI technology able to work synergistically with doctors. Current AI technologies have many shortcomings that hinder their adoption in the real world, and in recent years developing methods to explain AI models’ reasoning has become the focus of much of the scientific community, particularly in the field of eXplainable AI (XAI).

While several XAI methods have been developed over the past years, only a few take the specific application domain into account. Consider, for example, two of the most popular XAI methods: LIME and SHAP. Both are model-agnostic and application-agnostic, meaning that they can extract an explanation from any type of black-box AI model, regardless of the application domain. While the model-agnostic approach offers great flexibility in the use of these methods, the application-agnostic approach implies that specific user needs are not considered. A few works have tried to close this gap in the medical field by involving doctors in the design procedure or by performing exploratory surveys. Despite these recent efforts, most of the research has focused on laypeople.
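To make the model-agnostic idea concrete, the sketch below (our own illustration, not project code) asks SHAP’s KernelExplainer to explain a scikit-learn black box purely through its prediction function; any fitted model exposing predict_proba could be swapped in, which is exactly why no application-specific user needs enter the picture.

```python
# Minimal sketch: a model-agnostic explanation with SHAP's KernelExplainer.
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
black_box = RandomForestClassifier(random_state=0).fit(X, y)

# KernelExplainer only touches the model through its prediction function,
# which is what makes the method model- and application-agnostic.
explainer = shap.KernelExplainer(black_box.predict_proba, X[:50])
shap_values = explainer.shap_values(X[0])  # per-feature contributions for one instance
```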
In our first work, we tested the impact of AI explanations with healthcare professionals. The context of this work is AI-supported decision-making for clinicians. Imagine, for example, a doctor who wants a second opinion before making a decision about a patient’s risk of myocardial infarction. She forms her opinion from the patient’s previous visits and symptoms, and then an AI suggestion is presented to her. What happens when she gets this second opinion? Does she trust herself, or will she be more prone to follow the algorithmic suggestion in making her final decision? To answer this question, we collected data from 36 healthcare professionals to understand the impact of advice from a clinical DSS in two different cases: the case in which the clinical DSS explains the given suggestion and the case in which it does not. We adapted the judge-advisor system framework from Sniezek & Van Swol [Sniezek2001] to evaluate participants’ trust and behavioral intention to use the system in an online estimation task. Our main measure was the Weight of Advice, which captures the degree to which the algorithmic suggestion (with or without explanation) influences the participant’s estimate. To gain more meaningful insights, we collected both qualitative and quantitative measures.

Our results showed that participants relied more on the advice in the condition with the explanation than in the condition with the sole suggestion, even though they found the explanation unsatisfying. It is interesting to note that, despite the low perceived explanation quality, participants were influenced by it and relied more on the advice of the AI system. This finding might be in line with previous research on automation bias in medicine, i.e., the tendency to over-rely on automation. In the open questions at the end of the study, healthcare professionals showed an aversion to the use of algorithmic advice and a fear of being replaced by such AI systems. The importance of these results is twofold: first, even though the explanation left most participants unsatisfied, they were strongly influenced by it and relied more on the AI’s advice; second, the ethnographic method, i.e., the open-ended questions, yielded insights from the participants that quantitative measures alone cannot capture.
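For reference, the Weight of Advice is conventionally defined as the judge’s shift from the initial to the final estimate, relative to the distance between the initial estimate and the advice: 0 means the advice was ignored, 1 means it was fully adopted. A minimal sketch, assuming this standard judge-advisor formulation (the function and variable names are ours):

```python
def weight_of_advice(initial: float, advice: float, final: float) -> float:
    """Weight of Advice: 0 = advice ignored, 1 = advice fully adopted.

    Standard judge-advisor formulation: the judge's shift towards the
    advice, relative to the initial distance from it.
    """
    if advice == initial:
        raise ValueError("WoA is undefined when the advice equals the initial estimate")
    return (final - initial) / (advice - initial)

# Example: initial estimate 40%, AI suggests 70%, final answer 60%
print(weight_of_advice(40, 70, 60))  # 0.667 -> the judge moved 2/3 of the way
```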
A main limitation of this study is that the decision presented by the AI was always correct; in future work, we aim to carry out a similar study testing whether the overreliance persists even when the suggestions are wrong.

In the second work, we tested how users react to a wrong suggestion when evaluating different types of skin lesion images. The need is to develop AI systems that can assist doctors in making more informed decisions, complementing their own knowledge with the information and suggestions yielded by the AI system [MGY2021, PPP2020]. However, if the logic behind the decisions of AI systems is not available, it is impossible to accomplish this goal. Skin image classification is a typical example of this problem. Here, the explanation is formed by synthetic exemplars and counter-exemplars of skin lesions, i.e., generated images classified with the same outcome as the original image and with a different outcome, respectively. This explanation offers the practitioner a way to highlight the crucial traits responsible for the algorithmic classification decision. We conducted a validation survey with 156 domain experts, novices, and laypeople to test whether the explanation increases reliance on, and confidence in, the automatic decision system. The task was organized into ten questions, each presenting an image of a skin lesion without any label, together with its explanation generated by ABELE. Participants were shown two exemplars, classified like the presented skin lesion, and two counter-exemplars, i.e., belonging to another lesion class. They had to classify the presented image in a binary decision task, deciding the class of the nevus by using the presented explanation. One of the main points was to see how participants regain their trust after receiving a misclassified suggestion from the AI system.

The results showed a slight reduction of trust towards the black box when the presented suggestion is wrong, although there is no statistically significant drop in confidence after receiving wrong advice from an AI model. However, restricting the analysis to the sub-sample of medical experts, we noticed that they are more prone to lower their confidence in the system’s advice, even in the subsequent trials, compared to the other participants (beginners and laypeople). This study showed that domain experts are more prone to detect a wrong suggestion and adjust their estimates accordingly. This aspect can be important for the role of the final users of the system: explanation methods without consistent validation may not be relied upon as their developers expect. Healthcare is one of the main areas in which we have put our effort into including real participants, to gain insight into the effect of AI explanations during the use of clinical assisted decision-making systems. We are focusing on how to improve the explanations in diagnosis forecasts to inform the design of healthcare systems that promote human-AI cooperation, avoid algorithm aversion, and improve the overall decision-making process.
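As an illustration of the kind of group-level comparison behind these results, the sketch below (ours, with hypothetical column names and toy data) computes the average confidence shift in the trial that follows a wrong AI suggestion, split by expertise group:

```python
import pandas as pd

# Hypothetical long-format responses: one row per participant per trial.
df = pd.DataFrame({
    "participant": [1, 1, 2, 2, 3, 3],
    "group": ["expert", "expert", "novice", "novice", "layperson", "layperson"],
    "trial": [1, 2, 1, 2, 1, 2],
    "advice_correct": [False, True, False, True, False, True],
    "confidence": [0.9, 0.6, 0.8, 0.8, 0.7, 0.75],
})

# Confidence change from one trial to the next, within each participant.
df["confidence_shift"] = df.groupby("participant")["confidence"].diff()

# Average shift in the trial that follows a wrong suggestion, by group:
after_wrong = df[df.groupby("participant")["advice_correct"].shift() == False]
print(after_wrong.groupby("group")["confidence_shift"].mean())
```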
Research line people
Riccardo Guidotti
Assistant Professor University of Pisa
R.LINE 1 ▪ 3 ▪ 4 ▪ 5
Mirco Nanni
Researcher ISTI - CNR Pisa
R.LINE 1 ▪ 4
Luca Pappalardo
Researcher ISTI - CNR Pisa
R.LINE 4
Salvo Rinzivillo
Researcher ISTI - CNR Pisa
R.LINE 1 ▪ 3 ▪ 4 ▪ 5
Andrea Beretta
Researcher ISTI - CNR Pisa
R.LINE 1 ▪ 4 ▪ 5
Anna Monreale
Associate Professor University of Pisa
R.LINE 1 ▪ 4 ▪ 5
Cecilia Panigutti
PhD Student Scuola Normale
R.LINE 1 ▪ 4 ▪ 5
Francesco Spinnato
Researcher Scuola Normale
R.LINE 1 ▪ 4
Francesca Naretto
Post Doctoral Researcher Scuola Normale
R.LINE 1 ▪ 3 ▪ 4 ▪ 5
Carlo Metta
Researcher ISTI - CNR Pisa
R.LINE 1 ▪ 2 ▪ 3 ▪ 4
Eleonora Cappuccio
PhD Student University of Pisa - Bari
R.LINE 3 ▪ 4
Alessio Malizia
Associate Professor University of Pisa
R.LINE 3 ▪ 4
Samuele Tonati
PhD Student University of Pisa
R.LINE 4
Gizem Gezici
Researcher Scuola Normale
R.LINE 4
Francesco Giannini
Research Fellow Scuola Normale
R.LINE
Iacopo Colombini
PhD Student Scuola Normale
R.LINE 2 ▪ 4
Mariarita Pierotti
Associate Professor University of Pisa
R.LINE 4
Giovanni Mauro
Research Fellow Scuola Normale
R.LINE 4
Line 4 - Publications
2025
Embracing Diversity: A Multi-Perspective Approach with Soft Labels
Benedetta Muscato, Praveen Bushipaka, Gizem Gezici, Lucia Passaro, Fosca Giannotti, and 1 more author
In subjective tasks like stance detection, diverse human perspectives are often simplified into a single ground truth through label aggregation, i.e., majority voting, potentially marginalizing minority viewpoints. This paper presents a Multi-Perspective framework for stance detection that explicitly incorporates annotation diversity by using soft labels derived from both human and large language model (LLM) annotations. Building on a stance detection dataset focused on controversial topics, we augment it with document summaries and new LLM-generated labels. We then compare two approaches: a baseline using aggregated hard labels, and a multi-perspective model trained on disaggregated soft labels that capture annotation distributions. Our findings show that multi-perspective models consistently outperform traditional baselines (higher F1-scores), with lower model confidence, reflecting task subjectivity. This work highlights the importance of modeling disagreement and promotes a shift toward more inclusive, perspective-aware NLP systems.
@inbook{MBG2025,author={Muscato, Benedetta and Bushipaka, Praveen and Gezici, Gizem and Passaro, Lucia and Giannotti, Fosca and Cucinotta, Tommaso},booktitle={HHAI 2025},doi={10.3233/faia250654},isbn={9781643686110},issn={1879-8314},line={4,5},month=sep,open_access={Gold},pages={370--384},publisher={IOS Press},title={Embracing Diversity: A Multi-Perspective Approach with Soft Labels},visible_on_website={YES},year={2025}}
Perspectives in Play: A Multi-Perspective Approach for More Inclusive NLP Systems
Benedetta Muscato, Lucia Passaro, Gizem Gezici, and Fosca Giannotti
In Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence, Sep 2025
In the realm of Natural Language Processing (NLP), common approaches for handling human disagreement consist of aggregating annotators’ viewpoints to establish a single ground truth. However, prior studies show that disregarding individual opinions can lead to the side-effect of under-representing minority perspectives, especially in subjective tasks, where annotators may systematically disagree because of their preferences. Recognizing that labels reflect the diverse backgrounds, life experiences, and values of individuals, this study proposes a new multi-perspective approach using soft labels to encourage the development of the next generation of perspective-aware models—more inclusive and pluralistic. We conduct an extensive analysis across diverse subjective text classification tasks including hate speech, irony, abusive language, and stance detection, to highlight the importance of capturing human disagreements, often overlooked by traditional aggregation methods. Results show that the multi-perspective approach not only better approximates human label distributions, as measured by Jensen-Shannon Divergence (JSD), but also achieves superior classification performance (higher F1-scores), outperforming traditional approaches. However, our approach exhibits lower confidence in tasks like irony and stance detection, likely due to the inherent subjectivity present in the texts. Lastly, leveraging Explainable AI (XAI), we explore model uncertainty and uncover meaningful insights into model predictions. All implementation details are available at our github repo.
@inproceedings{MPG2025,author={Muscato, Benedetta and Passaro, Lucia and Gezici, Gizem and Giannotti, Fosca},booktitle={Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence},collection={IJCAI-2025},doi={10.24963/ijcai.2025/1092},line={4,5},month=sep,open_access={Gold},pages={9827–9835},publisher={International Joint Conferences on Artificial Intelligence Organization},series={IJCAI-2025},title={Perspectives in Play: A Multi-Perspective Approach for More Inclusive NLP Systems},visible_on_website={YES},year={2025}}
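A rough sketch of the soft-label idea behind these two papers (our own illustration, not the authors’ code): annotator votes on one item are normalized into a probability distribution that serves as the training target, and the match between a model’s output and the human distribution can be scored with Jensen-Shannon divergence.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

# Hypothetical raw annotations for one item: 3 votes "favor", 1 "against", 1 "neutral"
votes = np.array([3, 1, 1])
soft_label = votes / votes.sum()           # [0.6, 0.2, 0.2] -- disaggregated target

hard_label = np.eye(3)[votes.argmax()]     # majority voting: [1, 0, 0]

model_output = np.array([0.55, 0.25, 0.20])  # some model's predicted distribution

# jensenshannon returns the JS *distance* (square root of the divergence)
print(jensenshannon(model_output, soft_label) ** 2)  # small: matches the crowd
print(jensenshannon(model_output, hard_label) ** 2)  # larger: majority label hides dissent
```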
The explanation dialogues: an expert focus study to understand requirements towards explanations within the GDPR
Laura State, Alejandra Bringas Colmenarejo, Andrea Beretta, Salvatore Ruggieri, Franco Turini, and 1 more author
Explainable AI (XAI) provides methods to understand non-interpretable machine learning models. However, we have little knowledge about what legal experts expect from these explanations, including their legal compliance with, and value against European Union legislation. To close this gap, we present the Explanation Dialogues, an expert focus study to uncover the expectations, reasoning, and understanding of legal experts and practitioners towards XAI, with a specific focus on the European General Data Protection Regulation. The study consists of an online questionnaire and follow-up interviews, and is centered around a use-case in the credit domain. We extract both a set of hierarchical and interconnected codes using grounded theory, and present the standpoints of the participating experts towards XAI. We find that the presented explanations are hard to understand and lack information, and discuss issues that can arise from the different interests of the data controller and subject. Finally, we present a set of recommendations for developers of XAI methods, and indications of legal areas of discussion. Among others, recommendations address the presentation, choice, and content of an explanation, technical risks as well as the end-user, while we provide legal pointers to the contestability of explanations, transparency thresholds, intellectual property rights as well as the relationship between involved parties.
@article{SBB2025,author={State, Laura and Bringas Colmenarejo, Alejandra and Beretta, Andrea and Ruggieri, Salvatore and Turini, Franco and Law, Stephanie},doi={10.1007/s10506-024-09430-w},issn={1572-8382},journal={Artificial Intelligence and Law},line={4},month=jan,open_access={Green},publisher={Springer Science and Business Media LLC},title={The explanation dialogues: an expert focus study to understand requirements towards explanations within the GDPR},visible_on_website={YES},year={2025}}
A Simulation Framework for Studying Systemic Effects of Feedback Loops in Recommender Systems
G. Barlacchi, M. Lalli, E. Ferragina, F. Giannotti, and L. Pappalardo
Recommender systems continuously interact with users, creating feedback loops that shape both individual behavior and collective market dynamics. This paper introduces a simulation framework to model these loops in online retail environments, where recommenders are periodically retrained on evolving user–item interactions. Using the Amazon e-Commerce dataset, we analyze how different recommendation algorithms influence diversity, purchase concentration, and user homogenization over time. Results reveal a systematic trade-off: while the feedback loop increases individual diversity, it simultaneously reduces collective diversity and concentrates demand on a few popular items. Moreover, for some recommender systems, the feedback loop increases user homogenization over time, making user purchase profiles increasingly similar. These findings underscore the need for recommender designs that balance personalization with long-term diversity.
@misc{BLF2025,author={Barlacchi, G. and Lalli, M. and Ferragina, E. and Giannotti, F. and Pappalardo, L.},doi={10.48550/arXiv.2510.14857},line={4},month=dec,title={A Simulation Framework for Studying Systemic Effects of Feedback Loops in Recommender Systems},year={2025}}
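The core recommend–interact–retrain cycle that such a framework simulates can be sketched in a few lines. This toy loop is our own schematic, not the authors’ framework: a popularity recommender stands in for the real algorithm, and a random pick stands in for the user-choice model, yet demand already concentrates on a few items over the rounds.

```python
import random
from collections import Counter

class PopularityRecommender:
    """Toy stand-in: recommends globally popular items (cold start: random)."""
    def __init__(self, items):
        self.items = items
        self.popularity = Counter()
    def fit(self, interactions):
        self.popularity = Counter(item for _, item in interactions)
    def recommend(self, user, k):
        ranked = [i for i, _ in self.popularity.most_common(k)]
        ranked += random.sample(self.items, k)   # pad cold-start slates
        return ranked[:k]

users, items = range(20), list(range(100))
rec, interactions = PopularityRecommender(items), []
for round_ in range(10):                          # recommend -> choose -> retrain
    rec.fit(interactions)                         # periodic retraining step
    for u in users:
        interactions.append((u, random.choice(rec.recommend(u, 5))))
    print(f"round {round_}: {len({i for _, i in interactions})} distinct items chosen")
```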
2024
Commodity-specific triads in the Dutch inter-industry production network
Marzio Di Vece, Frank P. Pijpers, and Diego Garlaschelli
Triadic motifs are the smallest building blocks of higher-order interactions in complex networks and can be detected as over-occurrences with respect to null models with only pair-wise interactions. Recently, the motif structure of production networks has attracted attention in light of its possible role in the propagation of economic shocks. However, its characterization at the level of individual commodities is still poorly understood. Here we analyze both binary and weighted triadic motifs in the Dutch inter-industry production network disaggregated at the level of 187 commodity groups, which Statistics Netherlands reconstructed from National Accounts registers, surveys and known empirical data. We introduce appropriate null models that filter out node heterogeneity and the strong effects of link reciprocity and find that, while the aggregate network that overlays all products is characterized by a multitude of triadic motifs, most single-product layers feature no significant motif, and roughly 85% of the layers feature only two motifs or less. This result paves the way for identifying a simple ‘triadic fingerprint’ of each commodity and for reconstructing most product-specific networks from partial information in a pairwise fashion by controlling for their reciprocity structure. We discuss how these results can help statistical bureaus identify fine-grained information in structural analyses of interest for policymakers.
@article{DPG2024,author={Di Vece, Marzio and Pijpers, Frank P. and Garlaschelli, Diego},doi={10.1038/s41598-024-53655-3},issn={2045-2322},journal={Scientific Reports},line={4},month=feb,number={1},open_access={Gold},publisher={Springer Science and Business Media LLC},title={Commodity-specific triads in the Dutch inter-industry production network},visible_on_website={YES},volume={14},year={2024}}
A Frank System for Co-Evolutionary Hybrid Decision-Making
Federico Mazzoni, Riccardo Guidotti, and Alessio Malizia
Hybrid decision-making systems combine human judgment with algorithmic recommendations, yet coordinating these two sources of information remains challenging. We present FRANK, a co-evolutionary framework enabling humans and AI agents to iteratively exchange feedback and refine decisions over time. FRANK integrates rule-based reasoning, preference modeling, and a learning module that adapts recommendations based on user interaction. Through simulated and real-user experiments, we show that the co-evolution process helps users converge toward more stable and accurate decisions while increasing perceived transparency. The system allows humans to override or modify machine suggestions while the AI agent reshapes its internal models in response to human rationale. FRANK thus promotes a collaborative decision environment where human expertise and machine learning strengthen each other.
@inbook{MBP2024,author={Mazzoni, Federico and Guidotti, Riccardo and Malizia, Alessio},booktitle={Advances in Intelligent Data Analysis XXII},doi={10.1007/978-3-031-58553-1_19},isbn={9783031585531},issn={1611-3349},line={1,3,4},open_access={NO},pages={236–248},publisher={Springer Nature Switzerland},title={A Frank System for Co-Evolutionary Hybrid Decision-Making},visible_on_website={YES},year={2024}}
Multi-Perspective Stance Detection
Benedetta Muscato, Praveen Bushipaka, Gizem Gezici, Lucia Passaro, and Fosca Giannotti
Subjective NLP tasks usually rely on human annotations provided by multiple annotators, whose judgments may vary due to their diverse backgrounds and life experiences. Traditional methods often aggregate multiple annotations into a single ground truth, disregarding the diversity in perspectives that arises from annotator disagreement. In this preliminary study, we examine the effect of including multiple annotations on model accuracy in classification. Our methodology investigates the performance of perspective-aware classification models in the stance detection task and further inspects whether annotator disagreement affects model confidence. The results show that the multi-perspective approach yields better classification performance, outperforming the baseline which uses a single label. This entails that designing more inclusive, perspective-aware AI models is not only an essential first step in implementing responsible and ethical AI, but can also achieve superior results compared with traditional approaches.
@misc{MBG2024bb,address={Aachen, Germany},author={Muscato, Benedetta and Bushipaka, Praveen and Gezici, Gizem and Passaro, Lucia and Giannotti, Fosca},line={4,5},month=dec,title={Multi-Perspective Stance Detection},year={2024}}
Beyond Headlines: A Corpus of Femicides News Coverage in Italian Newspapers
Eleonora Cappuccio, Benedetta Muscato, Laura Pollacci, Marta Marchiori Manerba, Clara Punzi, and 5 more authors
How newspapers cover news significantly impacts how facts are understood, perceived, and processed by the public. This is especially crucial when serious crimes are reported, e.g., in the case of femicides, where the description of the perpetrator and the victim builds a strong, often polarized opinion of this severe societal issue. This paper presents FMNews, a new dataset of articles reporting femicides extracted from Italian newspapers. Our core contribution aims to promote the development of a deeper framing and awareness of the phenomenon through an original resource available and accessible to the research community, facilitating further analyses on the topic. The paper also provides a preliminary study of the resulting collection through several example use cases and scenarios.
@misc{CMP2024,address={Aachen, Germany},author={Cappuccio, Eleonora and Muscato, Benedetta and Pollacci, Laura and Manerba, Marta Marchiori and Punzi, Clara and Mala, Chandana Sree and Lalli, Margherita and Gezici, Gizem and Natilli, Michela and Giannotti, Fosca},line={4,5},month=dec,title={Beyond Headlines: A Corpus of Femicides News Coverage in Italian Newspapers},year={2024}}
A survey on the impact of AI-based recommenders on human behaviours: methodologies, outcomes and future directions
Luca Pappalardo, Emanuele Ferragina, Salvatore Citraro, Giuliano Cornacchia, Mirco Nanni, and 9 more authors
Recommendation systems and assistants (in short, recommenders) are ubiquitous in online platforms and influence most actions of our day-to-day lives, suggesting items or providing solutions based on users’ preferences or requests. This survey analyses the impact of recommenders in four human-AI ecosystems: social media, online retail, urban mapping and generative AI ecosystems. Its scope is to systematise a fast-growing field in which terminologies employed to classify methodologies and outcomes are fragmented and unsystematic. We follow the customary steps of qualitative systematic review, gathering 144 articles from different disciplines to develop a parsimonious taxonomy of: methodologies employed (empirical, simulation, observational, controlled), outcomes observed (concentration, model collapse, diversity, echo chamber, filter bubble, inequality, polarisation, radicalisation, volume), and their level of analysis (individual, item, model, and systemic). We systematically discuss all findings of our survey substantively and methodologically, highlighting also potential avenues for future research. This survey is addressed to scholars and practitioners interested in different human-AI ecosystems, policymakers and institutional stakeholders who want to understand better the measurable outcomes of recommenders, and tech companies who wish to obtain a systematic view of the impact of their recommenders.
@misc{PFC2024,author={Pappalardo, Luca and Ferragina, Emanuele and Citraro, Salvatore and Cornacchia, Giuliano and Nanni, Mirco and Rossetti, Giulio and Gezici, Gizem and Giannotti, Fosca and Lalli, Margherita and Gambetta, Daniele and Mauro, Giovanni and Morini, Virginia and Pansanella, Valentina and Pedreschi, Dino},doi={10.48550/arXiv.2407.01630},line={3,4,5},month=dec,publisher={arXiv},title={A survey on the impact of AI-based recommenders on human behaviours: methodologies, outcomes and future directions},year={2024}}
The evolution of Explainable Artificial Intelligence (XAI) within healthcare represents a crucial turn towards more transparent, understandable, and patient-centric AI applications. The main objective is not only to increase the accuracy of AI models but also, and more importantly, to establish user trust in decision support systems by improving their interpretability. This extended abstract outlines the ongoing efforts and advancements of our lab in addressing the challenges brought up by complex AI systems in the healthcare domain. Currently, there are four main projects: Prostate Imaging Cancer AI, Liver Transplantation & Diabetes, Breast Cancer, and Doctor XAI and ABELE.
2023
Effects of Route Randomization on Urban Emissions
Giuliano Cornacchia, Mirco Nanni, Dino Pedreschi, and Luca Pappalardo
Routing algorithms typically suggest the fastest path, or a slight variation of it, to reach a user’s desired destination. Although this suggestion is undoubtedly advantageous for the individual user, from a collective point of view the aggregation of all the single suggested paths may result in a growing impact (e.g., in terms of emissions). In this study, we use SUMO to simulate the effects of incorporating randomness into routing algorithms on emissions, their distribution, and travel time in the urban area of Milan (Italy). Our results reveal that, given the common practice of routing towards the fastest path, a certain level of randomness in routes reduces emissions and travel time. In other words, the stronger the random component in the routes, the more pronounced the benefits, up to a certain threshold. Our research provides insight into the potential advantages of considering collective outcomes in routing decisions and highlights the need to further explore the relationship between route randomization and sustainability in urban transportation.
@article{CNP2023,author={Cornacchia, Giuliano and Nanni, Mirco and Pedreschi, Dino and Pappalardo, Luca},doi={10.52825/scp.v4i.217},issn={2750-4425},journal={SUMO Conference Proceedings},line={4,5},month=jun,open_access={Gold},pages={75–87},publisher={TIB Open Publishing},title={Effects of Route Randomization on Urban Emissions},visible_on_website={YES},volume={4},year={2023}}
Explaining Socio-Demographic and Behavioral Patterns of Vaccination Against the Swine Flu (H1N1) Pandemic
Clara Punzi, Aleksandra Maslennikova, Gizem Gezici, Roberto Pellungrini, and Fosca Giannotti
Pandemic vaccination campaigns must account for vaccine skepticism as an obstacle to overcome. Using machine learning to identify behavioral and psychological patterns in public survey datasets can provide valuable insights and inform vaccination campaigns based on empirical evidence. However, we argue that the adoption of local and global explanation methodologies can provide additional support to health practitioners by suggesting personalized communication strategies and revealing potential demographic, social, or structural barriers to vaccination requiring systemic changes. In this paper, we first implement a chain classification model for the adoption of the vaccine during the H1N1 influenza outbreak taking seasonal vaccination information into account, and then compare it with a binary classifier for vaccination to better understand the overall patterns in the data. Following that, we derive and compare global explanations using post-hoc methodologies and interpretable-by-design models. Our findings indicate that socio-demographic factors play a distinct role in the H1N1 vaccination as compared to the general vaccination. Nevertheless, medical recommendation and health insurance remain significant factors for both vaccinations. Then, we concentrated on the subpopulation of individuals who did not receive an H1N1 vaccination despite being at risk of developing severe symptoms. In an effort to assist practitioners in providing effective recommendations to patients, we present rules and counterfactuals for the selected instances based on local explanations. Finally, we raise concerns regarding gender and racial disparities in healthcare access by analysing the interaction effects of sensitive attributes on the model’s output.
@inbook{PMG2023,author={Punzi, Clara and Maslennikova, Aleksandra and Gezici, Gizem and Pellungrini, Roberto and Giannotti, Fosca},booktitle={Explainable Artificial Intelligence},doi={10.1007/978-3-031-44067-0_31},isbn={9783031440670},issn={1865-0937},line={1,4},open_access={Gold},pages={621–635},publisher={Springer Nature Switzerland},title={Explaining Socio-Demographic and Behavioral Patterns of Vaccination Against the Swine Flu (H1N1) Pandemic},visible_on_website={YES},year={2023}}
2022
Understanding the impact of explanations on advice-taking: a user study for AI-based clinical Decision Support Systems
Cecilia Panigutti, Andrea Beretta, Fosca Giannotti, and Dino Pedreschi
In CHI Conference on Human Factors in Computing Systems, Apr 2022
The field of eXplainable Artificial Intelligence (XAI) focuses on providing explanations for AI systems’ decisions. XAI applications to AI-based Clinical Decision Support Systems (DSS) should increase trust in the DSS by allowing clinicians to investigate the reasons behind its suggestions. In this paper, we present the results of a user study on the impact of advice from a clinical DSS on healthcare providers’ judgment in two different cases: the case where the clinical DSS explains its suggestion and the case where it does not. We examined the weight of advice, the behavioral intention to use the system, and the perceptions with quantitative and qualitative measures. Our results indicate a more significant impact of advice when an explanation for the DSS decision is provided. Additionally, through the open-ended questions, we provide some insights on how to improve the explanations in the diagnosis forecasts for healthcare assistants, nurses, and doctors.
@inproceedings{PBP2022,author={Panigutti, Cecilia and Beretta, Andrea and Giannotti, Fosca and Pedreschi, Dino},booktitle={CHI Conference on Human Factors in Computing Systems},collection={CHI ’22},doi={10.1145/3491102.3502104},line={4},month=apr,open_access={Gold},pages={1–9},publisher={ACM},series={CHI ’22},title={Understanding the impact of explanations on advice-taking: a user study for AI-based clinical Decision Support Systems},visible_on_website={YES},year={2022}}
Assessing Trustworthy AI in Times of COVID-19: Deep Learning for Predicting a Multiregional Score Conveying the Degree of Lung Compromise in COVID-19 Patients
Himanshi Allahabadi, Julia Amann, Isabelle Balot, Andrea Beretta, Charles Binkley, and 52 more authors
IEEE Transactions on Technology and Society, Dec 2022
This article’s main contributions are twofold: 1) to demonstrate how to apply the general European Union’s High-Level Expert Group’s (EU HLEG) guidelines for trustworthy AI in practice for the domain of healthcare and 2) to investigate the research question of what does “trustworthy AI” mean at the time of the COVID-19 pandemic. To this end, we present the results of a post-hoc self-assessment to evaluate the trustworthiness of an AI system for predicting a multiregional score conveying the degree of lung compromise in COVID-19 patients, developed and verified by an interdisciplinary team with members from academia, public hospitals, and industry in time of pandemic. The AI system aims to help radiologists to estimate and communicate the severity of damage in a patient’s lung from Chest X-rays. It has been experimentally deployed in the radiology department of the ASST Spedali Civili clinic in Brescia, Italy, since December 2020 during pandemic time. The methodology we have applied for our post-hoc assessment, called Z-Inspection®, uses sociotechnical scenarios to identify ethical, technical, and domain-specific issues in the use of the AI system in the context of the pandemic.
@article{AAB2022,author={Allahabadi, Himanshi and Amann, Julia and Balot, Isabelle and Beretta, Andrea and Binkley, Charles and Bozenhard, Jonas and Bruneault, Frederick and Brusseau, James and Candemir, Sema and Cappellini, Luca Alessandro and Chakraborty, Subrata and Cherciu, Nicoleta and Cociancig, Christina and Coffee, Megan and Ek, Irene and Espinosa-Leal, Leonardo and Farina, Davide and Fieux-Castagnet, Genevieve and Frauenfelder, Thomas and Gallucci, Alessio and Giuliani, Guya and Golda, Adam and van Halem, Irmhild and Hildt, Elisabeth and Holm, Sune and Kararigas, Georgios and Krier, Sebastien A. and Kuhne, Ulrich and Lizzi, Francesca and Madai, Vince I. and Markus, Aniek F. and Masis, Serg and Mathez, Emilie Wiinblad and Mureddu, Francesco and Neri, Emanuele and Osika, Walter and Ozols, Matiss and Panigutti, Cecilia and Parent, Brendan and Pratesi, Francesca and Moreno-Sanchez, Pedro A. and Sartor, Giovanni and Savardi, Mattia and Signoroni, Alberto and Sormunen, Hanna-Maria and Spezzatti, Andy and Srivastava, Adarsh and Stephansen, Annette F. and Theng, Lau Bee and Tithi, Jesmin Jahan and Tuominen, Jarno and Umbrello, Steven and Vaccher, Filippo and Vetter, Dennis and Westerlund, Magnus and Wurth, Renee and Zicari, Roberto V.},doi={10.1109/tts.2022.3195114},issn={2637-6415},journal={IEEE Transactions on Technology and Society},line={4,5},month=dec,number={4},open_access={Gold},pages={272–289},publisher={Institute of Electrical and Electronics Engineers (IEEE)},title={Assessing Trustworthy AI in Times of COVID-19: Deep Learning for Predicting a Multiregional Score Conveying the Degree of Lung Compromise in COVID-19 Patients},visible_on_website={YES},volume={3},year={2022}}
Explaining Siamese Networks in Few-Shot Learning for Audio Data
Andrea Fedele, Riccardo Guidotti, and Dino Pedreschi
Machine learning models are not able to generalize correctly when queried on samples belonging to class distributions that were never seen during training. This is a critical issue, since real world applications might need to quickly adapt without the necessity of re-training. To overcome these limitations, few-shot learning frameworks have been proposed and their applicability has been studied widely for computer vision tasks. Siamese Networks learn pairs similarity in form of a metric that can be easily extended on new unseen classes. Unfortunately, the downside of such systems is the lack of explainability. We propose a method to explain the outcomes of Siamese Networks in the context of few-shot learning for audio data. This objective is pursued through a local perturbation-based approach that evaluates segments-weighted-average contributions to the final outcome considering the interplay between different areas of the audio spectrogram. Qualitative and quantitative results demonstrate that our method is able to show common intra-class characteristics and erroneous reliance on silent sections.
@inbook{FGP2022,address={Cham, Switzerland},author={Fedele, Andrea and Guidotti, Riccardo and Pedreschi, Dino},booktitle={Discovery Science},doi={10.1007/978-3-031-18840-4_36},isbn={9783031188404},issn={1611-3349},line={4},open_access={NO},pages={509–524},publisher={Springer Nature Switzerland},title={Explaining Siamese Networks in Few-Shot Learning for Audio Data},visible_on_website={YES},year={2022}}
Explaining Crash Predictions on Multivariate Time Series Data
Francesco Spinnato, Riccardo Guidotti, Mirco Nanni, Daniele Maccagnola, Giulia Paciello, and 1 more author
In Assicurazioni Generali, an automatic decision-making model is used to check real-time multivariate time series and alert if a car crash happened. In such a way, a Generali operator can call the customer to provide first assistance. The high sensitivity of the model used, combined with the fact that the model is not interpretable, might cause the operator to call customers even though a car crash did not happen but only due to a harsh deviation or the fact that the road is bumpy. Our goal is to tackle the problem of interpretability for car crash prediction and propose an eXplainable Artificial Intelligence (XAI) workflow that allows gaining insights regarding the logic behind the deep learning predictive model adopted by Generali. We reach our goal by building an interpretable alternative to the current obscure model that also reduces the training data usage and the prediction time.
@inbook{SGN2022,address={Cham, Switzerland},author={Spinnato, Francesco and Guidotti, Riccardo and Nanni, Mirco and Maccagnola, Daniele and Paciello, Giulia and Farina, Antonio Bencini},booktitle={Discovery Science},doi={10.1007/978-3-031-18840-4_39},isbn={9783031188404},issn={1611-3349},line={4},open_access={NO},pages={556–566},publisher={Springer Nature Switzerland},title={Explaining Crash Predictions on Multivariate Time Series Data},visible_on_website={YES},year={2022}}
Understanding peace through the world news
Vasiliki Voukelatou, Ioanna Miliou, Fosca Giannotti, and Luca Pappalardo
Peace is a principal dimension of well-being and is the way out of inequity and violence. Thus, its measurement has drawn the attention of researchers, policymakers, and peacekeepers. During the last years, novel digital data streams have drastically changed the research in this field. The current study exploits information extracted from a new digital database called Global Data on Events, Location, and Tone (GDELT) to capture peace through the Global Peace Index (GPI). Applying predictive machine learning models, we demonstrate that news media attention from GDELT can be used as a proxy for measuring GPI at a monthly level. Additionally, we use explainable AI techniques to obtain the most important variables that drive the predictions. This analysis highlights each country’s profile and provides explanations for the predictions, and particularly for the errors and the events that drive these errors. We believe that digital data exploited by researchers, policymakers, and peacekeepers, with data science tools as powerful as machine learning, could contribute to maximizing the societal benefits and minimizing the risks to peace.
@article{VMG2022,author={Voukelatou, Vasiliki and Miliou, Ioanna and Giannotti, Fosca and Pappalardo, Luca},doi={10.1140/epjds/s13688-022-00315-z},issn={2193-1127},journal={EPJ Data Science},line={4},month=jan,number={1},open_access={Gold},publisher={Springer Science and Business Media LLC},title={Understanding peace through the world news},visible_on_website={YES},volume={11},year={2022}}
2021
Intelligenza artificiale in ambito diabetologico: prospettive, dalla ricerca di base alle applicazioni cliniche
@article{PB2021,author={Panigutti, Cecilia and Bosi, Emanuele},doi={10.30682/ildia2101f},issn={1720-8335},journal={il Diabete},line={4},number={1},open_access={NO},publisher={Bologna University Press Foundation},title={Intelligenza artificiale in ambito diabetologico: prospettive, dalla ricerca di base alle applicazioni cliniche},visible_on_website={YES},volume={33},year={2021}}
GLocalX - From Local to Global Explanations of Black Box AI Models
Mattia Setzu, Riccardo Guidotti, Anna Monreale, Franco Turini, Dino Pedreschi, and 1 more author
Artificial Intelligence (AI) has come to prominence as one of the major components of our society, with applications in most aspects of our lives. In this field, complex and highly nonlinear machine learning models such as ensemble models, deep neural networks, and Support Vector Machines have consistently shown remarkable accuracy in solving complex tasks. Although accurate, AI models often are “black boxes” which we are not able to understand. Relying on these models has a multifaceted impact and raises significant concerns about their transparency. Applications in sensitive and critical domains are a strong motivational factor in trying to understand the behavior of black boxes. We propose to address this issue by providing an interpretable layer on top of black box models by aggregating “local” explanations. We present GLocalX, a “local-first” model agnostic explanation method. Starting from local explanations expressed in form of local decision rules, GLocalX iteratively generalizes them into global explanations by hierarchically aggregating them. Our goal is to learn accurate yet simple interpretable models to emulate the given black box, and, if possible, replace it entirely. We validate GLocalX in a set of experiments in standard and constrained settings with limited or no access to either data or local explanations. Experiments show that GLocalX is able to accurately emulate several models with simple and small models, reaching state-of-the-art performance against natively global solutions. Our findings show how it is often possible to achieve a high level of both accuracy and comprehensibility of classification models, even in complex domains with high-dimensional data, without necessarily trading one property for the other. This is a key requirement for a trustworthy AI, necessary for adoption in high-stakes decision making applications.
@article{SGM2021,author={Setzu, Mattia and Guidotti, Riccardo and Monreale, Anna and Turini, Franco and Pedreschi, Dino and Giannotti, Fosca},doi={10.1016/j.artint.2021.103457},issn={0004-3702},journal={Artificial Intelligence},line={1,4},month=may,open_access={Gold},pages={103457},publisher={Elsevier BV},title={GLocalX - From Local to Global Explanations of Black Box AI Models},visible_on_website={YES},volume={294},year={2021}}
FairLens: Auditing black-box clinical decision support systems
Cecilia Panigutti, Alan Perotti, André Panisson, Paolo Bajardi, and Dino Pedreschi
Highlights: We present a pipeline to detect and explain potential fairness issues in Clinical DSS. We study and compare different multi-label classification disparity measures. We explore ICD9 bias in MIMIC-IV, an openly available ICU benchmark dataset.
@article{PPB2021,author={Panigutti, Cecilia and Perotti, Alan and Panisson, André and Bajardi, Paolo and Pedreschi, Dino},doi={10.1016/j.ipm.2021.102657},issn={0306-4573},journal={Information Processing & Management},line={1,4},month=sep,number={5},open_access={Gold},pages={102657},publisher={Elsevier BV},title={FairLens: Auditing black-box clinical decision support systems},visible_on_website={YES},volume={58},year={2021}}
Occlusion-Based Explanations in Deep Recurrent Models for Biomedical Signals
Michele Resta, Anna Monreale, and Davide Bacciu
The biomedical field is characterized by an ever-increasing production of sequential data, which often come in the form of biosignals capturing the time-evolution of physiological processes, such as blood pressure and brain activity. This has motivated a large body of research dealing with the development of machine learning techniques for the predictive analysis of such biosignals. Unfortunately, in high-stakes decision making, such as clinical diagnosis, the opacity of machine learning models becomes a crucial aspect to be addressed in order to increase the trust and adoption of AI technology. In this paper, we propose a model agnostic explanation method, based on occlusion, that enables the learning of the input’s influence on the model predictions. We specifically target problems involving the predictive analysis of time-series data and the models that are typically used to deal with data of such nature, i.e., recurrent neural networks. Our approach is able to provide two different kinds of explanations: one suitable for technical experts, who need to verify the quality and correctness of machine learning models, and one suited to physicians, who need to understand the rationale underlying the prediction to make aware decisions. A wide experimentation on different physiological data demonstrates the effectiveness of our approach both in classification and regression tasks.
@article{RAB2021,author={Resta, Michele and Monreale, Anna and Bacciu, Davide},doi={10.3390/e23081064},issn={1099-4300},journal={Entropy},line={4},month=aug,number={8},open_access={Gold},pages={1064},publisher={MDPI AG},title={Occlusion-Based Explanations in Deep Recurrent Models for Biomedical Signals},visible_on_website={YES},volume={23},year={2021}}
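The occlusion idea is simple enough to sketch. The following is our own illustration, not the paper’s code: slide a mask over the input signal and record how far the model’s prediction moves when each segment is hidden.

```python
import numpy as np

def occlusion_importance(predict, signal, window=10, fill=0.0):
    """Model-agnostic occlusion: importance score for each window of a 1-D signal.

    `predict` maps a signal to a scalar score; a larger prediction shift
    when a window is masked means that window mattered more.
    """
    base = predict(signal)
    scores = []
    for start in range(0, len(signal), window):
        masked = signal.copy()
        masked[start:start + window] = fill          # occlude one segment
        scores.append(abs(predict(masked) - base))   # prediction shift
    return np.array(scores)

# Toy example: the "model" reacts to the signal's peak amplitude,
# so only the window containing the peak gets a non-zero score.
signal = np.concatenate([np.zeros(30), np.ones(10) * 5.0, np.zeros(20)])
print(occlusion_importance(lambda s: s.max(), signal, window=10))
```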
2020
Black Box Explanation by Learning Image Exemplars in the Latent Feature Space
Riccardo Guidotti, Anna Monreale, Stan Matwin, and Dino Pedreschi
We present an approach to explain the decisions of black box models for image classification. While using the black box to label images, our explanation method exploits the latent feature space learned through an adversarial autoencoder. The proposed method first generates exemplar images in the latent feature space and learns a decision tree classifier. Then, it selects and decodes exemplars respecting local decision rules. Finally, it visualizes them in a manner that shows to the user how the exemplars can be modified to either stay within their class, or to become counter-factuals by “morphing” into another class. Since we focus on black box decision systems for image classification, the explanation obtained from the exemplars also provides a saliency map highlighting the areas of the image that contribute to its classification, and areas of the image that push it into another class. We present the results of an experimental evaluation on three datasets and two black box models. Besides providing the most useful and interpretable explanations, we show that the proposed method outperforms existing explainers in terms of fidelity, relevance, coherence, and stability.
@inbook{GMM2019,author={Guidotti, Riccardo and Monreale, Anna and Matwin, Stan and Pedreschi, Dino},booktitle={Machine Learning and Knowledge Discovery in Databases},doi={10.1007/978-3-030-46150-8_12},isbn={9783030461508},issn={1611-3349},line={1,4},pages={189–205},publisher={Springer International Publishing},title={Black Box Explanation by Learning Image Exemplars in the Latent Feature Space},visible_on_website={YES},year={2020}}
Prediction and Explanation of Privacy Risk on Mobility Data with Neural Networks
Francesca Naretto, Roberto Pellungrini, Franco Maria Nardini, and Fosca Giannotti
The analysis of privacy risk for mobility data is a fundamental part of any privacy-aware process based on such data. Mobility data are highly sensitive. Therefore, the correct identification of the privacy risk before releasing the data to the public is of utmost importance. However, existing privacy risk assessment frameworks have high computational complexity. To tackle these issues, some recent work proposed a solution based on classification approaches to predict privacy risk using mobility features extracted from the data. In this paper, we propose an improvement of this approach by applying long short-term memory (LSTM) neural networks to predict the privacy risk directly from original mobility data. We empirically evaluate privacy risk on real data by applying our LSTM-based approach. Results show that our proposed method based on a LSTM network is effective in predicting the privacy risk with results in terms of F1 of up to 0.91. Moreover, to explain the predictions of our model, we employ a state-of-the-art explanation algorithm, Shap. We explore the resulting explanation, showing how it is possible to provide effective predictions while explaining them to the end-user.
@inbook{NPN2020,author={Naretto, Francesca and Pellungrini, Roberto and Nardini, Franco Maria and Giannotti, Fosca},booktitle={ECML PKDD 2020 Workshops},doi={10.1007/978-3-030-65965-3_34},isbn={9783030659653},issn={1865-0937},line={4,5},open_access={NO},pages={501–516},publisher={Springer International Publishing},title={Prediction and Explanation of Privacy Risk on Mobility Data with Neural Networks},visible_on_website={YES},year={2020}}
Explaining Image Classifiers Generating Exemplars and Counter-Exemplars from Latent Representations
Riccardo Guidotti, Anna Monreale, Stan Matwin, and Dino Pedreschi
Proceedings of the AAAI Conference on Artificial Intelligence, Apr 2020
We present an approach to explain the decisions of black box image classifiers through synthetic exemplar and counter-exemplar learnt in the latent feature space. Our explanation method exploits the latent representations learned through an adversarial autoencoder for generating a synthetic neighborhood of the image for which an explanation is required. A decision tree is trained on a set of images represented in the latent space, and its decision rules are used to generate exemplar images showing how the original image can be modified to stay within its class. Counterfactual rules are used to generate counter-exemplars showing how the original image can “morph” into another class. The explanation also comprehends a saliency map highlighting the areas that contribute to its classification, and areas that push it into another class. A wide and deep experimental evaluation proves that the proposed method outperforms existing explainers in terms of fidelity, relevance, coherence, and stability, besides providing the most useful and interpretable explanations.
@article{GMM2020,author={Guidotti, Riccardo and Monreale, Anna and Matwin, Stan and Pedreschi, Dino},doi={10.1609/aaai.v34i09.7116},issn={2159-5399},journal={Proceedings of the AAAI Conference on Artificial Intelligence},line={1,4},month=apr,number={09},open_access={NO},pages={13665–13668},publisher={Association for the Advancement of Artificial Intelligence (AAAI)},title={Explaining Image Classifiers Generating Exemplars and Counter-Exemplars from Latent Representations},visible_on_website={YES},volume={34},year={2020}}
Predicting and Explaining Privacy Risk Exposure in Mobility Data
Francesca Naretto, Roberto Pellungrini, Anna Monreale, Franco Maria Nardini, and Mirco Musolesi
Mobility data is a proxy of different social dynamics and its analysis enables a wide range of user services. Unfortunately, mobility data are very sensitive because the sharing of people’s whereabouts may raise serious privacy concerns. Existing frameworks for privacy risk assessment provide tools to identify and measure privacy risks, but they often (i) have high computational complexity; and (ii) are not able to provide users with a justification of the reported risks. In this paper, we propose expert, a new framework for the prediction and explanation of privacy risk on mobility data. We empirically evaluate privacy risk on real data, simulating a privacy attack with a state-of-the-art privacy risk assessment framework. We then extract individual mobility profiles from the data for predicting their risk. We compare the performance of several machine learning algorithms in order to identify the best approach for our task. Finally, we show how it is possible to explain privacy risk prediction on real data, using two algorithms: Shap, a feature importance-based method and Lore, a rule-based method. Overall, expert is able to provide a user with the privacy risk and an explanation of the risk itself. The experiments show excellent performance for the prediction task.
@inbook{NPM2020,author={Naretto, Francesca and Pellungrini, Roberto and Monreale, Anna and Nardini, Franco Maria and Musolesi, Mirco},booktitle={Discovery Science},doi={10.1007/978-3-030-61527-7_27},isbn={9783030615277},issn={1611-3349},line={4,5},open_access={NO},pages={403–418},publisher={Springer International Publishing},title={Predicting and Explaining Privacy Risk Exposure in Mobility Data},visible_on_website={YES},year={2020}}
Doctor XAI: an ontology-based approach to black-box sequential data classification explanations
Cecilia Panigutti, Alan Perotti, and Dino Pedreschi
In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, Jan 2020
Several recent advancements in Machine Learning involve black-box models: algorithms that do not provide human-understandable explanations in support of their decisions. This limitation hampers the fairness, accountability and transparency of these models; the field of eXplainable Artificial Intelligence (XAI) tries to solve this problem providing human-understandable explanations for black-box models. However, healthcare datasets (and the related learning tasks) often present peculiar features, such as sequential data, multi-label predictions, and links to structured background knowledge. In this paper, we introduce Doctor XAI, a model-agnostic explainability technique able to deal with multi-labeled, sequential, ontology-linked data. We focus on explaining Doctor AI, a multilabel classifier which takes as input the clinical history of a patient in order to predict the next visit. Furthermore, we show how exploiting the temporal dimension in the data and the domain knowledge encoded in the medical ontology improves the quality of the mined explanations.
@inproceedings{PPP2020,author={Panigutti, Cecilia and Perotti, Alan and Pedreschi, Dino},booktitle={Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency},collection={FAT* ’20},doi={10.1145/3351095.3372855},line={1,3,4},month=jan,open_access={NO},pages={629–639},publisher={ACM},series={FAT* ’20},title={Doctor XAI: an ontology-based approach to black-box sequential data classification explanations},visible_on_website={YES},year={2020}}
2019
Factual and Counterfactual Explanations for Black Box Decision Making
Riccardo Guidotti, Anna Monreale, Fosca Giannotti, Dino Pedreschi, Salvatore Ruggieri, and 1 more author
The rise of sophisticated machine learning models has brought accurate but obscure decision systems, which hide their logic, thus undermining transparency, trust, and the adoption of artificial intelligence (AI) in socially sensitive and safety-critical contexts. We introduce a local rule-based explanation method, providing faithful explanations of the decision made by a black box classifier on a specific instance. The proposed method first learns an interpretable, local classifier on a synthetic neighborhood of the instance under investigation, generated by a genetic algorithm. Then, it derives from the interpretable classifier an explanation consisting of a decision rule, explaining the factual reasons of the decision, and a set of counterfactuals, suggesting the changes in the instance features that would lead to a different outcome. Experimental results show that the proposed method outperforms existing approaches in terms of the quality of the explanations and of the accuracy in mimicking the black box.
@article{GMG2019,author={Guidotti, Riccardo and Monreale, Anna and Giannotti, Fosca and Pedreschi, Dino and Ruggieri, Salvatore and Turini, Franco},doi={10.1109/mis.2019.2957223},issn={1941-1294},journal={IEEE Intelligent Systems},line={1,4},month=nov,number={6},open_access={Gold},pages={14–23},publisher={Institute of Electrical and Electronics Engineers (IEEE)},title={Factual and Counterfactual Explanations for Black Box Decision Making},visible_on_website={YES},volume={34},year={2019}}
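The factual component of such an explanation can be sketched by reading off the decision path of a local surrogate tree. The code below is our own toy illustration (the paper’s genetic generation of the synthetic neighborhood is omitted, and the surrogate is trained on the raw data for brevity); negating the last condition of the returned rule gives a first, crude counterfactual direction.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=4, random_state=0)
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

def factual_rule(clf, x):
    """Decision rule followed by instance x in a fitted sklearn tree."""
    t, node, rule = clf.tree_, 0, []
    while t.children_left[node] != -1:                  # -1 marks a leaf node
        f, thr = t.feature[node], t.threshold[node]
        if x[f] <= thr:
            rule.append(f"x[{f}] <= {thr:.3f}")
            node = t.children_left[node]
        else:
            rule.append(f"x[{f}] > {thr:.3f}")
            node = t.children_right[node]
    return rule, clf.classes_[t.value[node].argmax()]   # rule + predicted outcome

rule, outcome = factual_rule(surrogate, X[0])
print(f"IF {' AND '.join(rule)} THEN class {outcome}")
```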
Explaining Multi-label Black-Box Classifiers for Health Applications
Cecilia Panigutti, Riccardo Guidotti, Anna Monreale, and Dino Pedreschi
Today the state-of-the-art performance in classification is achieved by the so-called “black boxes”, i.e. decision-making systems whose internal logic is obscure. Such models could revolutionize the health-care system, however their deployment in real-world diagnosis decision support systems is subject to several risks and limitations due to the lack of transparency. The typical classification problem in health-care requires a multi-label approach since the possible labels are not mutually exclusive, e.g. diagnoses. We propose MARLENA, a model-agnostic method which explains multi-label black box decisions. MARLENA explains an individual decision in three steps. First, it generates a synthetic neighborhood around the instance to be explained using a strategy suitable for multi-label decisions. It then learns a decision tree on such neighborhood and finally derives from it a decision rule that explains the black box decision. Our experiments show that MARLENA performs well in terms of mimicking the black box behavior while gaining at the same time a notable amount of interpretability through compact decision rules, i.e. rules with limited length.
@inbook{PGM2019,author={Panigutti, Cecilia and Guidotti, Riccardo and Monreale, Anna and Pedreschi, Dino},booktitle={Precision Health and Medicine},doi={10.1007/978-3-030-24409-5_9},isbn={9783030244095},issn={1860-9503},line={1,4},month=aug,pages={97–110},publisher={Springer International Publishing},title={Explaining Multi-label Black-Box Classifiers for Health Applications},visible_on_website={YES},year={2019}}