2 Causal explanations
2.1 Causality
Inferring a causal model is central to properly understanding a phenomenon. Following [6], a synergy paper with the HumaneAI project in which we survey the causal literature, we identify two families of causal models: graphical models, in which we infer the causal relationships and induced distributions of the observed variables, and potential outcome models, in which we assume the observed variables to be the outcome of a causal model and look to infer the counterfactual outcome of an intervention on the model. Graphical models encode variables and their conditional dependency relations, allowing us to understand which variables influence others. Pearl’s do-calculus introduced a formal calculus for intervention on causal models, allowing their users to purposefully act on the data knowing what each action will result in. Inferring a causal model benefits both the user, who can test interventional actions, and the black box, which can leverage it to make better predictions. Explanation algorithms can also leverage causal models to compute feature importance in a more principled way, as is the case for our proposal CALIME [CG21], in which we learn a causal model to infer feature importance. We detail our work in Attachment A.1.3.
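To make the idea concrete, the following is a minimal sketch, not the actual CALIME implementation: it assumes a toy linear structural causal model (x0 → x1 → x2) and a stand-in black box, both introduced purely for illustration, and fits a local linear surrogate on causally consistent perturbations obtained by jittering the root cause and propagating the change through the causal equations, instead of perturbing features independently as plain LIME would.

```python
# Minimal sketch (not the CALIME code): feature importance from a local linear
# surrogate fitted on causally consistent perturbations. The toy SCM
# x0 -> x1 -> x2 and the black box below are assumptions for illustration.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

def black_box(X):
    # Stand-in for an opaque model: any predict(X) -> y function fits here.
    return 2.0 * X[:, 1] + X[:, 2]

def causal_neighbourhood(x, n_samples=500, sigma=0.5):
    # Jitter the root cause around its observed value, then propagate the
    # perturbation through the assumed causal equations so that descendant
    # features remain consistent with their causes.
    x0 = x[0] + rng.normal(scale=sigma, size=n_samples)
    x1 = 0.8 * x0 + rng.normal(scale=0.1, size=n_samples)
    x2 = -0.5 * x1 + rng.normal(scale=0.1, size=n_samples)
    return np.column_stack([x0, x1, x2])

def causal_local_importance(x):
    Z = causal_neighbourhood(x)
    surrogate = Ridge(alpha=1.0).fit(Z, black_box(Z))
    return surrogate.coef_  # local, causally informed feature importances

print(causal_local_importance(np.array([0.2, 0.1, -0.3])))
```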
2.2 Knowledge integration
Modern black boxes tend to rely on neural and subsymbolic approaches that are in stark contrast with human knowledge, which is usually symbolic in nature. The XAI community has shown an increasing interest both in injecting symbolic knowledge into subsymbolic models [7] and, more generally, in neuro-symbolic integration. This trend is of great interest for domains with large knowledge bases, such as healthcare and Natural Language Processing (NLP) [8]. Several NLP tasks can leverage external structured and unstructured knowledge, either in the form of structured knowledge bases [9], e.g., Wikipedia, or of free-form text [10]. This allows a model to leverage a set of relevant facts from the knowledge base and provide them to the user to explain its reasoning. Some recent approaches go as far as using the whole live and raw web as a knowledge base, searching through it for useful facts to aid the prediction. Aside from injection, background knowledge can also be used post hoc to align the concepts learned by the black box with given concepts. Besides a review of the literature, in this stream of research we have proposed two works: Doctor XAI [PPP2020], already presented in Section 1.3, and TriplEx [SMM2022].
2.2.1 TriplEx
In [SMM2022] we have developed TriplEx, an algorithm for explaining Transformer-based models. TriplEx aims to locally explain text classification models on a variety of tasks: natural language inference, semantic text similarity, and text classification. Given some text x to classify, TriplEx extracts a set of factual triples T, which form the basis of the explanation. Then, TriplEx perturbs T along given semantic dimensions, which vary according to the task at hand, looking for edge cases in which the black box’s prediction is preserved. In other words, TriplEx looks to generate a semi-factual explanation. The search for perturbations is guided by an external knowledge base, specifically WordNet, which allows TriplEx to perturb the text along different semantic dimensions. Keeping with our running example, TriplEx may replace “mice” with “rodent” to verify whether the model has learned to apply the same reasoning to all rodents, and not just mice. Finally, TriplEx ranks the label-preserving perturbations according to their semantic distance: the larger the semantic perturbation, the better. Additionally, for Transformer models, TriplEx also provides an alignment score for each triple, indicating which triples are most relevant to the black box and giving the user a finer granularity of explanation. TriplEx extracts explanations that are correct by construction, and its semantic perturbations tend to yield realistic and plausible text, as measured by perplexity, an automatic estimate of the plausibility of a text, indicating that leveraging semantic perturbations does indeed generate realistic explanations.
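As an illustration of the perturbation step only, the following is a minimal sketch, not the TriplEx code: it assumes a hypothetical black-box text classifier and an already-extracted triple, generalises the subject of the triple along WordNet hypernyms (via NLTK, whose WordNet data must be downloaded beforehand), keeps the label-preserving replacements, and ranks them by semantic distance.

```python
# Minimal sketch (not the TriplEx implementation) of label-preserving semantic
# perturbations: generalise one element of an extracted triple along WordNet
# hypernyms and keep the replacements for which the black box's label holds.
# `black_box` and the input triple are hypothetical stand-ins.
from nltk.corpus import wordnet as wn  # requires: nltk.download('wordnet')

def hypernym_candidates(word, pos=wn.NOUN, depth=2):
    """Collect increasingly general terms for `word` (e.g., mouse -> rodent)."""
    candidates, frontier = [], wn.synsets(word, pos=pos)[:1]
    for level in range(1, depth + 1):
        frontier = [h for s in frontier for h in s.hypernyms()]
        for synset in frontier:
            for lemma in synset.lemma_names():
                candidates.append((lemma.replace("_", " "), level))
    return candidates

def semifactual_perturbations(text, triple, black_box):
    subj, rel, obj = triple
    original_label = black_box(text)
    kept = []
    for replacement, distance in hypernym_candidates(subj):
        perturbed = text.replace(subj, replacement)
        if black_box(perturbed) == original_label:  # label-preserving edge case
            kept.append((perturbed, distance))
    # Rank by semantic distance: the broader the generalisation, the better.
    return sorted(kept, key=lambda pair: -pair[1])

# Toy usage with a trivial stand-in classifier.
toy_black_box = lambda t: "animal_topic" if any(
    w in t for w in ("mouse", "rodent", "mammal", "animal")) else "other"
ranked = semifactual_perturbations(
    "the mouse ran across the field", ("mouse", "ran_across", "field"), toy_black_box)
print(ranked[:3])
```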
2.3 Logic reasoning
Logic is one of the most powerful languages to express slow thinking, as it enjoys several desirable properties. Logic programming allows us to induce discrete, noise-resistant, and explainable/declarative-by-design “programs as rules” with high levels of abstraction that mimic human reasoning. Derivations in logic yield deterministic proof trees that a user can inspect. Furthermore, logic programming lends itself to background knowledge injection, allowing the user to guide the model, even if only partially, with concepts and theories that they already know to be true. These properties make it a perfect candidate language for slow-thinking explanations. Statistical Relational Learning (SRL) aims to integrate logic, and more broadly relational learning, with statistical learning. Some models, for instance, integrate a subsymbolic component, given by a black box, and a symbolic one, given by a logical theory, in an explainable-by-design pipeline in which the black box is only tasked with learning a mapping from data to logical entities, while the logical theory is tasked with reasoning on top of those entities. An even tighter integration is offered by models that directly encode logical theories and predicates in subsymbolic structures, often mapping logic connectives and quantifiers to predefined norms, e.g., fuzzy t-norms. Other works aim to constrain black-box models with given knowledge in the form of first-order rules, or to extract a set of logical constraints learned by the black box. Our core approaches (LORE, GLocalX) are essentially logic-based, since they produce explanations in the form of rules (either directly inferred or obtained by abstracting sets of rules); it is therefore natural to consider the surveyed logic-based approaches as candidates for extending the expressiveness of the explanation language of LORE and of the rule-reasoning approach of GLocalX.
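As a minimal illustration of such an explainable-by-design pipeline (a sketch under simplifying assumptions, not any specific surveyed system): a stubbed neural component maps a raw input to ground logical facts, and a small hand-written rule base reasons over them with forward chaining, producing a proof the user can inspect. The entity extractor, the rules, and the predicates are hypothetical.

```python
# Minimal sketch of an explainable-by-design pipeline in the spirit of the
# approaches surveyed above: a (stubbed) neural component maps raw inputs to
# logical facts, and a hand-written rule base reasons over them with forward
# chaining, yielding an inspectable derivation.
def neural_entity_extractor(image_id):
    # Hypothetical stand-in for a black box mapping raw data to ground atoms.
    return {("has_wheels", image_id), ("has_engine", image_id)}

RULES = [
    # (body predicates, head predicate): if all body atoms hold, derive the head.
    ({"has_wheels", "has_engine"}, "is_vehicle"),
    ({"is_vehicle", "has_wings"}, "is_aircraft"),
]

def forward_chain(facts):
    derivation = []  # records which rule fired on which premises
    changed = True
    while changed:
        changed = False
        for body, head in RULES:
            for _, obj in list(facts):
                premises = {(p, obj) for p in body}
                if premises <= facts and (head, obj) not in facts:
                    facts.add((head, obj))
                    derivation.append((premises, (head, obj)))
                    changed = True
    return facts, derivation

facts, proof = forward_chain(neural_entity_extractor("img_42"))
for premises, conclusion in proof:
    print(f"{sorted(premises)} => {conclusion}")  # human-readable proof steps
```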
Research line people

Name               Role             Affiliation          Research lines
Turini             Full Professor   University of Pisa   R.LINE 1 ▪ 2 ▪ 5
Ruggieri           Full Professor   University of Pisa   R.LINE 1 ▪ 2
Setzu              PhD Student      University of Pisa   R.LINE 1 ▪ 2
Metta              Researcher       ISTI - CNR Pisa      R.LINE 1 ▪ 2 ▪ 3 ▪ 4
Beretta            PhD Student      University of Pisa   R.LINE 2
Marchiori Manerba  PhD Student      University of Pisa   R.LINE 1 ▪ 2 ▪ 5
Fontana            PhD Student      University of Pisa   R.LINE 2
Cinquini           PhD Student      University of Pisa   R.LINE 1 ▪ 2
Sree Mala          PhD Student      Scuola Normale       R.LINE 2