Logic-based Explanations for Neural Networks

Abstract: Neural networks have been key to solving a wide variety of problems. However, neural network models are still regarded as black boxes, since they do not provide any human-interpretable evidence as to why they output a certain result. In this talk, we will explore a procedure for inducing human-understandable logic-based theories that attempt to represent the classification process of a given neural network model. The procedure is based on establishing mappings from the activation values produced by the neurons of that model to human-defined concepts, which are then used in the induced logic-based theory. Through a series of experiments, we discuss how to map the internal state of a neural network to the human-defined concepts, examine whether the results obtained by the established mappings match our understanding of the mapped concepts, and analyse the fidelity of the resulting theory and how it can be used to generate symbolic justifications for the output of neural network models.
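To make the idea in the abstract concrete, here is a minimal sketch, not the procedure presented in the talk. It assumes a toy setting in which each concept is mapped from activations by a single-neuron threshold probe, and the logic theory is written by hand rather than induced. The activation values, the concept names (has_wings, has_beak), the rule, and every function name are hypothetical.

```python
# Toy sketch only: the data, concept names, and rule are hypothetical.

def fit_threshold_probe(activations, labels):
    """Return a concept detector of the form `activation[neuron] >= threshold`.

    Tries every neuron and every observed activation value as a threshold,
    and keeps the pair that agrees with the most concept labels.
    """
    best = (0, 0.0, -1)  # (neuron index, threshold, number of agreements)
    for neuron in range(len(activations[0])):
        for threshold in sorted({a[neuron] for a in activations}):
            correct = sum((a[neuron] >= threshold) == y
                          for a, y in zip(activations, labels))
            if correct > best[2]:
                best = (neuron, threshold, correct)
    neuron, threshold, _ = best
    return lambda a: a[neuron] >= threshold

# Hidden-layer activations of an imagined network (6 inputs, 3 neurons).
activations = [
    [0.9, 0.8, 0.1],
    [0.8, 0.7, 0.3],
    [0.9, 0.1, 0.2],
    [0.1, 0.9, 0.4],
    [0.2, 0.2, 0.9],
    [0.7, 0.9, 0.5],
]
# Human-provided concept annotations for the same 6 inputs.
concept_labels = {
    "has_wings": [True, True, True, False, False, True],
    "has_beak": [True, True, False, True, False, True],
}
# What the imagined network predicted for the class "bird".
network_output = [True, True, False, False, False, True]

# One mapping (probe) per human-defined concept.
probes = {name: fit_threshold_probe(activations, labels)
          for name, labels in concept_labels.items()}

def theory(concepts):
    """Hand-written rule standing in for an induced theory:
    bird <- has_wings AND has_beak."""
    return concepts["has_wings"] and concepts["has_beak"]

def justify(concepts):
    """Build a symbolic justification from the concepts that hold."""
    held = [name for name, value in concepts.items() if value]
    verdict = "bird" if theory(concepts) else "not bird"
    return f"{verdict}, given: {', '.join(held) or 'no concepts detected'}"

# Map each activation vector to concept truth values, then measure fidelity:
# the fraction of inputs on which the theory agrees with the network.
mapped = [{name: probe(a) for name, probe in probes.items()}
          for a in activations]
fidelity = sum(theory(c) == y
               for c, y in zip(mapped, network_output)) / len(activations)

print(fidelity)
print(justify(mapped[0]))
```

In this sketch, fidelity measures agreement with the network's own outputs rather than with ground-truth labels, which is the sense in which a theory can be said to represent the model's classification process.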

This work was carried out in collaboration with Manuel de Sousa Ribeiro, João Ferreira, and Ricardo Gonçalves.

Bio: João Leite is Associate Professor and Head of the Computer Science Department of FCT NOVA, Portugal, and Senior Invited Fellow of the School of Computer Science and Engineering, University of New South Wales, Sydney, Australia. He has been a member of the Nova Laboratory for Informatics and Computer Science (NOVA LINCS), of which he is a founding member, since 2015. Before that, from 1997, he was a member of the Centre for Artificial Intelligence, of which he was Executive Director between 2012 and 2015. His main research area is Artificial Intelligence, with a focus on Knowledge Representation and Reasoning, Multi-Agent Systems, Semantic Web, and Neural-Symbolic Systems. He has authored one book, co-edited 25 books and journal special issues, co-authored more than 120 papers, and presented several keynote talks, courses, and tutorials at international conferences and summer schools. He was Conference Chair of JELIA-2004, Program Committee Co-Chair of JELIA-2014, and Co-Chair of several editions of the CLIMA, LADS, and DALT workshops. He regularly serves on the Program Committees of major international conferences (IJCAI, AAAI, KR, AAMAS, …). He was co-recipient of the 2017 Collaborative Research Award Santander Totta.