Sven Schultze, Ani Withöft, Larbi Abdenebaoui, and Susanne Boll
Proceedings of the 2023 ACM International Conference on Multimedia Retrieval
Assessing visual aesthetics is important for organizing and retrieving photos, which is one reason several works aim to automate such assessments using deep neural networks. The underlying models, however, lack explainability. Because aesthetics are inherently subjective, it is challenging to establish objective ground truths and to derive explanations from them. Models trained on such data are therefore prone to inheriting socio-cultural biases, which raises a wide range of ethical and technical questions. This paper presents an explainable artificial intelligence (XAI) framework that adapts and combines three types of explanations for aesthetic assessment: 1) model constraints for built-in interpretability, 2) analysis of how perturbations affect decisions, and 3) generation of artificial images that represent maxima or minima of values in the latent feature space. The objective is to improve human understanding by giving users an intuition for the model’s decision making. We identify issues that arise when humans interact with the explanations and derive requirements from human feedback to address the needs of different user groups. We evaluate our novel interactive XAI technology in a study with end users (N=20). Our participants have different levels of deep-learning experience, allowing us to include experts, intermediate users, and laypersons. Our results show the benefits of the interactivity of our approach: all users found the system helpful in understanding how the aesthetic assessment was performed, while reporting varying needs for explanatory detail.
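To make the second explanation type concrete, the sketch below shows a generic occlusion-based perturbation analysis: mask one image patch at a time and record how much the predicted aesthetic score drops. This is a minimal illustration of the general technique, not the authors' implementation; `model` (a network mapping an image tensor to a scalar score), the patch size, and the zero baseline are all assumed placeholders.

```python
# Minimal sketch of perturbation-based explanation via occlusion sensitivity.
# Assumptions: PyTorch; `model` maps a (1, 3, H, W) image tensor to a scalar
# aesthetic score. The model, patch size, and baseline value are hypothetical
# placeholders, not the framework described in the paper.
import torch


def occlusion_map(model, image, patch=32, baseline=0.0):
    """Return a coarse heatmap of score drops when each patch is masked.

    Larger values mark regions whose removal lowers the predicted
    aesthetic score most, i.e. regions the model relies on.
    """
    model.eval()
    with torch.no_grad():
        base_score = model(image).item()

    _, _, h, w = image.shape
    heat = torch.zeros(h // patch, w // patch)

    for i in range(0, h - h % patch, patch):
        for j in range(0, w - w % patch, patch):
            perturbed = image.clone()
            # Replace one patch with the baseline value (occlusion).
            perturbed[:, :, i:i + patch, j:j + patch] = baseline
            with torch.no_grad():
                score = model(perturbed).item()
            heat[i // patch, j // patch] = base_score - score

    return heat
```

Overlaying such a heatmap on the input photo yields the kind of intuitive, decision-level explanation the framework targets; the paper's actual perturbation analysis may differ in masking strategy and granularity.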