AI4Science @ Caltech

Explainable AI (XAI) Virtual Workshop

A Caltech Virtual Workshop, a part of the AI4Science Series

September 23, 2021





As the size and complexity of data and software systems keep growing, we depend ever more on Artificial Intelligence (AI) and Machine Learning (ML) to extract actionable knowledge from data. In science, we are steadily moving towards human-AI collaborative discovery as we explore complex data landscapes. However, the results or recommendations from AI systems may be hard to understand or interpret, and such understanding is an essential component of the data-to-discovery process, whether in science, business, security, or any other data analytics domain. The trust and credibility of AI in practical applications can have significant ethical, political, or even life-or-death consequences.

Explainable AI (XAI) aims to develop AI systems whose results and solutions can be understood by humans, and it is critical for the future of the field. XAI poses a multitude of challenges and is a very active area of research. The goal of this virtual workshop is to describe some of these challenges and the XAI developments in which our community is engaged. It is a part of the ongoing AI4Science series.


Speakers and Abstracts


Anima Anandkumar

Caltech and NVIDIA

Trinity of Explainable AI: Calibrated, Verifiable, and User-friendly AI

Explainable AI encompasses three important requirements: calibrated models, verifiable unit tests, and user-friendly interfaces that seamlessly communicate them to the end user. We develop efficient and guaranteed methods for deep-learning models that satisfy these requirements. We propose a deep distributionally robust learning (DRL) method for generating calibrated uncertainties even when there are distributional shifts. For verifiable unit tests, we employ generative adversarial networks (GANs) to generate semantically meaningful variations and certify robustness under them. Lastly, we are developing prototypes of a user-centered AI auditing tool for medical language models. We address the end-user design challenges of 1) supporting interpretable specification of semantic verification rules, 2) supporting completeness in eliciting such specifications, and 3) communicating model behavior via metrics aligned with the user-provided semantic specification.


Katie Bouman

Caltech

Beyond the First Portrait of a Black Hole

As imaging requirements become more demanding, we must rely on increasingly sparse and/or noisy measurements that fail to paint a complete picture. Computational imaging pipelines, which replace optics with computation, have enabled image formation in situations that are impossible for conventional optical imaging. For instance, the first black hole image, published in 2019, was only made possible through the development of computational imaging pipelines that worked alongside an Earth-sized distributed telescope. However, remaining scientific questions motivate us to improve this computational telescope to see black hole phenomena still invisible to us and to meaningfully interpret the collected data. This talk will discuss how we are leveraging and building upon recent advances in machine learning in order to achieve more efficient uncertainty quantification of reconstructed images as well as to develop techniques that allow us to extract the evolving structure of our own Milky Way's black hole over the course of a night.


Ciro Donalek

Virtualitics

Multidimensional Visualization as a Path Towards Explainable AI

Making the black box of AI more transparent is quickly becoming a must-have for all AI applications: individuals have a right to an explanation when decisions made through the use of ML models severely affect their lives, and decision makers are often reluctant to trust models they cannot comprehend. Visualization can facilitate an intuitive understanding of both data and AI models. This talk focuses on how XAI, coupled with multidimensional and interactive visualization, can be used to uncover and understand key insights faster than with traditional data analytics tools, and to increase trust in the underlying model.


Ashish Mahabal

Caltech

The Role of Explainable AI in Time Domain Astronomy

Astronomy surveys have been leading to discoveries that are orders of magnitude more numerous than just a decade ago. As we go fainter, cover more wavelengths, and go multi-messenger, more novel and interesting objects will be found. The rarity of these objects will push the boundaries of our understanding, and studying them will benefit from cutting-edge machine learning tools. We explore some aspects of time domain astronomy and its inherent biases, and comment on the need to explain the results, whether through techniques such as uncertainty quantification and Bayesian methodology, or through the more common post-hoc interpretability.


Lior Pachter

Caltech

The How and Why of Interpretability in the Biological Sciences

Machine learning tools have had a significant positive impact on the biological sciences, as they have on other sciences, most recently with the success of AlphaFold for protein folding. However, the utility of increasingly accurate tools for prediction and discovery can be tempered by a lack of interpretability, which is particularly important for biomedical applications, but also for molecular biology. I will discuss several innovations in semi-supervised deep learning for biology applications, and how they are coupled to interpretable models to drive forward biological discovery.


Cynthia Rudin

Duke University

Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead

With widespread use of machine learning, there have been serious societal consequences from using black box models for high-stakes decisions, including flawed bail and parole decisions in criminal justice. Explanations for black box models are not reliable, and can be misleading. If we use interpretable machine learning models, they come with their own explanations, which are faithful to what the model actually computes.


David Van Valen

Caltech

Single-Cell Biology in a Software 2.0 World

Multiplexed imaging methods can measure the expression of dozens of proteins while preserving spatial information. While these methods open an exciting new window into the biology of human tissues, interpreting the images they generate with single-cell resolution remains a significant challenge. Current approaches to this problem in tissues rely on identifying cell nuclei, which results in inaccurate estimates of cellular phenotype and morphology. In this work, we overcome this limitation by combining multiplexed imaging's ability to image nuclear and membrane markers with large-scale data annotation and deep learning. We describe the construction of TissueNet, an image dataset containing more than a million paired whole-cell and nuclear annotations across eight tissue types and five imaging platforms. We also present Mesmer, a single model trained on this dataset that can perform nuclear and whole-cell segmentation with human-level accuracy – as judged by expert human annotators and a panel of pathologists – across tissue types and imaging platforms. We show that Mesmer accurately measures cell morphology in tissues, opening up a new observable for quantifying cellular phenotypes. We make this model available to users of all backgrounds with both cloud-native and on-premises software. Lastly, we describe ongoing work to develop a similar resource and models for dynamic live-cell imaging data.


Kiri Wagstaff

JPL

Explainable Discovery for Planetary Science and Exploration

The growing deployment of machine learning (ML) systems in medicine, finance, advertising, and even space exploration has raised new, urgent questions about how these systems make their decisions and why. This is especially relevant when ML is used to aid in the discovery of new objects, phenomena, composition, etc. I will describe how explanations are generated for the AEGIS target selection system onboard Mars rovers and also for the analysis of large Mars image archives on the ground. These techniques can help advance our understanding of new discoveries and inform the next steps in exploration and data acquisition.


Rose Yu

UCSD

Incorporating Symmetry for Learning Spatiotemporal Dynamics

While deep learning has shown tremendous success in many scientific domains, it remains a grand challenge to incorporate physical principles into such models. In this talk, I will demonstrate how to incorporate symmetries into deep neural networks and significantly improve physical consistency, sample efficiency, and generalization in learning spatiotemporal dynamics. I will showcase the applications of these models to challenging problems such as turbulence forecasting and trajectory prediction for autonomous vehicles.


Yisong Yue

Caltech

Neurosymbolic Programming

Neurosymbolic programming is an emerging research area at the interface of deep learning and program synthesis. The goal is to learn functions that are represented as programs that use symbolic primitives, often in conjunction with neural network components. Neurosymbolic programming can offer multiple advantages over end-to-end deep learning. Programs can sometimes naturally represent long-horizon, procedural tasks that are difficult to perform using deep networks. Neurosymbolic representations are also, commonly, easier to interpret, analyze, and trust than neural networks. The restrictions of a programming language can serve as a form of regularization and lead to more generalizable and data-efficient learning. Compositional programming abstractions can also be a natural way of reusing learned modules across learning tasks. In this talk, I will provide a brief overview of this emerging area, highlight recent developments (e.g., http://www.neurosymbolic.org/), and point to directions for future work.


James Zou

Stanford

A Data-Centric View of XAI with Healthcare Applications

What data are used to train and evaluate critical AI systems? Are they sufficient to achieve reliable performance, or do they introduce biases? We survey the data behind important AI algorithms, including FDA-approved medical AI devices, and quantify several key limitations. Motivated by these challenges, I will discuss recent advances in improving trustworthy and explainable AI through data valuation and concept learning.

Program

9:00 - 9:15
Adam Wierman & George Djorgovski
Welcome and Introduction

9:15 - 9:45
Yisong Yue (Caltech)
Neurosymbolic Programming

9:45 - 10:15
Cynthia Rudin (Duke U.)
Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead

10:15 - 10:45
Katie Bouman (Caltech)
Beyond the First Portrait of a Black Hole

10:45 - 11:15
Anima Anandkumar (Caltech and NVIDIA)
Trinity of Explainable AI: Calibrated, Verifiable, and User-friendly AI

11:15 - 11:45
Ciro Donalek (Virtualitics)
Multidimensional Visualization as a Path Towards Explainable AI

11:45 - 12:00 Reserved

12:00 - 12:30
Virtual Poster Session (separate Zoom session, ID: 863 0434 6274)

12:30 - 1:00
Virtualitics NLP Demo (separate Zoom session, ID: 940 1403 6961)

1:00 - 1:30
Rose Yu (UCSD)
Incorporating Symmetry for Learning Spatiotemporal Dynamics

1:30 - 2:00
Ashish Mahabal (Caltech)
The Role of Explainable AI in Time Domain Astronomy

2:00 - 2:30
Kiri Wagstaff (JPL)
Explainable Discovery for Planetary Science and Exploration

2:30 - 3:00
Lior Pachter (Caltech)
The How and Why of Interpretability in the Biological Sciences

3:00 - 3:30
David Van Valen (Caltech)
Single-Cell Biology in a Software 2.0 World

3:30 - 4:00
James Zou (Stanford)
A Data-Centric View of XAI with Healthcare Applications

4:00 - 4:30
Virtualitics Biotech Demo (separate Zoom session, ID: 938 1299 0062)

4:30 Adjourn

The workshop will take place as a Zoom webinar, and consist of invited talks and a discussion panel. The talks will be recorded and posted online.

Acknowledgments

The workshop is organized by the Center for Data-Driven Discovery and Information Science and Technology (IST) at the California Institute of Technology, and the Center for Data Science and Technology at JPL.

The Program Committee consists of:

S. George Djorgovski (chair)
Anima Anandkumar
Katie Bouman
Adam Wierman
Yisong Yue 


We acknowledge the generous support of Virtualitics, Inc. and Amazon Web Services.

Organized by:

Astronomy at Caltech
Center for Data-Driven Discovery (CDDD)
Information Science and Technology (IST)
Center for Data Science and Technology (CDST) at JPL

Sponsored by:

Virtualitics, Inc.
Amazon Web Services (AWS)

Contact Us

Want to receive announcements about upcoming AI4Science events?
Subscribe to our mailing list!

Questions or Comments about the AI4Science Initiative or this website?
Contact: sydney -at- caltech -dot- edu