Seminar on AI and Health Data
November 10, 13:00-15:00
University of Helsinki, Chemicum building (A.I. Virtasen aukio 1, Helsinki), room A128
Program
Allan Tucker (Reader, Head of Intelligent Data Analysis Group, Brunel University London): How to Deal with Privacy, Bias & Drift in AI Models of National Healthcare Data
Primary healthcare data offers huge value in modelling disease and illness. However, this data holds extremely private information about individuals, and privacy concerns continue to limit its widespread use, both by public research institutions and by the private health-tech sector. One possible solution is synthetic data, which mimics the underlying correlational structure and distributions of real data but avoids many of the privacy concerns. Brunel University London has been working in a long-term collaboration with the Medicines and Healthcare products Regulatory Agency (MHRA) in the UK to construct a high-fidelity synthetic data generator using probabilistic models with complex underlying latent variable structures. This work has led to multiple releases of synthetic data on a number of diseases, including COVID-19 and cardiovascular disease, which are available for state-of-the-art AI research. Two major issues have arisen from our synthetic data work: bias, which appears even when working with comprehensive national data, and concept drift, where subsequent batches of data move away from current models, together with the impact this may have on regulation.
In this talk I will discuss some of the key results of the collaboration: our experiences of synthetic data generation, the detection of bias and how to better represent the true underlying UK population, and how to handle concept drift when building models of healthcare data that evolves over time.
Antti Honkela (Associate Professor, University of Helsinki and FCAI): Noise-Aware Differentially Private Synthetic Data
Synthetic data generated under differential privacy (DP) promises to significantly simplify analysis of sensitive personal data by providing strong formal privacy guarantees. Existing work has shown that simply analysing DP synthetic data as if it were real does not produce valid inferences of population-level quantities: the resulting confidence intervals are too narrow, which risks false discoveries. We propose using multiple imputation techniques to avoid these problems. This requires simulating multiple synthetic data sets from the Bayesian posterior predictive distribution over data sets. We propose a novel noise-aware Bayesian DP synthetic data generation mechanism for discrete data that enables generating such a distribution of data sets. Our experiments demonstrate that the method is able to produce accurate confidence intervals from DP synthetic data.
Simo Särkkä (Associate Professor, Aalto University and FCAI): Machine learning in predictive medical diagnostics: some case studies
The aim of the talk is to present results from selected case studies of using machine learning in medical diagnostics. In particular, the talk summarizes the methodology and results from neonatal mortality and morbidity classification using combined sensor and medical record data. The use of deep learning in image-based diabetic retinopathy classification is presented as another case study. The talk then proceeds to insights on the uncertainty quantification and time-series modeling aspects that arise in these applications.