Secure and Privacy-Preserving Decentralized Analytics and Its Application to Biomedical Data

Speaker

JP Hubaux

EPFL

Host

Bonnie Berger

CSAIL MIT

To work properly, machine learning requires access to large amounts of data. Yet access to datasets can be difficult, either because of regulations or because the data controller considers its own data too sensitive or too valuable to share. In such cases, datasets remain in silos, which makes it hard to train ML models on enough data. In this talk, we will present several results that show how to solve this problem, notably by leveraging recent advances in cryptography.

We first address the challenge of privacy-preserving training and evaluation of neural networks in an N-party federated learning setting. We propose a novel system, POSEIDON, the first of its kind for privacy-preserving neural network training. It employs multiparty cryptography to preserve the confidentiality of the training data, the model, and the evaluation data, under a passive-adversary model and collusion among up to N−1 parties. It is based on Lattigo, an open-source lattice-based cryptographic library written in Go. Our experimental results show that POSEIDON achieves accuracy similar to that of centralized or decentralized non-private approaches.
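To give a feel for what tolerating collusion among up to N−1 parties means, the sketch below computes a sum over private inputs with additive secret sharing. This is a deliberately simpler technique than the multiparty homomorphic encryption the abstract describes, and it is not how POSEIDON works; the modulus and the function names are assumptions chosen for illustration.

```python
import secrets

# Hypothetical modulus for the illustration; inputs must be non-negative
# integers smaller than this value.
MODULUS = 2**61 - 1

def share(value, n_parties):
    """Split value into n_parties additive shares modulo MODULUS.

    Any n_parties - 1 of the shares are uniformly random, so they reveal
    nothing about value on their own.
    """
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares

def secure_sum(private_values):
    """Simulate all parties in one process and return the sum of their inputs."""
    n = len(private_values)
    # Row i holds the shares that party i sends out, one per recipient.
    distributed = [share(v, n) for v in private_values]
    # Party j adds up the shares it received and publishes only that partial sum.
    partials = [sum(distributed[i][j] for i in range(n)) % MODULUS
                for j in range(n)]
    return sum(partials) % MODULUS
```

For example, `secure_sum([3, 10, 29])` returns 42 without any single simulated party seeing another party's input in the clear. A real deployment would also need authenticated channels and a fixed-point encoding for model updates, neither of which is shown here.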

We then switch to principal component analysis (PCA), an essential algorithm for dimensionality reduction in many data science domains. We address the problem of performing a federated PCA on private data distributed among multiple data providers while ensuring data confidentiality. Our solution, SF-PCA, is an end-to-end secure system that preserves the confidentiality of both the original data and all intermediate results in a passive-adversary model with up to all-but-one colluding parties. SF-PCA jointly leverages multiparty homomorphic encryption, interactive protocols, and edge computing to efficiently interleave computations on local cleartext data with operations on collectively encrypted data. SF-PCA obtains results as accurate as non-secure centralized solutions, independently of the data distribution among the parties.
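The claim that accuracy does not depend on how the data is split across parties can be illustrated with a small cleartext sketch: if each party contributes only its row count, column sums, and Gram matrix, the pooled covariance is the same as in the centralized case. This is an assumption-laden illustration, not the SF-PCA protocol; in SF-PCA such intermediate values would stay encrypted, and all function names below are hypothetical.

```python
import math

def local_stats(rows):
    """Per-party summary: row count, column sums, and Gram matrix X^T X."""
    d = len(rows[0])
    sums = [0.0] * d
    gram = [[0.0] * d for _ in range(d)]
    for r in rows:
        for i in range(d):
            sums[i] += r[i]
            for j in range(d):
                gram[i][j] += r[i] * r[j]
    return len(rows), sums, gram

def pooled_covariance(stats):
    """Combine per-party summaries into the covariance of the pooled data."""
    d = len(stats[0][1])
    n = sum(s[0] for s in stats)
    mean = [sum(s[1][i] for s in stats) / n for i in range(d)]
    return [[sum(s[2][i][j] for s in stats) / n - mean[i] * mean[j]
             for j in range(d)] for i in range(d)]

def top_component(cov, iters=200):
    """Approximate the leading eigenvector of cov by power iteration."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:  # degenerate input (zero covariance): stop early
            return v
        v = [x / norm for x in w]
    return v

# Toy data held by two parties (values are made up for the example).
party_a = [[2.0, 0.0], [0.0, 1.0]]
party_b = [[-2.0, 0.0], [0.0, -1.0], [4.0, 1.0], [-4.0, -1.0]]

federated = pooled_covariance([local_stats(party_a), local_stats(party_b)])
centralized = pooled_covariance([local_stats(party_a + party_b)])
component = top_component(federated)
```

Here `federated` and `centralized` agree entry by entry, regardless of which rows sit with which party, and `component` approximates the first principal direction of the pooled data.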

Next, we show how we use these techniques in medical research. We propose FAMHE, a novel federated analytics system that, based on multiparty homomorphic encryption (MHE), enables privacy-preserving analyses of distributed datasets by yielding highly accurate results without revealing any intermediate data. We demonstrate the applicability of FAMHE to essential biomedical analysis tasks, including Kaplan-Meier survival analysis in oncology and genome-wide association studies in medical genetics.
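As a rough illustration of why Kaplan-Meier analysis suits a federated setting, the sketch below computes the estimator S(t) as the product over event times t_i ≤ t of (1 − d_i / n_i), using only per-site counts of events and exits at each time point. It runs entirely in cleartext with made-up records; in a system such as FAMHE these counts would be aggregated under encryption, and the function names here are assumptions rather than part of any real API.

```python
from collections import Counter

def site_counts(records):
    """Summarize one site's records as per-time event and exit counts.

    records is a list of (time, event) pairs, where event is 1 for an
    observed event and 0 for a censored observation.
    """
    events, exits = Counter(), Counter()
    for t, e in records:
        exits[t] += 1
        events[t] += e
    return events, exits

def kaplan_meier(sites):
    """Return [(time, survival)] from the summed counts of all sites."""
    events, exits = Counter(), Counter()
    for ev, ex in sites:
        events.update(ev)  # Counter.update adds counts key by key
        exits.update(ex)
    at_risk = sum(exits.values())
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events[t]
        if d:
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Everyone who had an event or was censored at t leaves the risk set.
        at_risk -= exits[t]
    return curve

# Hypothetical records held by two sites.
site_a = [(1, 1), (2, 0), (3, 1)]
site_b = [(2, 1), (4, 0)]

federated_curve = kaplan_meier([site_counts(site_a), site_counts(site_b)])
pooled_curve = kaplan_meier([site_counts(site_a + site_b)])
```

Because the estimator depends only on the summed counts, `federated_curve` matches `pooled_curve`, which is what makes the analysis a natural fit for aggregation over protected per-site contributions.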

Finally, we present Tune Insight SA, a start-up company that has industrialized the software implementing some of our results.

This work was carried out in collaboration with colleagues at EPFL, MIT, Broad Institute, Lausanne University Hospital and Tune Insight.

Short bio:

Prof. Jean-Pierre Hubaux is the academic director of the EPFL Center for Digital Trust (C4DT). He led the national Data Protection in Personalized Health (DPPH) project for its whole duration (April 2018 to December 2021). Until December 2021, he was a co-chair of the Data Security Work Stream of the Global Alliance for Genomics and Health (GA4GH). From 2008 to 2019, he was one of the seven commissioners of the Swiss FCC. He is a Fellow of both IEEE and ACM. Three of his papers received distinctions at the IEEE Symposium on Security and Privacy, the flagship event on the topic (in 2015, 2018 and 2021). He is among the most cited researchers in privacy protection and in information security. He is a co-founder of Tune Insight SA.

Photograph: https://people.epfl.ch/jean-pierre.hubaux?lang=en

Zoom link: https://mit.zoom.us/j/93513735220

Date and time: November 22, 2023, 11:30 AM to 1:00 PM (America/New_York)

Organizer & Contact

Shuvom Sadhuka

ssadhuka@mit.edu

Part of

Bioinformatics Seminar Series 2023

Related Events

September 11

Multimodal Protein Foundation Models

October 02

Bioinformatics Seminar - Prediction potential and pitfalls in pervasive population personal genomics: Interpreting newborn genomes with Notes on privacy timebombs in functional genomics data