Statistical inference with computational constraints

Host: Noah Golowich, MIT
Abstract: The vast amount of digital data we create and collect has revolutionized many scientific fields and industrial sectors. However, despite our success in harnessing the transformative power of data, computational and societal trends emerging from current data-science practice necessitate upgrading our toolkit for data analysis.
In this talk, we discuss how practical considerations such as time and memory limits affect statistical inference tasks, focusing on the problem of estimating the entropy of a distribution by streaming over i.i.d. samples from it. We determine how bounded memory affects the number of samples needed to solve this problem.
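Concretely, for a distribution D over an alphabet of size k, the target quantity is the Shannon entropy

    H(D) = -\sum_{x} D(x) \log D(x),

and after a single pass over i.i.d. samples X_1, X_2, \ldots \sim D using a bounded number of memory words, the algorithm must output an estimate \hat{H} with |\hat{H} - H(D)| \le \epsilon with constant probability (the constant-probability success criterion is the usual convention for this problem and is assumed here rather than stated in the abstract).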
Recent work by Acharya et al. (NeurIPS 2019) showed how to estimate the entropy of a distribution D over an alphabet of size k to within additive error \epsilon by streaming over (k/\epsilon^3)⋅polylog(1/\epsilon) i.i.d. samples and using only O(1) words of memory. In this work, we propose a new constant-memory scheme that reduces the sample complexity to (k/\epsilon^2)⋅polylog(1/\epsilon). We conjecture that this sample complexity is optimal up to polylog(1/\epsilon) factors.
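As a rough illustration of the constant-memory setting (this is a toy sketch, not the scheme from the paper or from Acharya et al.), the snippet below exploits the identity H(D) = E_{X~D}[-ln D(X)]: it repeatedly draws a "pivot" sample, estimates its probability from its frequency in a short follow-up window of samples, and averages the resulting -ln estimates, keeping only a few counters regardless of the alphabet size k. The function and parameter names (stream_entropy_estimate, reps, window) are illustrative choices; the estimator is biased, and the actual algorithms control such biases carefully.

    import math
    import random

    def stream_entropy_estimate(sample_stream, reps=2000, window=500):
        """Toy constant-memory entropy estimator (illustrative only).

        For each repetition, draw one pivot sample X, estimate D(X) by the
        empirical frequency of X among the next `window` samples, and average
        -ln of that frequency. Only a handful of counters are kept,
        independent of the alphabet size.
        """
        total, used = 0.0, 0
        for _ in range(reps):
            pivot = next(sample_stream)  # X ~ D
            hits = sum(next(sample_stream) == pivot for _ in range(window))
            if hits > 0:  # skip repetitions where the pivot never recurs
                total += -math.log(hits / window)
                used += 1
        return total / max(used, 1)

    def sampler(probs):
        """Endless stream of i.i.d. samples from the given distribution."""
        population = list(range(len(probs)))
        while True:
            yield random.choices(population, weights=probs)[0]

    probs = [0.5, 0.25, 0.125, 0.0625, 0.0625]
    true_entropy = -sum(p * math.log(p) for p in probs)
    estimate = stream_entropy_estimate(sampler(probs))
    print(f"true H = {true_entropy:.3f} nats, estimate = {estimate:.3f} nats")

In this toy version each repetition consumes window + 1 fresh samples, so driving the error down to \epsilon this way is far more sample-hungry than the (k/\epsilon^2)⋅polylog(1/\epsilon) bound above; the point is only to show that the memory footprint can stay at a constant number of words while streaming over samples.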
Based on: https://arxiv.org/abs/2205.09804
Joint work with: Andrew McGregor, Jelani Nelson, Erik Waingarten