Optimal Sample Complexity of Contrastive Learning

Speaker

Northwestern University

Host

Noah Golowich

MIT

Abstract: Contrastive learning is a highly successful technique for learning representations of data from labeled tuples that specify the distance relations within each tuple. We study the sample complexity of contrastive learning, i.e., the minimum number of labeled tuples sufficient for high generalization accuracy. We give tight bounds on the sample complexity in a variety of settings, focusing on arbitrary distance functions, general ℓ_p-distances, and tree metrics. Our main result is an (almost) optimal bound on the sample complexity of learning ℓ_p-distances for integer p. For any p ≥ 1 we show that Θ̃(min(nd, n²)) labeled tuples are necessary and sufficient for learning d-dimensional representations of n-point datasets. Our results hold for an arbitrary distribution of the input samples and are based on corresponding bounds on the Vapnik-Chervonenkis/Natarajan dimension of the associated problems. We further show that the theoretical bounds on sample complexity obtained via the VC/Natarajan dimension can have strong predictive power for experimental results, in contrast with the folklore belief that there is a substantial gap between statistical learning theory and the practice of deep learning.
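For readers unfamiliar with the setup, the following is a minimal sketch of what a labeled tuple and the stated bound might look like in code. It assumes tuples are triplets (anchor, a, b) labeled by which of a or b is closer to the anchor; the function names and the toy embedding are hypothetical and are not taken from the talk. The last function only evaluates the order of growth min(nd, n²), ignoring constants and logarithmic factors.

```python
import numpy as np


def lp_distance(u, v, p):
    """l_p distance between two vectors, for p >= 1."""
    return float(np.sum(np.abs(u - v) ** p) ** (1.0 / p))


def label_triplet(embedding, triplet, p=2):
    """Label a triplet (anchor, a, b): True if the anchor is closer to a than to b."""
    anchor, a, b = triplet
    d_a = lp_distance(embedding[anchor], embedding[a], p)
    d_b = lp_distance(embedding[anchor], embedding[b], p)
    return d_a < d_b


def sample_complexity_order(n, d):
    """Order of growth min(n*d, n^2); constants and log factors are ignored."""
    return min(n * d, n * n)


# Hypothetical toy data: three points embedded in d = 2 dimensions.
embedding = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]])

print(label_triplet(embedding, (0, 1, 2), p=2))   # True: distance 1 versus distance 5
print(sample_complexity_order(n=1000, d=16))      # 16000
```

When d is much smaller than n, the nd term is the smaller of the two, so under this reading the number of labeled tuples scales with the number of embedding parameters rather than with the number of point pairs.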

Date

April 18, 2024, 4:00 PM to 5:00 PM (America/New_York)

Location

32-D507

Organizer & Contact

Noah Golowich

nzg@csail.mit.edu

Part of

Algorithms and Complexity (A&C) 2024 - 2025

Related Events

May 14

Catalytic Computing: A Primer

May 01

Understanding the Trade-Offs Between Hallucinations and Mode Collapse in Language Generation