May 06

Deciding high-dimensional sub-Gaussian-ness in polynomial time
4:15-5:15 PM ET. Location: TBD

Abstract: Given samples from a probability distribution, can efficient algorithms tell whether the distribution has heavy or light tails? This problem is at the core of algorithmic statistics, where algorithms for deciding heavy- versus light-tailedness are key subroutines for clustering, learning in the presence of adversarial outliers, and more. The problem is easy in one dimension but challenging in high dimensions, where a distribution can have light tails in some directions and heavy ones in others -- detecting a single heavy-tailed direction hiding in an otherwise light-tailed distribution can seemingly require brute-force search. In this talk, I describe a family of efficient algorithms for deciding whether a high-dimensional probability distribution has sub-Gaussian tails, with applications to a wide range of high-dimensional learning tasks that use sub-Gaussian data.

Based on joint work with Ilias Diakonikolas, Ankit Pensia, and Stefan Tiegel.
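For intuition about why one dimension is easy and high dimensions are hard, here is a naive baseline, not the talk's algorithm: test sub-Gaussian moment growth (E|X|^k <= (c sqrt(k))^k) along random directions. The constant c, the moment range, and the sample sizes below are illustrative assumptions.

```python
import numpy as np

def looks_subgaussian_1d(x, c=2.0, kmax=8):
    """Crude 1-D check: a sub-Gaussian variable satisfies
    E|X|^k <= (c*sqrt(k))^k for some constant c. Test even moments
    of the standardized sample. (Illustrative thresholds only.)"""
    x = (x - x.mean()) / x.std()
    return all(np.mean(np.abs(x) ** k) <= (c * np.sqrt(k)) ** k
               for k in range(2, kmax + 1, 2))

def scan_random_directions(X, trials=200, seed=None):
    """Project onto random unit directions and run the 1-D check on each."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        u = rng.standard_normal(X.shape[1])
        u /= np.linalg.norm(u)
        if not looks_subgaussian_1d(X @ u):
            return False  # found an (empirically) heavy direction
    return True

# A Gaussian cloud with one hidden heavy-tailed (Student-t) coordinate.
rng = np.random.default_rng(0)
X = rng.standard_normal((5000, 50))
X[:, 0] = rng.standard_t(df=2.5, size=5000)
print(scan_random_directions(X, seed=1))  # likely True: random directions
# dilute the heavy coordinate, so the scan misses it -- the brute-force
# obstruction the abstract describes
```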

April 22

How to Securely Implement Cryptography in Deep Neural Networks
4:15-5:15 PM ET. Location: TBD

Abstract: The wide adoption of deep neural networks (DNNs) raises the question of how to equip them with a desired cryptographic functionality (e.g., to decrypt an encrypted input, to verify that this input is authorized, or to hide a secure watermark in the output). The problem is that cryptographic primitives are typically designed to run on digital computers that use Boolean gates to map sequences of bits to sequences of bits, whereas DNNs are a special type of analog computer that uses linear mappings and ReLUs to map vectors of real numbers to vectors of real numbers. In the past, this discrepancy between the discrete and continuous computational models has led to many interesting side-channel attacks. In this talk I will describe a new theory of security for digital cryptographic primitives implemented as ReLU-based DNNs. I will first demonstrate the existence of a provable exponential gap between the complexities of solving a simple search problem in the two computational models. I will then show that the natural implementations of block ciphers as DNNs can be broken in linear time by using nonstandard inputs whose “bits” are real numbers. Finally, I will develop a new and completely practical method for implementing any desired cryptographic functionality as a standard ReLU-based DNN in a provably secure and correct way.
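As a toy illustration of the discrete/continuous discrepancy the abstract describes (my example, not the talk's constructions or attacks): a ReLU circuit can compute XOR exactly on Boolean inputs, yet behaves very differently once it is fed "bits" that are real numbers.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def xor_relu(a, b):
    """ReLU implementation of XOR, correct on Boolean inputs {0,1}:
    XOR(a,b) = relu(a+b) - 2*relu(a+b-1)."""
    s = a + b
    return relu(s) - 2.0 * relu(s - 1.0)

# Correct on the four Boolean points:
for a in (0.0, 1.0):
    for b in (0.0, 1.0):
        print(a, b, xor_relu(a, b))

# A "nonstandard" input with real-valued bits leaves the Boolean domain,
# and the analog circuit happily computes something else:
print(xor_relu(0.5, 0.25))  # 0.75, not a Boolean value
```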

April 15

Learning Multi-Index Models
4:15-5:15 PM ET. Location: TBD

Abstract: Multi-index models (MIMs) are functions that depend on the projection of the input onto a low-dimensional subspace. These models offer a powerful framework for studying various machine learning tasks, including multiclass linear classification, learning intersections of halfspaces, and more complex neural networks. Despite extensive investigation, there remains a vast gap in our understanding of the efficient learnability of MIMs.

In this talk, we will survey recent algorithmic developments on learning MIMs, focusing on methods with provable performance guarantees. In particular, we will present a robust learning algorithm for a broad class of well-behaved MIMs under the Gaussian distribution. A key feature of our algorithm is that its running time has fixed-degree polynomial dependence on the input dimension. We will also demonstrate how this framework leads to more efficient and noise-tolerant learners for multiclass linear classifiers and intersections of halfspaces. Time permitting, we will highlight some of the many open problems in this area.

The main part of the talk is based on joint work with G. Iakovidis, D. Kane, and N. Zarifis.
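A minimal sketch of the definition, f(x) = g(Wx), with an intersection of two halfspaces as the link function g; the dimensions and the projector check are illustrative choices, not the talk's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 20, 2                      # ambient dimension, hidden dimension
W = rng.standard_normal((k, d))   # rows span the relevant subspace

def mim(x, g):
    """A multi-index model: f(x) = g(Wx) depends on x only through
    its projection onto the k-dimensional row space of W."""
    return g(W @ x)

# Example link function g: intersection of two halfspaces.
g_intersect = lambda z: float(np.all(z >= 0))

x = rng.standard_normal(d)
print(mim(x, g_intersect))

# Moving x orthogonally to the rows of W never changes the value.
P = W.T @ np.linalg.solve(W @ W.T, W)    # projector onto the row space
x_moved = x + (np.eye(d) - P) @ rng.standard_normal(d)
print(mim(x_moved, g_intersect))          # same value as above
```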

April 08

Simulating Time With Square-Root Space
4:15-5:15 PM ET. Location: TBD

Abstract: We show that for all functions t(n) ≥ n, every multitape Turing machine running in time t can be simulated in space only O(√(t log t)). This is a substantial improvement over Hopcroft, Paul, and Valiant's simulation of time t in O(t / log t) space from 50 years ago [FOCS 1975, JACM 1977]. Among other results, our simulation implies that bounded fan-in circuits of size s can be evaluated on any input in only √s · poly(log s) space, and that there are explicit problems solvable in O(n) space which require at least n^(2−ε) time on multitape Turing machines for all ε > 0, thereby making a little progress on the P vs PSPACE problem.

Our simulation reduces the problem of simulating time-bounded multitape Turing machines to a series of implicitly-defined Tree Evaluation instances with nice parameters, leveraging the remarkable space-efficient algorithm for Tree Evaluation recently found by Cook and Mertz [STOC 2024].
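Restated compactly in complexity-class notation (a paraphrase of the abstract's bounds, nothing additional):

```latex
\[
  \mathsf{TIME}[t(n)] \;\subseteq\; \mathsf{SPACE}\!\left[O\!\left(\sqrt{t(n)\log t(n)}\,\right)\right]
  \qquad \text{for all } t(n) \ge n,
\]
improving on the Hopcroft--Paul--Valiant simulation
\[
  \mathsf{TIME}[t(n)] \;\subseteq\; \mathsf{SPACE}\!\left[O\!\left(t(n)/\log t(n)\right)\right].
\]
```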

March 18

Rapid Mixing at the Uniqueness Threshold
4:15-5:15 PM ET. Location: TBD

Abstract: Over the past decades, a fascinating computational phase transition has been identified in sampling from Gibbs distributions. However, the computational complexity at the critical point remains poorly understood, as previous algorithmic and hardness results all required a constant slack from this threshold. In this talk, I will present our recent result resolving this open question at the critical threshold, thus completing the picture of the computational phase transition. Specifically, we show that for the critical hardcore model, the mixing time of Glauber dynamics is always polynomial and in the worst case super-linear in the number of vertices. Similar results are established for the critical Ising model.

Based on joint work with Xiaoyu Chen, Yitong Yin, and Xinyuan Zhang.
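A minimal sketch of the object being analyzed, assuming the standard uniqueness threshold λ_c(∆) = (∆−1)^(∆−1)/(∆−2)^∆ for maximum degree ∆; the graph and step count are arbitrary illustrative choices, and this of course says nothing about mixing-time bounds.

```python
import numpy as np
import networkx as nx

def glauber_hardcore(G, lam, steps, seed=None):
    """Glauber dynamics for the hardcore model on G with fugacity lam
    (the stationary distribution weights an independent set I by lam^|I|).
    One step: pick a uniform vertex v; if some neighbor is occupied,
    v must be unoccupied; otherwise occupy v w.p. lam / (1 + lam)."""
    rng = np.random.default_rng(seed)
    occupied = {v: False for v in G}
    nodes = list(G)
    for _ in range(steps):
        v = nodes[rng.integers(len(nodes))]
        if any(occupied[u] for u in G[v]):
            occupied[v] = False
        else:
            occupied[v] = rng.random() < lam / (1 + lam)
    return occupied

# Uniqueness threshold for maximum degree D: lambda_c = (D-1)^(D-1)/(D-2)^D.
D = 4
lam_c = (D - 1) ** (D - 1) / (D - 2) ** D     # = 27/16 for D = 4
G = nx.random_regular_graph(D, 50, seed=0)
sample = glauber_hardcore(G, lam_c, steps=100_000, seed=1)
print(sum(sample.values()), "occupied vertices")
```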

March 11

Overparametrized systems: from Smale's 17th problem to two-layer neural networks
4:15-5:15 PM ET. Location: TBD

Abstract: Training modern machine learning models requires optimizing highly non-convex risk functions, and yet simple gradient-based methods are able to find global minima for very high-dimensional problems. Randomness and overparametrization are likely to have something to do with this magic.

In the 90s, Stephen Smale conjectured that a system of n random polynomial equations in n complex unknowns can be solved in average-case polynomial time. I will present recent progress on the real-variables version of this problem and characterize the overparametrization that is required in that case to make the problem efficiently solvable. I will then show that closely related ideas can be used to analyze two-layer neural networks. In that case, reaching a global minimum (in the absence of regularization) can be problematic from a statistical viewpoint, leading to overfitting.

[Based on joint papers with Eliran Subag and with Pierfrancesco Urbani]
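A toy experiment in the spirit of the first paragraph (my illustration, not the talk's analysis): a sufficiently overparametrized two-layer ReLU network trained by plain gradient descent fits even arbitrary labels. Width, learning rate, and step count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 50, 10, 2000           # n samples, input dim d, width m >> n
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)       # arbitrary labels to fit

# Two-layer ReLU network f(x) = a . relu(Wx) / sqrt(m); train W only.
W = rng.standard_normal((m, d))
a = rng.choice([-1.0, 1.0], size=m)

lr = 0.5
for _ in range(2000):
    H = np.maximum(X @ W.T, 0.0)             # hidden activations, n x m
    err = H @ a / np.sqrt(m) - y             # residuals
    # gradient of (1/2n) * sum(err^2) with respect to W
    grad_W = ((err[:, None] * (X @ W.T > 0) * a).T @ X) / (n * np.sqrt(m))
    W -= lr * grad_W
print("final mse:", np.mean(err ** 2))       # should be driven near zero
```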

March 04

Pseudorandom Correlation Generators
4:15-5:15 PM ET. Location: TBD

Abstract: Correlated secret randomness is an important resource for many cryptographic applications. We initiate the study of pseudorandom correlation generators (PCGs), an analog of pseudorandom generators for multi-party correlations, which offer the ability to securely generate large sources of correlated randomness from short correlated seeds using only local computation. In this talk, we survey the state of the art in PCGs, including constructions and their use within secure computation.
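A deliberately degenerate toy example of the syntax (my illustration, not a construction from the talk): two parties locally expand a short common seed into long additive shares of zero, using a hash-based toy PRG.

```python
import hashlib

def prg(seed: bytes, nbytes: int) -> bytes:
    """Toy PRG by iterating SHA-256 in counter mode (illustrative only)."""
    out, counter = b"", 0
    while len(out) < nbytes:
        out += hashlib.sha256(seed + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:nbytes]

# Toy "PCG" for the additive-shares-of-zero correlation over bytes: both
# parties locally expand the same short seed; Alice outputs r, Bob outputs
# -r mod 256. Long correlated outputs, short seed, local computation only.
seed = b"short correlated seed"
r = prg(seed, 1_000_000)
alice = list(r)
bob = [(-x) % 256 for x in r]
assert all((s + t) % 256 == 0 for s, t in zip(alice, bob))
```

This toy is degenerate: each party can reconstruct the other's share. The correlations PCGs actually target, such as OT and OLE, additionally require each party's output to look random given the other party's output; achieving that is the difficult part addressed by the constructions the talk surveys.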

February 25

Good Locally Testable Codes
4:15-5:15 PM ET. Location: TBD

Abstract: An error-correcting code is locally testable (LTC) if there is a random tester that reads only a small number of bits of a given word and decides whether the word is in the code, or at least close to it. A long-standing problem asks if there exists such a code that also satisfies the gold standard of coding theory: constant rate and constant distance. Unlike the classical situation in coding theory, random codes are not LTCs, so this problem is a challenge of a new kind.

We construct such codes based on what we call (Ramanujan) Left/Right Cayley square complexes. These objects seem to be of independent group-theoretic interest. The codes built on them are 2-dimensional versions of the expander codes constructed by Sipser and Spielman (1996).

The main result and lecture will be self-contained. But we hope also to explain how the seminal work of Howard Garland (1972) on the cohomology of quotients of the Bruhat-Tits buildings of p-adic Lie groups has led to this construction (even though it is not used in the end).

Based on joint work with I. Dinur, S. Evra, R. Livne, and S. Mozes.
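For readers new to local testability, the textbook example is the Hadamard code with the Blum-Luby-Rubinfeld 3-query linearity test; it is locally testable but has vanishing rate, exactly the deficiency the "good" codes of this talk overcome. A minimal sketch (the trial count and test functions are illustrative):

```python
import numpy as np

def blr_test(f, n, trials=100, seed=None):
    """Blum-Luby-Rubinfeld 3-query linearity test: accept if
    f(x) + f(y) = f(x + y) (mod 2) for random x, y in {0,1}^n.
    Linear f always passes; f far from linear fails w.h.p."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        x = rng.integers(0, 2, n)
        y = rng.integers(0, 2, n)
        if (f(x) + f(y)) % 2 != f((x + y) % 2):
            return False
    return True

a = np.array([1, 0, 1, 1, 0])
linear = lambda x: int(x @ a % 2)                 # a Hadamard codeword
far = lambda x: int((x @ a + x[0] * x[1]) % 2)    # quadratic corruption

print(blr_test(linear, 5, seed=0))  # True: always accepted
print(blr_test(far, 5, seed=0))     # False w.h.p. (each trial rejects w.p. 3/8)
```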

February 18

On the Complexity of Neural Computation in Superposition
4:15-5:15 PM ET. Location: TBD

Abstract: Recent advances in neural network interpretability suggest that superposition, the ability of a network to represent many more features than it has neurons, is a key mechanism underlying how neural networks compute. This work explores the theoretical foundations of computing in superposition, focusing on explicit, provably correct algorithms and their efficiency.

We introduce a lower bound technique that provides the first general lower bounds for neural network size. Using this technique, we show that for a broad class of problems, including permutations and pairwise logical operations, a neural network requires at least Omega(sqrt(m' log m')) neurons and Omega(m' log m') parameters, where m' is the number of output features being computed. This establishes a limit on how much superposition a neural network can exhibit. Conversely, we prove a nearly matching upper bound: fundamental logical operations, such as computing a set of m' pairwise AND operations in parallel, can be computed using O(sqrt(m') log m') neurons and O(m' log^2 m') parameters.

These results have several implications. They demonstrate an exponential gap between computing in superposition (the focus of this work) and merely representing the features in superposition, which requires only O(log m') neurons by the Johnson-Lindenstrauss lemma. Moreover, our lower bound implies that any of the methods used in practice to compress neural networks must produce a model with at least Omega(m' log m') parameters, regardless of the initial dense network size. We hope that our results open a path for applying complexity-theoretic techniques to future research on neural network interpretability.

Joint work with Nir Shavit.
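The representation side of the contrast, the Johnson-Lindenstrauss fact that O(log m') neurons suffice to merely store m' features, is easy to see numerically. A minimal sketch (the dimensions are illustrative assumptions):

```python
import numpy as np

# Representing m' features in d << m' dimensions (superposition on the
# representation side): random unit vectors in R^d are nearly orthogonal,
# and d = O(log m') suffices by the Johnson-Lindenstrauss lemma.
rng = np.random.default_rng(0)
m_feat, d = 2000, 400            # far more features than dimensions
E = rng.standard_normal((m_feat, d))
E /= np.linalg.norm(E, axis=1, keepdims=True)

# Pairwise interference (off-diagonal inner products) stays small:
G = E @ E.T
np.fill_diagonal(G, 0.0)
print("max interference:", np.abs(G).max())  # well below 1 despite m' >> d

# Reading one feature out of a sparse superposition x = e_3 + e_17 + e_42:
x = E[[3, 17, 42]].sum(axis=0)
print([round(float(E[i] @ x), 2) for i in (3, 17, 42, 99)])  # ~[1, 1, 1, 0]
```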

February 11

"Local-to-Global" Theorems on High Dimensional Expanders
4:15-5:15 PM ET. Location: TBD

Abstract: Expansion in graphs is a well-studied topic, with a wealth of applications in many areas of mathematics and the theory of computation. High dimensional expansion is a generalization of expansion from graphs to higher-dimensional objects, such as simplicial complexes or partially ordered sets. High dimensional expanders are still far from understood, but one fascinating new aspect is how global properties emerge from local ones. This phenomenon has already led to progress on longstanding questions in the areas of error-correcting codes and Probabilistically Checkable Proofs (PCPs).

I will describe the two key definitions: cosystolic expansion and link expansion. I will then describe how these relate to some of the exciting new applications.

December 10

Coboundary Expansion Inside Chevalley Coset Complex HDXs
4:15-5:15 PM ET, 32-G449 (Kiva/Patil). Refreshments at 4:00 PM.

Abstract: Recent major results in property testing and PCPs were unlocked by moving to high-dimensional expanders (HDXs) constructed from C_d-type buildings, rather than the long-known A_d-type ones. At the same time, these building-quotient HDXs are not as easy to understand as the more elementary (and more symmetric/explicit) coset complex HDXs constructed by Kaufman-Oppenheim [KO18] (of A_d-type) and O’Donnell-Pratt [OP22] (of B_d-, C_d-, and D_d-type).

Motivated by these considerations, we study the B_3-type generalization of a recent work of Kaufman-Oppenheim [KO21], which showed that the A_3-type coset complex HDXs have good 1-coboundary expansion in their links, and thus yield 2-dimensional topological expanders. The crux of Kaufman-Oppenheim's proof of 1-coboundary expansion was: (1) identifying a group-theoretic result by Biss and Dasgupta on small presentations for the A_3-unipotent group over F_q; (2) "lifting" it to an analogous result for an A_3-unipotent group over polynomial extensions F_q[x].

For our B_3-type generalization, the analogue of (1) appears not to hold. We manage to circumvent this with a significantly more involved strategy: (1) obtaining a computer-assisted proof of vanishing 1-cohomology of B_3-type unipotent groups over F_5; (2) developing significant new "lifting" technology to deduce the required quantitative 1-cohomology results for B_3-type unipotent groups over F_{5^k}[x].

Joint work with Noah G. Singer (Carnegie Mellon).
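For orientation (my gloss, not part of the abstract): the A_3-type unipotent group over F_q mentioned above is concretely the group of 4x4 upper unitriangular matrices with entries in F_q. A minimal sketch:

```python
import numpy as np
from itertools import product

q = 5  # A_3-type unipotent group over F_q: 4x4 upper unitriangular matrices

def unitriangular(entries):
    """Build a 4x4 identity matrix with the 6 strictly-upper-triangular
    entries filled in; all arithmetic below is mod q."""
    M = np.eye(4, dtype=int)
    M[np.triu_indices(4, k=1)] = entries
    return M

A = unitriangular([1, 2, 3, 4, 0, 1])
B = unitriangular([4, 0, 1, 2, 3, 1])
C = (A @ B) % q   # the group is closed under multiplication mod q
assert (np.diag(C) == 1).all() and not np.tril(C, -1).any()
print("group order q^6 =", q ** 6,
      "=", len(list(product(range(q), repeat=6))))
```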

December 03

A New Approach to Optimal Spectral Gaps
4:15-5:15 PM ET, 32-124. Refreshments at 4:00 PM.

Abstract: It was conjectured by Alon in the 1980s that random d-regular graphs have the largest possible spectral gap (up to negligible error) among all d-regular graphs. This conjecture was proved by Friedman in 2004 in a major tour de force. In recent years, deep generalizations of Friedman's theorem, such as strong convergence of random permutation matrices due to Bordenave and Collins, have played a central role in a series of breakthrough results on random graphs, geometry, and operator algebras.

In joint work with Chen, Garza-Vargas, and Tropp, we recently discovered a surprisingly simple new approach to such results that is almost entirely based on soft arguments. This approach makes it possible to address previously inaccessible questions: for example, it enables a sharp understanding of the large deviation probabilities in Friedman's theorem, and establishes optimal spectral gaps of random Schreier graphs that require many fewer random bits than ordinary random regular graphs. I will aim to explain some of these results and some intuition behind the proofs.
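Friedman's theorem is easy to observe numerically: the nontrivial spectrum of a random d-regular graph concentrates within the Ramanujan bound 2*sqrt(d-1). A small check (graph size and degree are illustrative choices):

```python
import numpy as np
import networkx as nx

# Friedman's theorem (Alon's conjecture): a random d-regular graph has all
# nontrivial adjacency eigenvalues of absolute value at most 2*sqrt(d-1)+eps
# with high probability -- optimal by the Alon-Boppana bound.
d, n = 4, 2000
G = nx.random_regular_graph(d, n, seed=0)
A = nx.to_numpy_array(G)
eig = np.sort(np.linalg.eigvalsh(A))
lambda_nontrivial = max(eig[-2], -eig[0])  # largest besides the trivial eigenvalue d
print(f"observed: {lambda_nontrivial:.3f}  vs  2*sqrt(d-1) = {2*np.sqrt(d-1):.3f}")
```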

November 26

Recent Advances in Differential Privacy under Continual Observation
4:15-5:15 PM ET, 32-G449. Refreshments at 4:00 PM.

Abstract: Differential privacy is one of the most popular definitions of privacy, used both in research and in industrial applications. The multi-query setting, where many queries have to be answered over a static data set, is already well understood for many algorithmic questions. The dynamic setting, where updates to the data set are interleaved with queries over the current data set, was introduced in 2010 by Dwork, Naor, Pitassi, and Rothblum, who called it differential privacy under continual observation. While there was not much work on this model in the 2010s, it has received significant attention in recent years, partially due to its use in private machine learning. I will survey the state of the art in the continual observation setting and explain its motivation as well as the main algorithmic techniques that have led to the recent advances.
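The canonical technique in this model is the binary-tree mechanism for continual counting, from the Dwork-Naor-Pitassi-Rothblum line of work. A minimal sketch, with my own illustrative parameter choices and the simple level-composition analysis in the comments:

```python
import numpy as np

def tree_mechanism(stream, eps, seed=None):
    """Binary-tree mechanism for differentially private continual counting:
    after each bit arrives, release a noisy running count. Each stream
    position lies in at most L = bit_length(T) dyadic tree nodes (one per
    level), so Laplace(L/eps) noise per node gives eps-DP by composition;
    each prefix sum then combines at most L noisy nodes, for polylog(T)
    error instead of the ~T cost of noising every prefix independently."""
    rng = np.random.default_rng(seed)
    T = len(stream)
    L = max(T, 1).bit_length()          # number of tree levels
    noisy = {}                          # (level, start) -> noisy dyadic sum
    releases = []
    for t in range(1, T + 1):
        # Finalize every dyadic interval [t - 2^lv + 1, t] that ends at t.
        for lv in range(L):
            size = 1 << lv
            if t % size == 0:
                true_sum = sum(stream[t - size:t])
                noisy[(lv, t - size)] = true_sum + rng.laplace(scale=L / eps)
        # Prefix [1, t] is the disjoint union of dyadic intervals read off
        # the binary expansion of t, high bits first.
        ans, start = 0.0, 0
        for lv in reversed(range(L)):
            size = 1 << lv
            if t & size:
                ans += noisy[(lv, start)]
                start += size
        releases.append(ans)
    return releases

bits = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1] * 10
est = tree_mechanism(bits, eps=1.0, seed=0)
print("true count:", sum(bits), " private estimate:", round(est[-1], 1))
```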

November 19

Vizing's Theorem in Near-Linear Time
4:15-5:15 PM ET, 32-G449. Refreshments at 4:00 PM.

Abstract: In his classic result, Vizing (1964) proved that any graph of maximum degree ∆ can be edge colored using at most ∆+1 different colors. Vizing's original proof is algorithmic and shows that such an edge coloring can be found in O(mn) time, where m and n denote the number of edges and vertices respectively. This was subsequently improved to Õ(m√n) independently by [Arjomandi '82; Gabow et al. '85]. This bound remained state-of-the-art for four decades, and only recently got improved to Õ(min{n^2, m·n^{1/4}}) [Assadi '24; Bhattacharya et al. '24].

In this talk, I will present a completely new approach to edge coloring that leads to the first near-linear time algorithm for Vizing's theorem, running in O(m log ∆) time.

Based on recent joint work with Sepehr Assadi, Sayan Bhattacharya, Martín Costa, Shay Solomon, and Tianyi Zhang.
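For contrast with the ∆+1 guarantee, here is the trivial greedy baseline, which uses up to 2∆−1 colors (my illustration, not the talk's algorithm):

```python
import networkx as nx

def greedy_edge_coloring(G):
    """Baseline: color each edge with the smallest color unused at either
    endpoint. Since each endpoint blocks at most Delta - 1 colors, at most
    2*Delta - 1 colors are ever needed -- weaker than Vizing's Delta + 1,
    but it runs in O(m * Delta) time."""
    color = {}
    used = {v: set() for v in G}
    for u, v in G.edges():
        c = 0
        while c in used[u] or c in used[v]:
            c += 1
        color[(u, v)] = c
        used[u].add(c)
        used[v].add(c)
    return color

G = nx.random_regular_graph(5, 30, seed=0)
coloring = greedy_edge_coloring(G)
print("Delta = 5, colors used:", max(coloring.values()) + 1)  # <= 2*5 - 1 = 9
```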

October 08

Learning to Defer in Content Moderation: The Human-AI Interplay
4:15-5:15 PM ET, 32-G449. Refreshments at 4:00 PM.

Abstract: Ensuring successful content moderation is vital for a healthy online social platform, where it is necessary to responsively remove harmful posts without jeopardizing non-harmful content. Due to the high volume of online posts, human-only moderation is operationally challenging, and platforms often employ a human-AI collaboration approach. A typical heuristic estimates the expected harmfulness of incoming posts and uses fixed thresholds to decide whether to remove the post and whether to send it for human review. This disregards the uncertainty in the machine learning estimation, the time-varying nature of human review capacity and post arrivals, and the selective sampling in the dataset (humans only review posts filtered by the admission algorithm).

We introduce a model to capture this human-AI interplay. Our algorithm observes contextual information for posts, makes classification and admission decisions, and schedules posts for human review. Only admitted posts receive human reviews of their harmfulness. These reviews help train the machine-learning algorithms but are delayed due to congestion in the human review system. We propose a near-optimal learning algorithm that balances the classification loss from a selectively sampled dataset, the idiosyncratic loss of non-reviewed posts, and the delay loss from congestion in the human review system. To the best of our knowledge, this is the first result for online learning in contextual queueing systems, and hence our analytical framework may be of independent interest.

This talk is based on joint work with Wentao Weng (Ph.D. student at MIT EECS). A preprint of the corresponding paper can be found here: https://arxiv.org/pdf/2402.12237. This work was selected as a finalist in the 2024 INFORMS Junior Faculty Interest Group (JFIG) Paper Competition.
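A minimal sketch of the fixed-threshold heuristic the abstract takes as its starting point (threshold values are illustrative assumptions; this is the baseline, not the proposed algorithm):

```python
import numpy as np

def threshold_moderation(posts, p_harm, remove_thr=0.9, review_thr=0.5):
    """Fixed-threshold baseline: given an ML estimate p_harm[i] of each
    post's harmfulness, remove posts above remove_thr and queue the
    uncertain middle band for human review. It ignores estimation
    uncertainty, review-capacity congestion, and the selective-sampling
    feedback loop that the talk's algorithm addresses."""
    decisions = []
    for post, p in zip(posts, p_harm):
        if p >= remove_thr:
            decisions.append((post, "remove"))
        elif p >= review_thr:
            decisions.append((post, "human_review"))
        else:
            decisions.append((post, "keep"))
    return decisions

rng = np.random.default_rng(0)
posts = [f"post_{i}" for i in range(5)]
print(threshold_moderation(posts, rng.random(5)))
```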

September 24

Near Optimal Alphabet-Soundness Tradeoff PCPs
4:15-5:15 PM ET, 32-G449. Refreshments at 4:00 PM.

Abstract: We show a nearly optimal alphabet-soundness tradeoff for NP-hardness of 2-Prover-1-Round Games (2P1R). More specifically, we show that for all ε > 0 and all sufficiently large M, it is NP-hard to decide whether a 2P1R instance of alphabet size M has value nearly 1 or at most M^(−1+ε). 2P1R games are equivalent to 2-query PCPs and are widely used in obtaining hardness-of-approximation results. As such, our result implies the following: (1) hardness of approximating Quadratic Programming within a factor of nearly log(n); (2) hardness of approximating d-bounded-degree 2-CSP within a factor of nearly d/2; and (3) improved hardness-of-approximation results for various k-vertex-connectivity problems. For the first two applications, our results nearly match the performance of the best known algorithms.

Joint work with Dor Minzer.
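To make the object concrete (my illustration, not from the talk): the value of a 2P1R game is the maximum acceptance probability over deterministic prover strategies, computable by brute force for toy instances. The CHSH-style game below is a standard example with value 3/4.

```python
from itertools import product

def game_value(Q1, Q2, A, dist, V):
    """Value of a 2-Prover-1-Round game by brute force: maximize, over
    deterministic strategies f: Q1 -> A and g: Q2 -> A, the probability
    that the verifier's predicate V accepts. (Exponential time -- the
    talk's point is that even approximating this value within M^(-1+eps)
    for alphabet size M is NP-hard.)"""
    best = 0.0
    for f in product(A, repeat=len(Q1)):
        for g in product(A, repeat=len(Q2)):
            val = sum(p for (q1, q2), p in dist.items()
                      if V(q1, q2, f[q1], g[q2]))
            best = max(best, val)
    return best

# Toy game: uniform questions in {0,1}; accept iff a1 XOR a2 == q1 AND q2.
Q1 = Q2 = [0, 1]
A = [0, 1]
dist = {(q1, q2): 0.25 for q1 in Q1 for q2 in Q2}
V = lambda q1, q2, a1, a2: (a1 ^ a2) == (q1 & q2)
print(game_value(Q1, Q2, A, dist, V))  # 0.75
```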

September 17

Learning in Strategic Environments: from Calibrated Agents to General Information Asymmetry
4:15-5:15 PM ET, 32-G449.

Abstract: In this talk I will discuss learning in principal-agent games where there is information asymmetry between what the principal and the agent know about each other's chosen actions. I will introduce a generalization of the standard Stackelberg Games (SGs) framework: Calibrated Stackelberg Games (CSGs). In CSGs, a principal repeatedly interacts with an agent who (contrary to standard SGs) does not have direct access to the principal's action but instead best-responds to calibrated forecasts about it. I will show that in CSGs, the principal can achieve utility that converges to the optimum Stackelberg value of the game (i.e., the value they could achieve had the agent known the principal's strategy all along), both in finite and continuous settings, and that no higher utility is achievable. Finally, I will discuss a meta-question: when learning in strategic environments, can agents overcome uncertainty about their preferences to achieve the outcomes they could have achieved absent any uncertainty? And can they do this solely through interactions with each other?

Based on joint work with Nivasini Ananthakrishnan (UC Berkeley), Nika Haghtalab (UC Berkeley), and Kunhe Yang (UC Berkeley).
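A toy sketch of the CSG feedback loop (my illustration, with the strong simplifying assumption that the principal plays i.i.d., so empirical frequencies are calibrated in the limit; the talk's setting handles genuine calibrated forecasts against adaptive play):

```python
import numpy as np

# The agent never sees the principal's committed mixed strategy; it keeps
# a forecast built from history and best-responds to that forecast.
rng = np.random.default_rng(0)
U_agent = np.array([[1.0, 0.0],    # agent payoffs: rows = principal action,
                    [0.2, 0.8]])   # columns = agent action
p_star = np.array([0.3, 0.7])      # principal's committed mixed strategy
counts = np.ones(2)                # agent's forecast (Laplace-smoothed)

for t in range(5000):
    a_principal = rng.choice(2, p=p_star)
    forecast = counts / counts.sum()
    a_agent = int(np.argmax(forecast @ U_agent))  # best response to forecast
    counts[a_principal] += 1       # update forecast from observed play

print("forecast:", np.round(counts / counts.sum(), 3), "vs p* =", p_star)
print("best response to p*:", int(np.argmax(p_star @ U_agent)),
      "== agent's final play:", a_agent)
```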