June 05

The Sample Complexity of Toeplitz Covariance Estimation
4:00–5:00 pm, 32-G575

Abstract: We study the query complexity of estimating the covariance matrix T of a distribution D over d-dimensional vectors, under the assumption that T is Toeplitz. This assumption is standard in a wide variety of signal processing problems, where the covariance between any two measurements only depends on the time or distance between those measurements. In many of these applications, we are interested in estimation strategies that may choose to view only a subset of entries in each d-dimensional sample x from D. We care about minimizing both 1) the number of samples taken and 2) the number of entries accessed in each sample, which often equates to minimizing equipment requirements in applications ranging from wireless transmission to advanced imaging. We give many of the first non-asymptotic bounds on these sample complexity measures. We analyze classical and widely used estimation algorithms, in particular methods based on selecting entries from each sample according to a "sparse ruler". We explain how sparse-ruler-based estimation can significantly outperform naive methods when T is close to low-rank, as is often the case in practice. We also develop a novel sampling and estimation strategy that improves on known algorithms in the low-rank case. Our method utilizes tools from random matrix sketching, leverage score based sampling techniques for continuous time signals, and sparse Fourier transform algorithms.

Joint work with Y. Eldar, C. Musco, and C. Musco.
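
To make the sparse-ruler idea concrete, here is a minimal numpy sketch (my own illustration, not code from the talk). For d = 10, the marks {0, 1, 2, 3, 6, 9} realize every lag 0..9 as a difference of two marks, so reading only those 6 entries of each sample suffices to estimate every diagonal of T. The zero-mean Gaussian model and the 0.7^s decay are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10
ruler = [0, 1, 2, 3, 6, 9]                 # every lag 0..9 is a difference of two marks

pairs = {}
for i in ruler:
    for j in ruler:
        if j >= i:
            pairs.setdefault(j - i, (i, j))
assert set(pairs) == set(range(d)), "not a valid sparse ruler"

lag = np.abs(np.arange(d)[:, None] - np.arange(d)[None, :])
t = 0.7 ** np.arange(d)                    # true autocovariance sequence (assumed)
T = t[lag]                                 # true Toeplitz covariance

n = 20000
X = rng.multivariate_normal(np.zeros(d), T, size=n)
X_obs = X[:, ruler]                        # read only |ruler| = 6 of the d = 10 entries

t_hat = np.empty(d)
for s, (i, j) in pairs.items():            # one representative pair per lag;
    a, b = ruler.index(i), ruler.index(j)  # averaging all pairs per lag reduces variance
    t_hat[s] = np.mean(X_obs[:, a] * X_obs[:, b])

print("relative error:", np.linalg.norm(t_hat[lag] - T) / np.linalg.norm(T))
```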

May 29

Tensor Programs: A Swiss-Army Knife for Nonlinear Random Matrix Theory of Deep Learning and Beyond
4:00–5:00 pm, 32-G575

Abstract: The resurgence of neural networks has revolutionized artificial intelligence since 2010. Luckily for mathematicians and statistical physicists, the study of large random network scaling limits, which can be thought of as *nonlinear* random matrix theory, is both practically important and mathematically interesting. We describe several problems in this setting and develop a new comprehensive framework, called "tensor programs," for solving these problems. This framework can be thought of as an automatic tool to derive the behavior of computation graphs with large matrices, as used in neural network computation. It is very general, and from it we also obtain new proofs of the semicircle and the Marchenko-Pastur laws. Thus, "tensor programs" is broadly useful to linear and nonlinear random matrix theory alike, and we hope it will be adopted as a standard tool.

This talk presents the work arXiv:1902.04760.
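
As a quick empirical companion to the semicircle law mentioned above (my own sanity check, not part of the talk): the eigenvalue moments of a large Wigner matrix should approach the Catalan numbers.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

A = rng.standard_normal((n, n))
W = (A + A.T) / np.sqrt(2 * n)             # symmetric Wigner matrix, entry variance 1/n
eigs = np.linalg.eigvalsh(W)

# The semicircle density sqrt(4 - x^2) / (2 pi) on [-2, 2] has even moments
# equal to the Catalan numbers: m_2 = 1, m_4 = 2, m_6 = 5.
for k, catalan in [(2, 1), (4, 2), (6, 5)]:
    print(f"empirical m_{k} = {np.mean(eigs ** k):.3f}   (limit {catalan})")
```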

May 22

Tinkering with Double-Greedy
4:00–5:00 pm, Seminar Room G575

Abstract: Submodular functions can be used to model a wide array of important problems from fields such as theoretical computer science, economics, and machine learning. One of the most basic problems in submodular optimization is to maximize a submodular function f. When the function is not necessarily monotone, this problem is known to be NP-hard to approximate better than a factor of 1/2 [Feige, Mirrokni, Vondrak '11; Dobzinski and Vondrak '12]. This lower bound was met by the landmark Double-Greedy algorithm of [Buchbinder, Feldman, Naor, Schwartz '12]. In this talk, I review this embarrassingly simple algorithm and then present two extensions to different settings. The first extension is to online no-regret learning, where we receive a submodular function each round and want to do well compared to the best fixed point in hindsight. The second extension concerns the weaker generalization of submodularity to the continuous setting, where we are not guaranteed that the function is coordinate-wise concave. Both extensions are powered by a game-theoretic view of the Double-Greedy algorithm.

Joint work with Tim Roughgarden and Rad Niazadeh.
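
For reference, the randomized Double-Greedy of [Buchbinder, Feldman, Naor, Schwartz '12] fits in a few lines of Python; the cut function and the toy graph below are my own example (cut functions are submodular, non-monotone, and non-negative).

```python
import random

def double_greedy(f, ground_set, seed=0):
    """Randomized Double-Greedy [Buchbinder, Feldman, Naor, Schwartz '12]:
    a 1/2-approximation in expectation for unconstrained maximization of a
    non-negative (possibly non-monotone) submodular function f."""
    rng = random.Random(seed)
    X, Y = set(), set(ground_set)
    for e in ground_set:
        a = max(f(X | {e}) - f(X), 0)      # gain from adding e to X
        b = max(f(Y - {e}) - f(Y), 0)      # gain from removing e from Y
        if a == b == 0 or rng.random() < a / (a + b):
            X.add(e)
        else:
            Y.remove(e)
    return X                               # X == Y after the last element

# Example: maximize a graph cut, a canonical non-monotone submodular objective.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 4), (3, 4)]
cut = lambda S: sum((u in S) != (v in S) for u, v in edges)
S = double_greedy(cut, range(5))
print(sorted(S), "cut value:", cut(S))
```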

May 15

Sampling Sketches for Concave Sublinear Functions of Frequencies
3:00–4:00 pm, 32-G575

Abstract: We consider massive distributed datasets that consist of elements that are key-value pairs. Our goal is to compute estimates of statistics or aggregates over the data, where the contribution of each key is weighted by a function of its frequency (the sum of the values of its elements). This fundamental problem has a wealth of applications in data analytics and machine learning.

A common approach is to maintain a sample of keys and estimate statistics from the sample. Ideally, to obtain low-variance estimates we sample keys with probabilities proportional to their contributions. One simple way to do so is to first aggregate the raw data to produce a table of keys and their frequencies, apply our function to the frequency values, and then apply a weighted sampling scheme. This aggregation, however, requires data structures of size proportional to the number of distinct keys and is too costly when that number is very large. Our main contribution is the design of composable sampling sketches that can be tailored to any concave sublinear function of the frequencies (including log and the moments x^p for 0 < p < 1).

Joint work with Edith Cohen.
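
The "aggregate first, then sample" baseline described in the abstract is easy to spell out; the talk's sketches avoid exactly this full aggregation. A small Python illustration with f = sqrt, where the statistic being estimated (the f-weighted total over a segment of keys) is my own hypothetical choice:

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(2)

# Raw data: (key, value) elements; a key's frequency is the sum of its values.
data = [(int(k), 1.0) for k in rng.integers(0, 50, size=10000)]

freq = Counter()                       # step 1: full aggregation (the costly part:
for k, v in data:                      # space grows with the number of distinct keys)
    freq[k] += v

f = np.sqrt                            # a concave sublinear function of frequency
keys = np.array(list(freq))
w = f(np.array([freq[k] for k in keys]))
p = w / w.sum()                        # step 2: sample keys proportional to f(freq)

m = 50
sample = rng.choice(len(keys), size=m, p=p)

# Inverse-probability estimate of sum of f(freq_k) over a segment (even keys here).
in_seg = keys % 2 == 0
est = np.mean(np.where(in_seg[sample], w[sample] / p[sample], 0.0))
print("estimate:", est, "  truth:", w[in_seg].sum())
```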

May 01

Batch Normalization Causes Gradient Explosion in Deep Randomly Initialized Networks
4:00–5:00 pm, 32-G575

Abstract: Batch Normalization (batchnorm) has become a staple in deep learning since its introduction in 2015. The authors conjectured that "Batch Normalization may lead the layer Jacobians to have singular values close to 1" and recent works suggest it benefits optimization by smoothing the optimization landscape during training. We disprove the "Jacobian singular value" conjecture for randomly initialized networks, showing batchnorm causes gradient explosion that is exponential in depth. This implies that at initialization, batchnorm in fact "roughens" the optimization landscape. This explosion empirically prevents one from training ReLU networks with more than 50 layers without skip connections. We discuss several ways of mitigating this explosion and their relevance in practice.
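
A quick PyTorch experiment (mine, not the talk's) that reproduces the qualitative claim: the input gradient of a randomly initialized Linear → BatchNorm → ReLU stack grows exponentially with depth.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def input_grad_norm(depth, width=256, batch=128):
    """Gradient norm at the input of a freshly initialized batchnorm network."""
    layers = []
    for _ in range(depth):
        layers += [nn.Linear(width, width, bias=False),
                   nn.BatchNorm1d(width, affine=False),
                   nn.ReLU()]
    net = nn.Sequential(*layers)
    x = torch.randn(batch, width, requires_grad=True)
    out = net(x)
    out.backward(torch.randn_like(out))    # random upstream gradient at the output
    return x.grad.norm().item()

for depth in [2, 10, 20, 40]:              # the ratio between rows grows geometrically
    print(f"depth {depth:3d}: ||grad at input|| = {input_grad_norm(depth):.3e}")
```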

April 30

Learning-Driven Algorithms for Discrete Optimization
4:00–5:00 pm, 32-G449 Patil/Kiva

Abstract: This talk focuses on a novel and fruitful synergy between machine learning and optimization: in particular, how ML techniques can improve the design of algorithms for Discrete Optimization, both complete algorithms such as branch and bound and incomplete ones such as heuristic greedy search. Branch-and-bound solvers for Mixed Integer Programs (MIP) such as CPLEX, Gurobi and SCIP are used daily across different domains and industries to find solutions with optimality guarantees for NP-hard combinatorial problems. Leveraging the plethora of rich and useful data generated during the solving process, we illustrate the potential for ML in MIP on two crucial tasks in branch and bound: branching variable selection and primal heuristic selection. Empirical results show that our novel approaches can significantly improve the performance of a solver on both heterogeneous benchmark instances and homogeneous families of instances. In the second part of the talk, we show how to leverage a unique combination of reinforcement learning and graph embedding to infer very effective data-driven greedy strategies for solving well-studied combinatorial optimization problems on graphs such as Minimum Vertex Cover, Max Cut and Traveling Salesman. I will conclude with some new directions on 1) learning novel heuristics that apply to any MIP problem distribution, and 2) decision-focused learning.

April 26

Adversarially Robust Property-Preserving Hashes
11:00 am–12:00 pm, 32-D463 (Star)

Abstract: Property-preserving hashing is a method of compressing a large input x into a short hash h(x) in such a way that given h(x) and h(y), one can compute a property P(x, y) of the original inputs. The idea of property-preserving hash functions underlies sketching, compressed sensing and locality-sensitive hashing.

Property-preserving hash functions are usually probabilistic: they use the random choice of a hash function from a family to achieve compression, and as a consequence, err on some inputs. Traditionally, the notion of correctness for these hash functions requires that for every two inputs x and y, the probability that h(x) and h(y) mislead us into a wrong prediction of P(x, y) is negligible. As observed in many recent works (including Mironov, Naor and Segev, STOC 2008; Hardt and Woodruff, STOC 2013; Naor and Yogev, CRYPTO 2015), such a correctness guarantee assumes that the adversary (who produces the offending inputs) has no information about the hash function, and is too weak in many scenarios.

We initiate the study of adversarial robustness for property-preserving hash functions, provide definitions, derive broad lower bounds due to a simple connection with communication complexity, and show the necessity of computational assumptions to construct such functions. Our main positive results are two candidate constructions of property-preserving hash functions (achieving different parameters) for the (promise) gap-Hamming property, which checks whether x and y are "too far" or "too close." Our first construction relies on generic collision-resistant hash functions, and our second on a variant of the syndrome decoding assumption for low-density parity-check codes.

Joint work with Elette Boyle and Vinod Vaikuntanathan.
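
A tiny illustration of why the traditional guarantee breaks down against adversaries (a standard subsampling baseline, not one of the paper's constructions): the classic coordinate-sampling sketch for Hamming distance is accurate on oblivious inputs, but anyone who learns the sampled set S can put all the differences outside S and make the two hashes collide.

```python
import random

rng = random.Random(42)
n, k = 10000, 200

S = rng.sample(range(n), k)                 # the sketch: remember k random coordinates
h = lambda x: [x[i] for i in S]

x = [rng.getrandbits(1) for _ in range(n)]
y = [b ^ (rng.random() < 0.3) for b in x]   # oblivious input: ~30% of bits flipped
est = (n / k) * sum(a != b for a, b in zip(h(x), h(y)))
print("estimate:", est, " true:", sum(a != b for a, b in zip(x, y)))

# An adversary who knows S flips every bit *outside* S: near-maximal distance,
# yet identical hashes.
Sset = set(S)
z = [x[i] if i in Sset else 1 - x[i] for i in range(n)]
print("adversarial estimate:", (n / k) * sum(a != b for a, b in zip(h(x), h(z))),
      " true:", sum(a != b for a, b in zip(x, z)))
```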

March 06

Decremental Strongly-Connected Components and Single-Source Reachability in Near-Linear Time
4:00–5:00 pm, 32-G575

Abstract: Computing the Strongly-Connected Components (SCCs) of a graph G = (V, E) is known to take only O(m + n) time using an algorithm by Tarjan from 1972 [SICOMP '72], where m = |E| and n = |V|. For fully-dynamic graphs, conditional lower bounds provide evidence that the update time cannot be improved by polynomial factors over recomputing the SCCs from scratch after every update. Nevertheless, substantial progress has been made toward algorithms with fast update time for *decremental* graphs, i.e., graphs that undergo edge deletions.

In this paper, we present the first algorithm for general decremental graphs that maintains the SCCs in total update time O(m polylog(n)), thus only a polylogarithmic factor from the optimal running time. Previously such a result was only known for the special case of planar graphs [Italiano et al., STOC '17]. Our result should be compared to the formerly best algorithm for general graphs, achieving O(m \sqrt{n} polylog(n)) total update time, by Chechik et al. [FOCS '16], which improved upon a breakthrough result leading to O(m n^{0.9 + o(1)}) total update time by Henzinger, Krinninger and Nanongkai [STOC '14, ICALP '15]; these results in turn improved upon the longstanding bound of O(mn) by Roditty and Zwick [STOC '04].
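
For context, the static baseline is compact enough to include: a recursive Python version of Tarjan's O(m + n) SCC algorithm (fine for small graphs; production code would use an explicit stack to avoid recursion limits). The decremental result above matches this bound, up to polylog factors, in *total* update time over an entire sequence of edge deletions.

```python
def tarjan_scc(graph):
    """Tarjan's O(m + n) static SCC algorithm [SICOMP '72]."""
    index, low, on_stack, stack, sccs = {}, {}, set(), [], []
    counter = [0]

    def strongconnect(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, []):
            if w not in index:
                strongconnect(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:             # v is the root of an SCC: pop it off
            scc = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                scc.append(w)
                if w == v:
                    break
            sccs.append(scc)

    for v in graph:
        if v not in index:
            strongconnect(v)
    return sccs

g = {0: [1], 1: [2], 2: [0, 3], 3: [4], 4: [3], 5: []}
print(tarjan_scc(g))                       # [[4, 3], [2, 1, 0], [5]]
```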

February 27

Opinion formation, stochastic gradient descent, and gradient-like systems
4:00–5:00 pm, 32-G575

Abstract: We study opinion dynamics on networks with two communities. Each node has one of two opinions and updates its opinion as a "majority-like" function of the frequency of opinions among its neighbors. The networks we consider are weighted graphs comprised of two equally sized communities, where intra-community edges have weight $p$ and inter-community edges have weight $q$. Thus $q$ and $p$ parameterize the connectivity between the two communities. We prove a dichotomy theorem about the interaction of the two parameters: 1) the "majority-like" update function, and 2) the level of inter-community connectivity. For each set of parameters, we show that either the system quickly converges to consensus with high probability in time $\Theta(n \log n)$, or the system can get "stuck" and take time $2^{\Theta(n)}$ to reach consensus.

Technically, we achieve the fast convergence result by exploiting the connection between a family of reinforced random walks and the dynamical systems literature. Our main result shows that if the system is a reinforced random walk with a gradient-like function, it converges to an arbitrary neighborhood of a local attracting point in $O(n \log n)$ time with high probability. This result adds to the recent literature on saddle-point analysis and shows that a large family of stochastic gradient descent algorithms converges to a local minimum in $O(n \log n)$ time when the step size is $O(1/n)$.
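
A toy simulation of the dichotomy, using my own instantiation of a "majority-like" rule (not the paper's exact model): two complete communities, and a random node adopts opinion 1 with probability g(x) = x^3 / (x^3 + (1-x)^3), where x is the weighted fraction of 1s among its neighbors (steeper than the voter model around x = 1/2).

```python
import numpy as np

rng = np.random.default_rng(3)

def consensus_time(n, p, q, max_steps=200000):
    """Asynchronous 'majority-like' dynamics on two complete communities of
    size n with intra-edge weight p and inter-edge weight q, started from
    fully polarized opinions. Returns the number of single-node updates
    until consensus, capped at max_steps."""
    A = np.ones(n, dtype=int)          # community A starts all-1
    B = np.zeros(n, dtype=int)         # community B starts all-0
    for t in range(max_steps):
        sA, sB = A.sum(), B.sum()
        if sA + sB in (0, 2 * n):
            return t                   # consensus reached
        side, i = rng.integers(2), rng.integers(n)
        me = A if side == 0 else B
        ones = p * ((sA if side == 0 else sB) - me[i]) + q * (sB if side == 0 else sA)
        x = ones / (p * (n - 1) + q * n)
        me[i] = rng.random() < x**3 / (x**3 + (1 - x) ** 3)
    return max_steps                   # still polarized: 'stuck'

n = 200
print("well-connected (q = p):   ", consensus_time(n, p=1.0, q=1.0))
print("weak inter-community ties:", consensus_time(n, p=1.0, q=0.01))
```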

February 19

Privately Learning High-Dimensional Distributions
3:45–4:45 pm, 32-D463 (Star)

Abstract: We present novel, computationally efficient, and differentially private algorithms for two fundamental high-dimensional learning problems: learning a multivariate Gaussian in R^d and learning a product distribution in {0,1}^d in total variation distance. The sample complexity of our algorithms nearly matches the sample complexity of the optimal non-private learners for these tasks in a wide range of parameters. Thus, our results show that privacy comes essentially for free for these problems, providing a counterpoint to the many negative results showing that privacy is often costly in high dimensions. Our algorithms introduce a novel technical approach to reducing the sensitivity of the estimation procedure, which we call recursive private preconditioning, and which may find additional applications.

Based on joint work with Jerry Li, Vikrant Singhal, and Jonathan Ullman.
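
For contrast with the talk's approach, here is the standard clip-and-noise baseline (the Gaussian mechanism) for private mean estimation: its error scales with an a-priori range bound R, and privately removing that kind of dependence is exactly the difficulty the talk addresses. All parameter choices below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)

def private_mean(X, R, eps, delta):
    """(eps, delta)-DP mean via the Gaussian mechanism -- a standard baseline,
    NOT the talk's recursive private preconditioning. Assumes a known bound R:
    clipping to the ball of radius R caps each point's influence, so the L2
    sensitivity of the average is 2R/n."""
    n, d = X.shape
    norms = np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)
    Xc = X * np.minimum(1.0, R / norms)            # clip to the ball of radius R
    sigma = (2 * R / n) * np.sqrt(2 * np.log(1.25 / delta)) / eps
    return Xc.mean(axis=0) + rng.normal(0, sigma, size=d)

d, n = 20, 100000
mu = rng.normal(size=d)
X = rng.normal(size=(n, d)) + mu
R = 2 * (np.sqrt(d) + np.linalg.norm(mu))          # assumed-known range bound
est = private_mean(X, R, eps=1.0, delta=1e-6)
print("L2 error:", np.linalg.norm(est - mu))
```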

February 07

Jerry Li: Nearly Optimal Algorithms for Robust Mean Estimation
4:00–5:00 pm, 32-G575

Abstract: Robust mean estimation is the following basic estimation question: given samples from a distribution, where an \epsilon-fraction of them have been corrupted, how well can you estimate the mean of the distribution? This is a classical problem in statistics, going back to the 1960s and 70s, and it has recently found applications in many problems in reliable machine learning. However, in high dimensions, classical algorithms for this problem were either (1) computationally intractable, or (2) lost poly(dimension) factors in their accuracy guarantees. Recently, polynomial-time algorithms have been demonstrated for this problem that still achieve (nearly) optimal error guarantees. However, the runtimes of these algorithms still had additional polynomial factors which can render them ineffective in practice. In this talk we give the first truly nearly-linear-time algorithms for these problems that achieve nearly optimal statistical performance. The algorithms are surprisingly simple, and are based on directly instantiating the matrix multiplicative weights framework. Moreover, these algorithms apply very generally to a wide class of distributions.
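
A simplified spectral "filter" conveys the flavor of the algorithmic line this talk speeds up (this basic version is not the paper's nearly-linear-time matrix-multiplicative-weights algorithm, and the stopping rule is a heuristic): while the empirical covariance has a direction of abnormally large variance, discard the points most extreme along it.

```python
import numpy as np

rng = np.random.default_rng(5)

def filtered_mean(X, eps, rounds=30):
    """Basic filtering for robust mean estimation under ~identity covariance."""
    X = X.copy()
    for _ in range(rounds):
        mu = X.mean(axis=0)
        w, V = np.linalg.eigh(np.cov(X.T))
        if w[-1] < 1 + 3 * eps * np.log(1 / eps):   # spectrum close to identity: stop
            break
        proj = (X - mu) @ V[:, -1]                  # scores along the top eigenvector
        keep = np.abs(proj) <= np.quantile(np.abs(proj), 1 - eps / 2)
        X = X[keep]                                 # drop the most extreme points
    return X.mean(axis=0)

d, n, eps = 50, 5000, 0.1
mu = np.ones(d)
X = rng.normal(size=(n, d)) + mu
X[: int(eps * n)] = 10.0                            # adversarial cluster far from mu
print("naive error:   ", np.linalg.norm(X.mean(axis=0) - mu))
print("filtered error:", np.linalg.norm(filtered_mean(X, eps) - mu))
```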

December 12

Dean Doron: Near-Optimal Pseudorandom Generators for Constant-Depth Read-Once Formulas
4:00–5:00 pm, 32-G575

Abstract: We give an explicit nearly optimal pseudorandom generator (PRG) for constant-depth read-once formulas. Previously, PRGs with near-optimal seed length were known only for the depth-2 case, by Gopalan et al. For constant depth d > 2, the best prior PRG is a recent construction by Forbes and Kelley with seed length O(log^2 n) for the more general model of constant-width read-once branching programs with arbitrary variable order. Our construction follows Ajtai and Wigderson's approach of iterated pseudorandom restrictions, and the main technical crux of our work is constructing a suitable pseudorandom restriction generator with a short seed for constant-depth read-once formulas.

Joint work with Pooya Hatami and William Hoza.

December 05

Testing Linearity against Non-Signaling Strategies
4:00–5:00 pm, 32-G882

Abstract: Non-signaling strategies are collections of distributions with certain non-local correlations. In this talk, we discuss the problem of linearity testing (Blum, Luby, and Rubinfeld; JCSS 1993) against non-signaling strategies. We use Fourier-analytic techniques to prove that any non-signaling strategy that passes the linearity test with high probability must be close to a quasi-distribution over linear functions. Quasi-distributions generalize the notion of probability distributions over functions by allowing negative probabilities, while at the same time requiring that "local views" follow standard distributions (with non-negative probabilities).

Based on joint work with Alessandro Chiesa (UC Berkeley) and Igor Shinkar (Simon Fraser University).
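
The classical (single-prover) BLR test at the heart of this line of work is standard material and takes a few lines: pick random x, y and check f(x) XOR f(y) == f(x XOR y). Linear functions always pass; functions far from linear are rejected with probability growing with their distance. The example functions below are mine.

```python
import random

rng = random.Random(6)
n = 16                                    # functions on {0,1}^n, points encoded as ints

def blr_test(f, trials=10000):
    """Empirical acceptance probability of the Blum-Luby-Rubinfeld test."""
    ok = 0
    for _ in range(trials):
        x, y = rng.getrandbits(n), rng.getrandbits(n)
        ok += (f(x) ^ f(y)) == f(x ^ y)
    return ok / trials

linear = lambda x: bin(x & 0b1011).count("1") % 2     # parity of a fixed bit subset
noisy = lambda x: linear(x) ^ (rng.random() < 0.05)   # errs on ~5% of queries
print("linear passes:  ", blr_test(linear))           # 1.0
print("5%-noisy passes:", blr_test(noisy))            # noticeably below 1.0
```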

November 21

On the Weight Distribution of Random Binary Linear Codes
4:00–5:00 pm, 32-G449 (Kiva)

Abstract: A random (binary) linear code is a linear subspace of the binary n-dimensional hypercube of dimension $\lambda n$ (for $0 < \lambda < 1$), chosen uniformly from among all such subspaces. Such codes play an important role in the theory of error correcting codes, since they achieve the best known rate vs. distance trade-off, i.e., the Gilbert-Varshamov lower bound. Under a random errors regime, the problem of decoding these codes is known as Learning Parity with Noise, and has many cryptographic applications. This work is motivated by the contrast between the importance of random linear codes and how little we know about them.

Much of the interesting information about a code C is captured by its weight distribution. This is the vector $(w_0, w_1, \ldots, w_n)$ where $w_i$ counts the elements of C with Hamming weight i. In this work we study the weight distribution of random linear codes. In our main result we compute the moments of the random variable $w_{\gamma n}$, where $0 < \gamma < 1$ is a fixed constant and n goes to infinity.

This is joint work with Nati Linial.
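
For intuition at small n, the weight distribution can be enumerated directly. (A uniformly random generator matrix is a convenient stand-in for the uniformly random subspace of the talk; the two agree when the matrix is full rank, which happens with high probability.)

```python
import numpy as np
from math import comb
from itertools import product

rng = np.random.default_rng(7)
n, lam = 14, 0.5
k = int(lam * n)                           # dimension of the code

G = rng.integers(0, 2, size=(k, n))        # random k x n generator matrix over GF(2)
w = np.zeros(n + 1, dtype=int)
for m in product([0, 1], repeat=k):        # all 2^k messages
    c = (np.array(m) @ G) % 2
    w[c.sum()] += 1

# For a random generator matrix, each nonzero message maps to a uniformly
# random vector, so E[w_i] = (2^k - 1) * C(n, i) / 2^n for i > 0.
for i in range(1, n + 1):
    print(i, int(w[i]), f"(expected {comb(n, i) * (2**k - 1) / 2**n:.1f})")
```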

November 14

Good Visualizations Have Short Sketches
3:00–4:00 pm, 32-G575

Abstract: We consider the problem of interactively visualizing a distributed tabular dataset with billions of rows. Our goal is to allow a user to browse such datasets in real time, at any resolution, with just a few mouse clicks (much like navigating in an online map service). We hypothesize that the sketching model of computation is tailor-made for this setting: interesting visualizations are amenable to efficient sketching protocols, whose communication cost is bounded by the size of the desired output (which is small, since it fits on screen). In this talk, we present Hillview, an open-source tool for interactive visualization of large datasets, built around this premise. We focus on the algorithmic challenges that arise in trying to render common visualizations in the sketching model.

Based on joint work with Mihai Budiu, Udi Wieder, Marcos Aguilera, Lalith Suresh and Han Kruiger (all from VMware Research).
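
The premise (communication bounded by the rendered output, not the data) is easy to see in a toy mergeable histogram sketch: each machine reduces its shard to a fixed number of bucket counts, and merging is vector addition. This is my own illustration, not Hillview's actual API.

```python
import numpy as np

class HistogramSketch:
    """A toy composable sketch: size depends on the number of on-screen
    buckets, never on the number of rows summarized."""
    def __init__(self, lo, hi, buckets):
        self.lo, self.hi = lo, hi
        self.counts = np.zeros(buckets, dtype=np.int64)

    def add(self, xs):
        idx = (np.asarray(xs) - self.lo) / (self.hi - self.lo) * len(self.counts)
        idx = np.clip(idx.astype(int), 0, len(self.counts) - 1)
        np.add.at(self.counts, idx, 1)
        return self

    def merge(self, other):
        self.counts += other.counts        # communication cost = number of buckets
        return self

rng = np.random.default_rng(8)
shards = [rng.normal(size=1_000_000) for _ in range(4)]   # 4 'machines'
sketches = [HistogramSketch(-4, 4, buckets=64).add(s) for s in shards]
total = sketches[0]
for s in sketches[1:]:
    total.merge(s)
print(total.counts.sum(), "rows summarized in", len(total.counts), "buckets")
```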

November 09

Simple and Efficient Algorithm for Parallel Matchings
4:00–5:00 pm, 32-G449

Abstract: For over a decade now we have been witnessing the success of a number of frameworks for massively parallel computation (MPC), such as MapReduce, Hadoop, Dryad, or Spark. Compared to the PRAM model, in these frameworks the number of machines is limited but each machine is allowed to perform unbounded (local) computation. A fundamental question that arises here is: can we leverage this additional power in terms of local computation to solve problems in fewer MPC rounds than PRAM parallel rounds?

In this context, we will consider the problem of computing approximate maximum matchings in the MPC model when the memory per machine is linear in the number of vertices. We will describe some of the approaches that led to MPC algorithms of O(log log n) round complexity: vertex partitioning and round compression. First, we will describe how vertex partitioning can be used to bypass some of the difficulties that edge-based partitioning approaches encounter. Second, we will present the technique of round compression in the context of a simple greedy algorithm for computing fractional matchings.

This talk is based on https://arxiv.org/pdf/1707.03478.pdf and https://arxiv.org/pdf/1802.08237.pdf.
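
The sequential baseline behind round compression is the humble greedy matching; a tiny version for concreteness (the MPC algorithms in the talk emulate many steps of a fractional variant of this process per round):

```python
def greedy_matching(edges):
    """Sequential greedy maximal matching: scan the edges and keep any edge
    whose endpoints are both still unmatched. The result is maximal, hence
    a 1/2-approximation to the maximum matching."""
    matched, M = set(), []
    for u, v in edges:
        if u not in matched and v not in matched:
            M.append((u, v))
            matched.update((u, v))
    return M

edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (1, 3)]
print(greedy_matching(edges))              # [(0, 1), (2, 3)]
```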

November 07

How to Use Heuristics for Differential Privacy
4:00–5:00 pm, 32-G882

Abstract: In this paper, we develop theory for using heuristics to solve computationally hard problems in differential privacy. Heuristic approaches have enjoyed tremendous success in machine learning, in which performance can be empirically evaluated. However, privacy guarantees cannot be evaluated empirically, and must be proven — without making heuristic assumptions. We show that learning problems over broad classes of functions — those that have universal identification sequences — can be solved privately, assuming the existence of a non-private oracle for solving the same problem. Our generic algorithm yields a privacy guarantee that only holds if the oracle succeeds. We then give a reduction which applies to a class of heuristics, which we call certifiable, which allows us to give a worst-case privacy guarantee that holds even when the oracle might fail in adversarial ways. Finally, we consider classes of functions for which both they and their dual classes have universal identification sequences. This includes most classes of simple boolean functions studied in the PAC learning literature, including halfspaces, conjunctions, disjunctions, and parities. We show that there is an efficient algorithm for privately constructing synthetic data for any such class, given a non-private learning oracle.

October 31

Write-Efficient Algorithms

Yan Gu
Carnegie Mellon University (CMU)
4:00–5:00 pm, 32-G575

Abstract: The future of main memory appears to lie in the direction of new non-volatile memory technologies that provide strong capacity-to-performance ratios, but whose write operations are much more expensive than reads in terms of energy, bandwidth, and latency. This asymmetry in read and write costs motivates "write-efficient algorithms" that use far fewer writes than the classic approaches.

This talk introduces the computational models we use to capture such asymmetry in algorithm design, then briefly reviews existing lower bounds in the asymmetric models, as well as a list of new write-efficient algorithms. As an example of designing write-efficient algorithms, a new parallel algorithm for planar Delaunay triangulation will be introduced, which achieves optimal numbers of writes and arithmetic operations, as well as poly-logarithmic parallel depth. Finally, a list of open problems will be discussed that can be interesting for potential future work.

October 26

Limits on All Known (and Some Unknown) Approaches to Matrix Multiplication
4:00–5:00 pm, 32-G882

Abstract: We study the known techniques for designing Matrix Multiplication (MM) algorithms. The two main approaches are the Laser method of Strassen and the Group-theoretic approach of Cohn and Umans. We define a generalization based on zeroing outs which subsumes these two approaches, which we call the Solar method, and an even more general method based on monomial degenerations, which we call the Galactic method.

We then design a suite of techniques for proving lower bounds on the value of omega, the exponent of MM, which can be achieved by algorithms using many tensors T and the Galactic method. Our main result is that there is a universal constant c > 2 such that a large class of tensors generalizing the Coppersmith-Winograd tensor CW_q cannot be used within the Galactic method to show a bound on omega better than c, for any q.

In this talk, I'll begin by giving a high-level overview of the algorithmic techniques involved in the best known algorithms for MM, and then I'll tell you about our lower bounds. No prior knowledge of MM algorithms will be assumed.

Joint work with Virginia Vassilevska Williams, which appeared in FOCS 2018.
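
As background for what a fast MM algorithm looks like, here is Strassen's original recursion, the historical starting point of the area. (The Laser method analyzed in the talk is a much later and far more sophisticated tensor technique, also due to Strassen; this is not it.) The sketch assumes square matrices with power-of-two side length.

```python
import numpy as np

def strassen(A, B, cutoff=64):
    """Strassen's recursion: 7 half-size multiplications instead of 8,
    giving O(n^log2(7)) ~ O(n^2.81) arithmetic operations."""
    n = A.shape[0]
    if n <= cutoff:                        # fall back to the classical product
        return A @ B
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    M1 = strassen(A11 + A22, B11 + B22, cutoff)
    M2 = strassen(A21 + A22, B11, cutoff)
    M3 = strassen(A11, B12 - B22, cutoff)
    M4 = strassen(A22, B21 - B11, cutoff)
    M5 = strassen(A11 + A12, B22, cutoff)
    M6 = strassen(A21 - A11, B11 + B12, cutoff)
    M7 = strassen(A12 - A22, B21 + B22, cutoff)
    top = np.hstack([M1 + M4 - M5 + M7, M3 + M5])
    bot = np.hstack([M2 + M4, M1 - M2 + M3 + M6])
    return np.vstack([top, bot])

rng = np.random.default_rng(9)
A, B = rng.random((256, 256)), rng.random((256, 256))
print(np.allclose(strassen(A, B), A @ B))  # True
```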