October 24

Log-Concave Polynomials and Matroids: Algorithms and Combinatorics
4:00–5:00 PM ET, 32-G575

Abstract: I will discuss an analytic property of multivariate polynomials, which we call complete log-concavity, and its surprising uses in attacking problems in combinatorics and in algorithmic tasks such as sampling, counting, and inference on discrete distributions. This property defines a large class of discrete distributions that should be thought of as the discrete analog of the well-studied continuous log-concave distributions. Examples of distributions satisfying this property include uniform distributions over bases or independent sets of matroids, determinantal point processes and certain powers of them, the random cluster model and Potts model for some regimes of parameters, and several other generalizations.

I will discuss a recipe for verifying this property and then give an application in which we resolve a combinatorial conjecture of Mason on the ultra-log-concavity of the number of independent sets of varying sizes in matroids. Then I'll discuss connections to random sampling.

Based on joint work with Kuikui Liu, Shayan Oveis Gharan, and Cynthia Vinzant.
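
For context, here is one standard way to state the ultra-log-concavity in Mason's conjecture mentioned above; the notation (a matroid on n elements with I_k independent sets of size k) is assumed here for illustration and is not taken from the abstract. It says that the binomially normalized counts form a log-concave sequence, which is stronger than ordinary log-concavity of the I_k themselves.

% Ultra-log-concavity of independent-set counts (strongest form of Mason's conjecture);
% I_k = number of independent sets of size k in a matroid on n elements (assumed notation).
\[
  \left( \frac{I_k}{\binom{n}{k}} \right)^{2}
  \;\ge\;
  \frac{I_{k-1}}{\binom{n}{k-1}} \cdot \frac{I_{k+1}}{\binom{n}{k+1}},
  \qquad 1 \le k \le n-1,
\]
% equivalently, I_k^2 >= (1 + 1/k)(1 + 1/(n-k)) I_{k-1} I_{k+1}.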

October 17

Online Learning, Probabilistic Inequalities, and the Burkholder Method
3:00–4:00 PM ET, 32-G575

Abstract: At first glance, online learning and martingale inequalities may not appear to be intrinsically linked. We will showcase a recently discovered equivalence between the existence of algorithms for online learning, martingale inequalities, and special "Burkholder" functions. Using this equivalence as a starting point, we define a notion of a sufficient statistic for online learning and use the Burkholder method, originally used to certify probabilistic martingale inequalities, to develop algorithms that keep only these sufficient statistics in memory. To demonstrate the power of the Burkholder method, we introduce new efficient and adaptive algorithms for online learning, including an algorithm for matrix prediction that attains a regret bound corresponding to the variance term found in matrix concentration inequalities.
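
As background for the regret bounds mentioned in the abstract, the following is the standard online-learning benchmark, stated generically with assumed notation rather than in the exact setting of the talk: at each round t the learner predicts \(\widehat{y}_t\), observes \(y_t\), and measures performance against the best fixed predictor from a comparator class \(\mathcal{F}\) in hindsight.

% Standard regret benchmark for online learning (generic formulation, assumed notation);
% \ell is a loss function and \mathcal{F} a comparator class.
\[
  \mathrm{Reg}_T \;=\; \sum_{t=1}^{T} \ell(\widehat{y}_t, y_t)
  \;-\; \inf_{f \in \mathcal{F}} \sum_{t=1}^{T} \ell(f(x_t), y_t).
\]

Adaptive bounds of the kind mentioned for matrix prediction replace the worst-case dependence on T with data-dependent, variance-type quantities.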

October 03

Certified Defenses against Adversarial Examples
4:00–5:00 PM ET, 32-G575

Abstract: While neural networks have achieved high accuracy on standard image classification benchmarks, their accuracy drops to nearly zero in the presence of small adversarial perturbations to test inputs. Defenses based on regularization and adversarial training have been proposed, but they are often followed by new, stronger attacks that defeat them. Can we somehow end this arms race? In this talk, I will present some methods based on convex relaxations (with a focus on semidefinite programming) that output a certificate that, for a given network and test input, no attack can force the error to exceed a certain value. I will then discuss how these certification procedures can be incorporated into neural network training to obtain provably robust networks. Finally, I will present some empirical results on the performance of attacks and different certificates on networks trained using different objectives.

Joint work with Jacob Steinhardt and Percy Liang.
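
To make the notion of a certificate concrete, here is a generic formulation under assumed notation, not necessarily the exact one used in the talk: for a network with class scores f_y, a test input x with correct label y, and an l-infinity perturbation budget epsilon, a certification procedure computes an efficiently evaluable upper bound C(x) on the worst-case margin of any competing class.

% Generic robustness certificate (assumed notation): if the computable bound C(x) is
% negative, no perturbation within the epsilon-ball can change the prediction on x.
\[
  \max_{\|\delta\|_\infty \le \epsilon} \; \max_{y' \ne y}
  \Bigl( f_{y'}(x + \delta) - f_{y}(x + \delta) \Bigr)
  \;\le\; C(x).
\]

Convex relaxations, such as the semidefinite programs mentioned above, give tractable ways to compute such upper bounds, and C(x) < 0 certifies that no attack within the budget can misclassify x.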

September 12

Hashing Algorithms for Extreme Scale Machine Learning
2:30–3:30 PM ET, 32-G575

Abstract: In this talk, I will discuss some of my recent and surprising findings on the use of hashing algorithms for large-scale estimation. Locality Sensitive Hashing (LSH) is a hugely popular algorithm for sub-linear near-neighbor search. However, it turns out that fundamentally LSH is a constant-time (amortized) adaptive sampler, of which efficient near-neighbor search is only one of many applications. Our observation adds another feather to LSH's cap: LSH offers a unique capability to do smart sampling and statistical estimation at the cost of a few hash lookups, bridging data structures (probabilistic hash tables) with efficient unbiased statistical estimation. I will demonstrate how this dynamic and efficient sampling breaks the computational barriers in adaptive estimation, where, for the first time, it is possible to pay roughly the cost of uniform sampling but get the benefits of adaptive sampling. We will demonstrate the power of this one simple idea on three problems: 1) partition function estimation for large NLP models such as word2vec, 2) adaptive gradient estimation for efficient SGD, and 3) sub-linear deep learning with a huge parameter space.

In the end, if time permits, we will switch to memory cost and show a simple hashing algorithm that can shrink the memory requirements of classification problems exponentially. Using our algorithms, we can train 100,000 classes with 400,000 features on a single Titan X while needing 5% or less of the memory required to store all the weights; for a standard logistic regression on this data, a model size of 320 GB (100,000 classes × 400,000 features at 8 bytes per weight) would be unavoidable.
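
To illustrate the idea of LSH as an adaptive sampler, here is a minimal Python sketch, not the speaker's implementation; the class name and all parameters are illustrative assumptions. It uses signed random projections (SimHash): because the probability that an item lands in the query's bucket increases with its cosine similarity to the query, sampling uniformly from that bucket yields a similarity-biased sample at the cost of a few hash lookups.

import numpy as np

class SimHashSampler:
    """Sketch of LSH (signed random projections) used as an adaptive sampler."""

    def __init__(self, data, num_bits=8, num_tables=10, seed=0):
        rng = np.random.default_rng(seed)
        self.data = np.asarray(data)                      # shape (n, d)
        n, d = self.data.shape
        # One random projection per table; the sign pattern of an item's
        # projection is its bucket key in that table.
        self.projs = rng.standard_normal((num_tables, num_bits, d))
        self.tables = []
        for P in self.projs:
            signs = self.data @ P.T > 0                   # (n, num_bits) booleans
            table = {}
            for i, row in enumerate(signs):
                table.setdefault(row.tobytes(), []).append(i)
            self.tables.append(table)
        self._rng = rng

    def sample(self, query):
        # Pick a random table, hash the query, and return a uniform item from
        # the query's bucket. Collision probability grows with the cosine
        # similarity between item and query, so the overall sample is biased
        # toward similar items: adaptive sampling for a few hash lookups.
        t = self._rng.integers(len(self.tables))
        key = (self.projs[t] @ query > 0).tobytes()
        bucket = self.tables[t].get(key)
        if not bucket:
            return None                                   # empty bucket: no sample
        return bucket[self._rng.integers(len(bucket))]

# Illustrative usage: returned indices are skewed toward vectors similar to q.
rng = np.random.default_rng(1)
X = rng.standard_normal((1000, 32))
q = X[0] + 0.1 * rng.standard_normal(32)
print(SimHashSampler(X).sample(q))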