September 18

Yoshua Bengio: Learning High-Level Representations for Agents
2:00–3:30 pm (America/New_York)

Abstract: A dream of the deep learning project was that a learner could discover a hierarchy of representations, with the highest level capturing abstract concepts of the kind we can communicate with language, reason with, and generally use to understand how the world works. This remains a challenge, but recent progress in machine learning could help us approach that objective. We will discuss how the ability to discover causal structure, and in particular causal variables (from low-level perception), would be progress in that direction, and how recent advances in meta-learning and taking the perspective of an agent (rather than a passive learner) could also play an important role. Because we are talking about high-level variables, this discussion touches on the old divide between system 1 cognition (intuitive and anchored in perception) and system 2 cognition (conscious and more sequential): these high-level variables sit at the interface between the two types of cognitive computation. Unlike what some advocate when they talk about disentangling factors of variation, I do not believe that these high-level variables should be considered independent of each other in a statistical sense. They might be independent in a different sense: we can independently modify some rather than others, and in fact they are connected to each other through a rich web of dependencies of the kind we communicate with language. The agent and meta-learning perspectives also force us to leave the safe ground of the iid data of current learning theory and start thinking about non-stationarity, which a learning agent necessarily confronts.
Instead of viewing such non-stationarity as a hurdle, we propose to view it as a source of information, because these changes are often due to interventions by agents (the learner or other agents) and can thus help a learner figure out causal structure. In return, we might be able to build learning systems that are much more robust to changes in the environment, because they capture what is stationary and stable in the long run throughout these non-stationarities, and they build models of the world that can quickly adapt to such changes and sometimes may even be able to correctly infer what caused those changes (thus requiring no additional examples to make sense of the change in distribution).

Bio: Recognized as one of the world’s leading experts in artificial intelligence (AI), Yoshua Bengio is a pioneer in deep learning. He began his education in Montreal, where he earned his Ph.D. in computer science from McGill University, then completed his postdoctoral studies at the Massachusetts Institute of Technology (MIT). Since 1993, he has been a professor in the Department of Computer Science and Operational Research at the Université de Montréal. In 2000, he became the holder of the Canada Research Chair in Statistical Learning Algorithms. At the same time, he founded and became scientific director of Mila, the Quebec Institute of Artificial Intelligence, the world’s largest university-based research group in deep learning. He is also the Scientific Director of IVADO. In 2018, Yoshua Bengio collected the largest number of new citations in the world for a computer scientist, thanks to his three reference works and some 500 publications. Professor Bengio aspires to discover the principles that lead to intelligence through learning, and his research has earned him multiple awards.
In 2019, he earned the prestigious Killam Prize in computer science from the Canada Council for the Arts and was a co-winner of the A.M. Turing Award, considered the “Nobel Prize of computer science,” which he received jointly with Geoffrey Hinton and Yann LeCun. He is also an Officer of the Order of Canada, a Fellow of the Royal Society of Canada, a recipient of the 2019 Excellence Award of the Fonds de recherche du Québec – Nature et technologies and the Marie-Victorin prize, and was named Scientist of the Year by Radio-Canada in 2017. These honours reflect the profound influence of his work on the evolution of our society. Concerned about the social impact of AI, he has actively contributed to the development of the Montreal Declaration for the Responsible Development of Artificial Intelligence.

Location: 34-401 (Grier Room)
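The abstract's distinction between statistical independence and independent modifiability can be made concrete with a toy structural causal model. This is an illustrative sketch only, not material from the talk; the model, names, and numbers are all invented:

```python
import random

random.seed(0)

def sample(do_x=None, n=10_000):
    """Sample from a toy structural causal model X -> Y.
    An intervention do(X=x) replaces X's mechanism while
    leaving Y's mechanism (y = 2x + noise) untouched."""
    xs, ys = [], []
    for _ in range(n):
        x = do_x if do_x is not None else random.gauss(0, 1)
        y = 2 * x + random.gauss(0, 1)  # Y's mechanism depends on X
        xs.append(x)
        ys.append(y)
    return xs, ys

def corr(a, b):
    """Pearson correlation, computed from scratch to stay dependency-free."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / n
    va = sum((x - ma) ** 2 for x in a) / n
    vb = sum((y - mb) ** 2 for y in b) / n
    return cov / (va * vb) ** 0.5

# Observationally, X and Y are far from statistically independent.
xs, ys = sample()
print(corr(xs, ys))  # strong positive correlation

# Yet X's mechanism can be modified independently of Y's:
# after do(X=1), Y still follows its own unchanged mechanism.
xs_i, ys_i = sample(do_x=1.0)
print(sum(ys_i) / len(ys_i))  # close to 2 * 1.0
```

The point mirrors the abstract: the variables are densely statistically dependent, yet their mechanisms can be intervened on one at a time, and observing such interventions is informative about the causal structure.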

October 16

David Patterson: Domain Specific Architectures for Deep Neural Networks: Three Generations of Tensor Processing Units (TPUs)
4:30–5:30 pm (America/New_York)

Abstract: The recent success of deep neural networks (DNNs) has inspired a resurgence in domain-specific architectures (DSAs) to run them, partly as a result of the deceleration of microprocessor performance improvement due to the ending of Moore’s Law. DNNs have two phases: training, which constructs accurate models, and inference, which serves those models. Google’s first-generation Tensor Processing Unit (TPUv1) offered a 50X improvement in performance per watt over conventional architectures for inference. We naturally asked whether a successor could do the same for training. This talk reviews TPUv1 and explores how Google built the first production DSA supercomputer for the much harder problem of training, which was deployed in 2017. Google’s TPUv2/TPUv3 supercomputers with up to 1,024 chips train production DNNs at close to perfect linear speedup, with 10X-40X higher floating-point operations per watt than general-purpose supercomputers running the high-performance computing benchmark Linpack.

Bio: David Patterson is a Berkeley CS professor emeritus, a Google distinguished engineer, and Vice-Chair of the RISC-V Foundation. He received his BA, MS, and PhD degrees from UCLA. His Reduced Instruction Set Computer (RISC), Redundant Array of Inexpensive Disks (RAID), and Network of Workstations projects helped lead to multibillion-dollar industries. This work led to 40 awards for research, teaching, and service, plus many papers and seven books. The best known is “Computer Architecture: A Quantitative Approach,” and the newest is “The RISC-V Reader: An Open Architecture Atlas.” In 2018, he and John Hennessy shared the ACM A.M. Turing Award.

Location: 32-123
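For readers unfamiliar with the two metrics the abstract leans on, the arithmetic behind "close to perfect linear speedup" and "higher floating-point operations per watt" can be sketched as below. The numbers are hypothetical, chosen only to land in the ranges the abstract mentions; they are not measurements from the talk:

```python
def parallel_efficiency(achieved_speedup, n_chips):
    """Fraction of ideal linear speedup actually achieved:
    1.0 would mean perfect linear scaling across all chips."""
    return achieved_speedup / n_chips

def perf_per_watt_ratio(a_flops, a_watts, b_flops, b_watts):
    """How many times more floating-point operations per watt
    system A delivers relative to system B."""
    return (a_flops / a_watts) / (b_flops / b_watts)

# Hypothetical: a 1024-chip run that trains a model 983x faster
# than a single chip would, i.e. ~96% of ideal linear speedup.
print(parallel_efficiency(983, 1024))

# Hypothetical: 400 TFLOP/s at 200 W vs. 10 TFLOP/s at 100 W
# gives a 20x perf-per-watt advantage, within the 10X-40X range cited.
print(perf_per_watt_ratio(400e12, 200, 10e12, 100))
```

Performance per watt, rather than raw throughput, is the headline metric because power draw is typically the binding constraint in datacenter-scale deployments.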