Today's leading AI systems have achieved one of the field's oldest dreams and promises: They can take in any language as input and produce reasonable responses – often very much like the responses that a reasonable (and knowledgeable and helpful) person would produce. Yet the processes at work inside these systems, and how they are built, do not (at least obviously) have much in common with the mechanisms or origins of the human mind. What would it take to build a model with something like the input-output behavior of ChatGPT but whose inner workings actually instantiated a theory of human cognition – and even our best current scientific theory? I will discuss several possible routes to this goal, and the challenges and opportunities they present. I will argue that now more than ever is the time for a bidirectional exchange between the fields of AI and Cog Sci – fields that grew up together starting in the 1950s, but have followed very different trajectories recently. AI tools and techniques have much to offer cognitive theories, but cognitive science has just as much if not more to offer back to AI. Understanding and using AI tools in a framework guided by foundational thinking in cognitive science represents the best hope to deliver on the theoretical goals, dreams, and promises of both fields.
For decades, programming was the way we told machines what to do, but modern AI techniques promise new ways of creating software directly from data and natural language. Programming, however, has a number of advantages that have enabled us to build reliable, large-scale computing infrastructure. In this presentation, I describe new approaches for learning from data while preserving some of the benefits of programming, along with applications in domains ranging from robotics to computational biology.
Manish Raghavan is the Drew Houston (2005) Career Development Professor at the MIT Sloan School of Management and the Department of Electrical Engineering and Computer Science. Previously, he was a postdoctoral fellow at the Harvard Center for Research on Computation and Society (CRCS). His research centers on the societal impacts of algorithms and AI.
Transformers are the dominant architecture for language modeling (and generative AI more broadly). The attention mechanism in Transformers is considered core to the architecture and enables accurate sequence modeling at scale. However, the cost of attention grows quadratically with input length, which makes it difficult to apply Transformers to long sequences. Moreover, Transformers have theoretical limitations on the class of problems they can solve, which prevents them from modeling certain kinds of phenomena, such as state tracking. This talk will describe some recent work on efficient alternatives to Transformers that can overcome these limitations.
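To make the quadratic-cost point concrete, here is a minimal NumPy sketch of standard scaled dot-product attention (an illustration of the general mechanism, not of any specific efficient alternative discussed in the talk). The intermediate score matrix has n × n entries, so compute and memory grow quadratically with sequence length n; the dimensions below are arbitrary choices for the example.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Single-head attention over one sequence; Q, K, V are (n, d) arrays."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                     # (n, n): the quadratic bottleneck
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V                                # (n, d)

# Hypothetical sequence length and head dimension, chosen only for illustration.
n, d = 1024, 64
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))

out = scaled_dot_product_attention(Q, K, V)
print(out.shape)                                      # (1024, 64)
print(f"score matrix holds n^2 = {n * n:,} entries") # doubles in n -> 4x the entries
```

Doubling n quadruples the size of the score matrix, which is why long-sequence modeling motivates sub-quadratic alternatives of the kind the talk surveys.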
I will argue that representations in different deep nets are converging. First, I will survey examples of convergence in the literature: over time and across multiple domains, the ways by which different neural networks represent data are becoming more aligned. Next, I will demonstrate convergence across data modalities: as vision models and language models get larger, they measure distance between datapoints in a more and more alike way. I will hypothesize that this convergence is driving toward a shared statistical model of reality, akin to Plato's concept of an ideal reality. We term such a representation the platonic representation and discuss several possible selective pressures toward it. Finally, I'll discuss the implications of these trends, their limitations, and counterexamples to our analysis.
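As one way to make "measuring distance between datapoints in increasingly similar ways" concrete, here is a small illustrative sketch; it is my own simplification rather than the alignment metric used in this line of work. It compares the pairwise-distance structure that two hypothetical models impose on embeddings of the same inputs: if the two distance matrices are highly correlated, the representations agree on which datapoints are near and far.

```python
import numpy as np

def pairwise_distances(X):
    """Squared Euclidean distances between all rows of X, as an (n, n) matrix."""
    sq = (X ** 2).sum(axis=1)
    return sq[:, None] + sq[None, :] - 2 * X @ X.T

def distance_alignment(X, Y):
    """Correlation between two models' pairwise-distance structure on the same inputs."""
    dx, dy = pairwise_distances(X), pairwise_distances(Y)
    iu = np.triu_indices_from(dx, k=1)        # each unordered pair once
    return np.corrcoef(dx[iu], dy[iu])[0, 1]

# Hypothetical embeddings of the same 200 inputs from two different models,
# generated from a shared latent factor purely for illustration.
rng = np.random.default_rng(0)
shared = rng.standard_normal((200, 16))
emb_a = shared @ rng.standard_normal((16, 64)) + 0.1 * rng.standard_normal((200, 64))
emb_b = shared @ rng.standard_normal((16, 32)) + 0.1 * rng.standard_normal((200, 32))

print(f"alignment = {distance_alignment(emb_a, emb_b):.3f}")  # near 1 when distance structure agrees
```

In this toy setup the two embeddings share an underlying factor, so their distance structures align; the convergence hypothesis is that ever-larger vision and language models come to agree in an analogous sense on real data.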