Learning and Incentives in Human–AI Collaboration

Speaker

UPenn

Host

Justin Chen, Lily Chung, John Kuszmaul
CSAIL, EECS

As AI systems become more capable, a central challenge is designing them to work effectively with humans. Game theory and online learning provide a natural toolkit for analyzing such interactions, where both humans and algorithms adapt strategically over time.

Consider a doctor who wants to diagnose a patient and can consult an AI. Suppose, to start, that the AI is aligned with the doctor's goal of accurate diagnosis. Even under this cooperative assumption, the doctor and the AI may each hold only partial and incomparable knowledge, and they need not share a prior. How can we guarantee that their combined predictions are strictly better than relying on either alone, without assuming a common prior or stochastic data? Importantly, they may face many such cases together over time, and this repeated interaction enables richer forms of collaboration. I will present learning-theoretic results showing that collaboration is possible in a fully distribution-free repeated prediction setting, with protocols that guarantee regret bounds against benchmarks defined on the joint information of the human and the AI.
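As a rough illustration of the kind of guarantee meant here (the notation below is mine, not taken from the papers), suppose that on each round $t$ the human observes private information $x_t^{H}$, the AI observes $x_t^{A}$, the protocol outputs a prediction $p_t$, and the outcome $y_t$ is then revealed. A regret bound against a benchmark class $\mathcal{F}$ of predictors with access to the combined information would read
\[
\sum_{t=1}^{T} \ell(p_t, y_t) \;-\; \min_{f \in \mathcal{F}} \sum_{t=1}^{T} \ell\!\left(f(x_t^{H}, x_t^{A}),\, y_t\right) \;=\; o(T),
\]
i.e., the collaboration's average loss approaches that of the best predictor using both parties' information, with no distributional assumptions on the sequence.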

In the second part, I will return to the alignment assumption itself. What if the AI is not necessarily aligned with the doctor's goals? For example, what if the AI model is developed by a pharmaceutical company that has an interest in the doctor prescribing its own drugs? To address such misaligned incentives, it is natural to consider a setting where the doctor has access to multiple AI models, each offered by a different provider. Although each provider may be misaligned, under a mild average alignment assumption (that the doctor's utility lies in the convex hull of the providers' utilities), we show that in Nash equilibrium of the competition among providers, the doctor achieves the same outcomes they would obtain if a perfectly aligned provider were present. The analysis builds on ideas from Bayesian persuasion and information design, adapted to settings with competing AI providers.
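Stated slightly more formally (again with illustrative notation of my own): if $u_0$ denotes the doctor's utility and $u_1, \dots, u_k$ the providers' utilities, the average alignment assumption asks for weights $\lambda_1, \dots, \lambda_k \ge 0$ with $\sum_{i=1}^{k} \lambda_i = 1$ such that
\[
u_0 \;=\; \sum_{i=1}^{k} \lambda_i\, u_i .
\]
No single provider needs to be aligned; it suffices that the doctor's interests can be expressed as a mixture of the providers' interests.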

This talk is based on the following works: Tractable Agreement Protocols (STOC’25), Collaborative Prediction: Tractable Information Aggregation via Agreement (in submission), and Emergent Alignment via Competition (in submission).