Learning to Defer in Content Moderation: The Human-AI Interplay

Speaker

Thodoris Lykouris
MIT Sloan

Hosts

Sam Hopkins and Kuikui Liu
CSAIL
Refreshments at 4:00 PM

Abstract:
Successful content moderation is vital for a healthy online social platform: harmful posts must be removed promptly without jeopardizing non-harmful content. Given the high volume of online posts, human-only moderation is operationally challenging, so platforms often employ a human-AI collaboration approach. A typical heuristic estimates the expected harmfulness of each incoming post and uses fixed thresholds to decide whether to remove the post and whether to send it for human review. This disregards the uncertainty in the machine-learning estimates, the time-varying nature of human review capacity and post arrivals, and the selective sampling in the dataset (humans only review posts admitted by the algorithm). We introduce a model that captures this human-AI interplay. Our algorithm observes contextual information for incoming posts, makes classification and admission decisions, and schedules posts for human review. Only admitted posts receive human reviews of their harmfulness. These reviews help train the machine-learning algorithm but are delayed by congestion in the human review system. We propose a near-optimal learning algorithm that balances the classification loss from a selectively sampled dataset, the idiosyncratic loss of non-reviewed posts, and the delay loss from congestion in the human review system. To the best of our knowledge, this is the first result on online learning in contextual queueing systems, and hence our analytical framework may be of independent interest.
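For concreteness, below is a minimal Python sketch of the fixed-threshold baseline the abstract critiques, with hypothetical threshold values and a hypothetical harmfulness score; it illustrates the static heuristic only, not the learning algorithm proposed in the talk.

```python
# Sketch of the fixed-threshold moderation heuristic described in the abstract.
# Thresholds and scores are hypothetical; the decision rule ignores estimation
# uncertainty and the current human-review backlog, which is the limitation
# the proposed learning algorithm addresses.

from dataclasses import dataclass


@dataclass
class Decision:
    remove: bool          # take the post down immediately
    send_to_human: bool   # enqueue the post for human review


def threshold_moderation(harm_score: float,
                         remove_threshold: float = 0.9,
                         review_threshold: float = 0.5) -> Decision:
    """Decide on a post from its estimated harmfulness using fixed cutoffs.

    harm_score: ML estimate of the probability that the post is harmful.
    Posts scoring above remove_threshold are removed; posts scoring above
    review_threshold are additionally queued for human review.
    """
    return Decision(
        remove=harm_score >= remove_threshold,
        send_to_human=harm_score >= review_threshold,
    )


# Example: a borderline post stays up but is sent to the human review queue.
print(threshold_moderation(0.7))  # Decision(remove=False, send_to_human=True)
```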

This talk is based on joint work with Wentao Weng (Ph.D. student at MIT EECS). A preprint of the corresponding paper can be found here: https://arxiv.org/pdf/2402.12237. This work has been selected as a finalist in the 2024 INFORMS Junior Faculty Interest Group (JFIG) Paper Competition.