[ML+Crypto Seminar] What Can Cryptography Tell Us About AI?

Speaker

Greg Gluch
Simons Institute, UC Berkeley

Host

Shafi, Yael, Jonathan and Vinod

I will present three results that use cryptographic assumptions to characterize both the limits and possibilities of AI safety. First, we show that AI alignment cannot be achieved using only black-box filters of harmful content. Second, we prove a separation between mitigation and detection at inference time, where mitigation uses additional computation to refine an LLM’s output into a safer or more accurate result. Third, we conduct a meta-analysis of watermarks, adversarial defenses, and transferable attacks, showing that for every learning task, at least one of these three schemes must exist.
Each result carries a broader message: the first argues for the necessity of weight access in AI auditing; the second provides a rule of thumb for allocating inference-time resources when safety is the goal; and the third explains why adversarial examples often transfer across different LLMs.

Based on:

  1. Ball, S., Gluch, G., Goldwasser, S., Kreuter, F., Reingold, O., & Rothblum, G. N. (2025). On the Impossibility of Separating Intelligence from Judgment: The Computational Intractability of Filtering for AI Alignment.
  2. Gluch, G., & Goldwasser, S. (2025). A Cryptographic Perspective on Mitigation vs. Detection in Machine Learning.
  3. Turan, B., Nagarajan, S. G., & Pokutta, S. (2024). The Good, the Bad and the Ugly: Watermarks, Transferable Attacks and Adversarial Defenses.