[ML+Crypto Seminar] What Can Cryptography Tell Us About AI?

Speaker

Greg Gluch
Simons Institute, UC Berkeley

Host

Shafi, Yael, Jonathan and Vinod

I will present three results that use cryptographic assumptions to characterize both the limits and possibilities of AI safety. First, we show that AI alignment cannot be achieved using only black-box filters of harmful content. Second, we prove a separation between mitigation and detection at inference time, where mitigation uses additional computation to refine an LLM’s output into a safer or more accurate result. Third, we conduct a meta-analysis of watermarks, adversarial defenses, and transferable attacks, showing that for every learning task, at least one of these three schemes must exist.
Each result carries a broader message: the first argues for the necessity of weight access in AI auditing; the second provides a rule of thumb for allocating inference-time resources when safety is the goal; and the third explains why adversarial examples often transfer across different LLMs.

Based on:

  1. Ball, S., Gluch, G., Goldwasser, S., Kreuter, F., Reingold, O., & Rothblum, G. N. (2025). On the Impossibility of Separating Intelligence from Judgment: The Computational Intractability of Filtering for AI Alignment.
  2. Gluch, G., & Goldwasser, S. (2025). A Cryptographic Perspective on Mitigation vs. Detection in Machine Learning.
  3. Turan, B., Nagarajan, S. G., & Pokutta, S. (2024). The Good, the Bad and the Ugly: Watermarks, Transferable Attacks and Adversarial Defenses.