How to Verify Any (Reasonable) Distribution Property: Computationally Sound Argument Systems for Distributions

Speaker
Host
As statistical analyses become increasingly central, there is a growing need to ensure their results are correct. Approximate correctness can be verified by replicating the entire analysis, but can we verify without replication? We focus on distribution testing problems: given samples from an unknown distribution, the goal is verifying that the distribution is close to having a claimed property. Our main contribution is an interactive protocol between a verifier and an untrusted prover who both have sampling access to the unknown distribution. Our protocol can be used to verify a very rich class of properties: the only requirement is that, given a full and explicit description of a distribution, it should be possible to approximate its distance from the property in polynomial time. For any such property, if the distribution is at statistical distance $\varepsilon$ from having the property, then the verifier rejects with high probability. This soundness property holds against any polynomial-time strategy that a cheating prover might follow, assuming the existence of collision-resistant hash.
For distributions over a domain of size $N$, the protocol consists of $4$ messages and the communication complexity and verifier runtime are roughly $\widetilde{O}\left(\sqrt{N} / \varepsilon^2 \right)$. The verifier's sample complexity is $\widetilde{O}\left(\sqrt{N} / \varepsilon^2 \right)$, and this is optimal up to $\polylog(N)$ factors (for any protocol, regardless of its communication complexity).
Even for simple properties, approximately deciding whether an unknown distribution has the property can require quasi-linear sample complexity and running time. For any such property, our protocol provides a quadratic speedup over replicating the analysis.