Do ImageNet Classifiers Generalize to ImageNet?
Speaker: Ludwig Schmidt (Computer Science, UC Berkeley)
Host: Aleksander Madry
Abstract: We build new test sets for the CIFAR-10 and ImageNet
datasets. Both benchmarks have been the focus of intense research for
almost a decade, raising the danger of overfitting to excessively
re-used test sets. By closely following the original dataset creation
processes, we test to what extent current classification models
generalize to new data. We evaluate a broad range of models and find
accuracy drops of 3% - 15% on CIFAR-10 and 11% - 14% on ImageNet.
However, accuracy gains on the original test sets translate to larger
gains on the new test sets. Our results suggest that the accuracy
drops are not caused by adaptivity, but by the models' inability to
generalize to slightly "harder" images than those found in the
original test sets.
Joint work with Ben Recht, Rebecca Roelofs, and Vaishaal Shankar.
https://arxiv.org/abs/1902.10811
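
To make the evaluation protocol in the abstract concrete, here is a minimal sketch of how one might reproduce the headline comparison: score the same pretrained classifier on the original test set and on a newly collected one, then report the accuracy drop. This is an illustration, not the authors' actual pipeline; it assumes torchvision, locally downloaded image folders, and that the directory paths and class-index alignment shown here (both placeholders) match between the two sets.

```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

# Standard ImageNet preprocessing for torchvision classifiers.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def top1_accuracy(model, root):
    """Top-1 accuracy of `model` over an ImageFolder-style directory.

    Assumes the folder's class indices align with the model's output
    indices; for a new test set this alignment must be checked.
    """
    loader = DataLoader(datasets.ImageFolder(root, preprocess),
                        batch_size=64, num_workers=4)
    correct = total = 0
    model.eval()
    with torch.no_grad():
        for images, labels in loader:
            preds = model(images).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.size(0)
    return correct / total

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
acc_original = top1_accuracy(model, "imagenet/val")   # placeholder path
acc_new = top1_accuracy(model, "imagenet-new/test")   # placeholder path
print(f"original: {acc_original:.1%}  new: {acc_new:.1%}  "
      f"drop: {acc_original - acc_new:.1%}")
```

Repeating this loop over many pretrained models gives the scatter of original-vs-new accuracy that underlies the abstract's observation: gains on the original test set translate into (larger) gains on the new one.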