Devising indicative evaluation metrics for the image generation task remains an open problem. The most widely used metric for measuring the similarity between real and generated images has been the Fréchet Inception Distance (FID) score. Because it does not differentiate between the fidelity and diversity aspects of the generated images, recent papers have introduced variants of precision and recall metrics to diagnose those properties separately. In this paper, we show that even the latest versions of the precision and recall metrics are not yet reliable; for example, they fail to detect the match between two identical distributions, they are not robust against outliers, and their evaluation hyperparameters are selected arbitrarily. We propose density and coverage metrics that solve the above issues. We analytically and experimentally show that density and coverage provide more interpretable and reliable signals for practitioners than the existing metrics. Code: github.com/clovaai/generative-evaluation-prdc .
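As a rough illustration of the proposed metrics, the following NumPy sketch computes k-NN-based density and coverage: each real sample defines a ball whose radius is the distance to its k-th nearest real neighbour; density counts how often fake samples fall inside these balls (normalised by k and the number of fakes), while coverage is the fraction of real balls containing at least one fake sample. This is a simplified reading of the metrics, not the reference implementation (see the linked repository for that); the function name and brute-force distance computation are illustrative choices.

```python
import numpy as np

def density_coverage(real, fake, k=5):
    """Sketch of k-NN density and coverage between two point sets.

    real, fake: arrays of shape (N, d) and (M, d) of feature embeddings.
    """
    # Pairwise Euclidean distances: real-to-real and real-to-fake.
    rr = np.linalg.norm(real[:, None] - real[None, :], axis=-1)
    rf = np.linalg.norm(real[:, None] - fake[None, :], axis=-1)

    # Ball radius per real sample: distance to its k-th nearest real
    # neighbour (index 0 of the sorted row is the zero self-distance).
    radii = np.sort(rr, axis=1)[:, k]

    # Membership matrix: inside[i, j] is True if fake j lies in real i's ball.
    inside = rf <= radii[:, None]

    # Density: average ball-membership count per fake, normalised by k.
    density = inside.sum() / (k * fake.shape[0])
    # Coverage: fraction of real balls that capture at least one fake.
    coverage = inside.any(axis=1).mean()
    return density, coverage
```

On two identical sample sets, coverage is exactly 1 and density is close to 1, which reflects the abstract's point that a reliable metric should detect the match between two identical distributions.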
|Title of host publication||37th International Conference on Machine Learning, ICML 2020|
|Editors||Hal Daumé III, Aarti Singh|
|Publisher||International Machine Learning Society (IMLS)|
|Number of pages||10|
|Publication status||Published - 2020|
|Event||37th International Conference on Machine Learning, ICML 2020 - Virtual, Online|
Duration: 2020 Jul 13 → 2020 Jul 18
Bibliographical note: Publisher Copyright: © 2020 37th International Conference on Machine Learning, ICML 2020. All rights reserved.
All Science Journal Classification (ASJC) codes
- Computational Theory and Mathematics
- Human-Computer Interaction