Zero-shot learning and self-supervised learning have been widely studied because they enable efficient representation learning when labeled data are scarce. However, few studies consider zero-shot learning with semantic embeddings (e.g., CNN features or attributes) and self-supervision simultaneously. The reason is that most zero-shot learning methods operate on vector-level semantic embeddings, whereas most self-supervision methods operate on image-level inputs; a novel self-supervision method for vector-level CNN features is therefore needed. We propose a simple way to shuffle semantic embeddings, along with a method that enriches feature representations and effectively improves zero-shot learning performance. We show that our model outperforms current state-of-the-art methods on the large-scale ImageNet 21K dataset and on the small-scale CUB and SUN datasets.
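As a rough illustration of the idea of shuffling vector-level semantic embeddings (not the paper's exact method; the chunking scheme, function names, and the use of the permutation as a pretext-task label are assumptions for this sketch), a minimal version could look like:

```python
import numpy as np

def shuffle_embedding(embedding, num_chunks=4, rng=None):
    """Split a 1-D semantic embedding into equal chunks and permute them.

    This is a hypothetical sketch: the shuffled vector could serve as an
    augmented input, and the permutation index as the target of a
    self-supervised pretext task (predicting how the chunks were reordered).
    `len(embedding)` must be divisible by `num_chunks`.
    """
    rng = rng or np.random.default_rng()
    # Split the embedding vector into contiguous, equally sized chunks.
    chunks = np.split(embedding, num_chunks)
    # Sample a random permutation of the chunk order.
    perm = rng.permutation(num_chunks)
    # Reassemble the embedding with chunks in the permuted order.
    shuffled = np.concatenate([chunks[i] for i in perm])
    return shuffled, perm

# Usage: shuffle an 8-dimensional embedding in 4 chunks of 2 elements.
emb = np.arange(8.0)
shuffled, perm = shuffle_embedding(emb, num_chunks=4)
```

The shuffled vector keeps the same dimensionality and the same multiset of values as the original, which is what makes this kind of perturbation usable as a label-free (self-supervised) signal on vector-level features.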
Bibliographical note
Funding Information:
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government(MSIT) (No. 2019R1A2C2003760) and in part by Institute for Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2020-0-01361, Artificial Intelligence Graduate School Program (YONSEI UNIVERSITY)).
© 2021 Elsevier B.V.
All Science Journal Classification (ASJC) codes
- Computer Science Applications
- Cognitive Neuroscience
- Artificial Intelligence