The key to fine-grained image recognition is learning discriminative features that capture subtle visual cues. In particular, the triplet objective function is well suited to this task because it captures the semantic similarity between images. However, triplet loss requires many triplets containing hard negative samples, and obtaining them is costly. To alleviate this problem, we propose a new framework that generates features of hard negative samples. The proposed framework consists of three stages: learning part-wise features, enriching refined hard negative samples, and fine-grained image recognition. The proposed method achieves state-of-the-art performance on the CUB-200-2011, Stanford Cars, FGVC-Aircraft, and DeepFashion datasets. Extensive experiments also demonstrate that each stage contributes to the final recognition performance.
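For readers unfamiliar with the triplet objective mentioned above, the following is a minimal sketch of the standard margin-based triplet loss and of why hard negatives matter. It is not the framework proposed in the paper: the function names (`euclidean`, `triplet_loss`, `hardest_negative`), the margin value, and the toy 2-D feature vectors are all illustrative assumptions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard hinge-style triplet loss (margin is an assumed value).

    The loss is zero once the negative is farther from the anchor
    than the positive by at least `margin`.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def hardest_negative(anchor, negatives):
    """Pick the negative closest to the anchor (one common notion of 'hard')."""
    return min(negatives, key=lambda n: euclidean(anchor, n))


# Toy features, purely illustrative.
anchor = [0.0, 0.0]
positive = [0.0, 1.0]
negatives = [[3.0, 4.0], [0.0, 1.5]]  # an easy negative and a hard negative

hard = hardest_negative(anchor, negatives)
easy_loss = triplet_loss(anchor, positive, negatives[0])
hard_loss = triplet_loss(anchor, positive, hard)
```

In this toy setting the easy negative already satisfies the margin and yields zero loss, so it provides no training signal, whereas the hard negative yields a positive loss. This is the usual motivation for seeking hard negatives, and finding them among real samples is the cost the abstract refers to.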
Bibliographical note
Funding Information:
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2019R1A2C2003760) and the Institute for Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2020-0-01361, Artificial Intelligence Graduate School Program (YONSEI UNIVERSITY)).
All Science Journal Classification (ASJC) codes
- Computer Science Applications
- Cognitive Neuroscience
- Artificial Intelligence