Object proposal algorithms have been demonstrated to be very successful in accelerating the object detection process. High object localization quality and detection recall can be obtained using thousands of proposals. However, performance with a small number of proposals is still unsatisfactory. This paper demonstrates that performance with only a few proposals can be significantly improved with minimal human interaction: a single touch point. To this end, we first generate hierarchical superpixels using an efficient tree-organized structure as our initial object proposals, and then select only a few of them by learning an effective convolutional neural network for objectness ranking. We explore and design an architecture that integrates human interaction with the global information of the whole image for objectness scoring, which significantly improves performance with a minimal number of object proposals. Extensive experiments show that the proposed method outperforms all state-of-the-art methods for locating the meaningful object under the touch point constraint. Furthermore, the proposed method is extended to video. By incorporating a novel interactive motion segmentation cue when generating hierarchical superpixels, a single proposal performs well enough to be used in interactive vision systems, for example to select the input of a real-time tracking system.
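
The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that each candidate proposal is an axis-aligned bounding box, that a valid candidate must contain the touch point, and that an objectness score is already available for each candidate. In the paper the score comes from the learned convolutional neural network; here it is a plain number supplied by the caller. The names `Proposal`, `contains`, and `select_proposals` are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Proposal(NamedTuple):
    # Hypothetical container: box is (x_min, y_min, x_max, y_max), inclusive.
    box: Tuple[int, int, int, int]
    # Placeholder objectness score; assumed to come from some ranking model.
    score: float


def contains(box: Tuple[int, int, int, int], point: Tuple[int, int]) -> bool:
    """Return True if the point lies inside the box (borders included)."""
    x_min, y_min, x_max, y_max = box
    x, y = point
    return x_min <= x <= x_max and y_min <= y <= y_max


def select_proposals(
    proposals: List[Proposal], touch: Tuple[int, int], k: int = 1
) -> List[Proposal]:
    """Keep proposals that contain the touch point, then return the k best by score."""
    # Assumption: the touch point constraint means the box must cover the point.
    candidates = [p for p in proposals if contains(p.box, touch)]
    # Sort by descending score and truncate to the requested number of proposals.
    return sorted(candidates, key=lambda p: p.score, reverse=True)[:k]
```

With `k=1` the function returns at most one box, which corresponds to the single-proposal setting mentioned for the video extension. How the candidates are generated from the superpixel hierarchy, and how the network scores them, is outside the scope of this sketch.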
|Number of pages|15|
|Journal|IEEE Transactions on Circuits and Systems for Video Technology|
|Publication status|Published - 2019 Sep|
All Science Journal Classification (ASJC) codes
- Media Technology
- Electrical and Electronic Engineering