Abstract
Object proposal algorithms have proven very successful in accelerating the object detection process. High object localization quality and detection recall can be obtained using thousands of proposals. However, performance with a small number of proposals is still unsatisfactory. This paper demonstrates that the performance of a few proposals can be significantly improved with minimal human interaction: a single touch point. To this end, we first generate hierarchical superpixels using an efficient tree-organized structure as our initial object proposals, and then select only a few proposals from them by learning an effective convolutional neural network for objectness ranking. We explore and design an architecture that integrates human interaction with the global information of the whole image for objectness scoring, which significantly improves performance with a minimal number of object proposals. Extensive experiments show that the proposed method outperforms all state-of-the-art methods for locating the meaningful object under the touch point constraint. Furthermore, the proposed method is extended to video. Combined with a novel interactive motion segmentation cue for generating hierarchical superpixels, the performance with a single proposal is satisfactory, and the method can be used in interactive vision systems, for example to select the input of a real-time tracking system.
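
The touch point constraint mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's method: the hierarchical superpixel generation and the convolutional network are not reproduced, and in the paper the touch point is integrated into the scoring architecture rather than applied as a hard filter. Here the objectness scores are assumed to be given, and the names `Box`, `contains`, and `select_proposals` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max), inclusive pixel bounds.
Box = Tuple[int, int, int, int]
Point = Tuple[int, int]


def contains(box: Box, point: Point) -> bool:
    """Return True if the point lies inside the box (bounds inclusive)."""
    x_min, y_min, x_max, y_max = box
    x, y = point
    return x_min <= x <= x_max and y_min <= y <= y_max


def select_proposals(
    boxes: List[Box], scores: List[float], touch: Point, k: int = 1
) -> List[Box]:
    """Keep proposals that contain the touch point and return the k best.

    `scores` stands in for objectness scores from a ranking model
    (assumed to be precomputed; higher means more object-like).
    """
    if len(boxes) != len(scores):
        raise ValueError("boxes and scores must have the same length")
    # Hard filter on the touch point (an assumption made for illustration).
    candidates = [(s, b) for b, s in zip(boxes, scores) if contains(b, touch)]
    # Rank the surviving proposals by score, best first.
    candidates.sort(key=lambda sb: sb[0], reverse=True)
    return [b for _, b in candidates[:k]]
```

With `k=1` the function returns at most one box, mirroring the single-proposal setting described for the video extension; a proposal with a high score that does not contain the touch point is never returned.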
Original language | English |
---|---|
Article number | 8115165 |
Pages (from-to) | 2552-2566 |
Number of pages | 15 |
Journal | IEEE Transactions on Circuits and Systems for Video Technology |
Volume | 29 |
Issue number | 9 |
Publication status | Published - 2019 Sept |
Bibliographical note
Funding Information: Manuscript received January 24, 2017; revised September 11, 2017 and October 25, 2017; accepted November 5, 2017. Date of publication November 20, 2017; date of current version September 4, 2019. This work was supported in part by the National Natural Science Foundation of China under Grant 61702194 and Grant 61772206, in part by the Guangdong Natural Science Foundation under Grant 2017A030311027, in part by the Strategic Research Grant of City University of Hong Kong under Grant 7004420, and in part by the General Research Fund from the Hong Kong Research Grants Council under Grant CityU 11211417. This paper was recommended by Associate Editor Q. Wang. (Corresponding author: Shengfeng He.) M. Chen, J. Zhang, and Q. Li are with the Department of Computer Science, City University of Hong Kong, Hong Kong (e-mail: christmas.chen.3.4@gmail.com; jiawzhang8-c@my.cityu.edu.hk; qing.li@cityu.edu.hk).
Publisher Copyright:
© 1991-2012 IEEE.
All Science Journal Classification (ASJC) codes
- Media Technology
- Electrical and Electronic Engineering