In this paper, we propose a learning-based method to compose a video-story from a group of video clips that describe an activity or experience. We learn the coherence between video clips from real videos via a Recurrent Neural Network (RNN) that jointly incorporates spatial-temporal semantics and motion dynamics to generate smooth and relevant compositions. We then rearrange the results generated by the RNN, via a submodular ranking optimization process, so that the overall video-story is compatible with the storyline structure. Experimental results on the video-story dataset show that the proposed algorithm outperforms the state-of-the-art approach.
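The abstract does not give the model or the ranking objective, so the following is only a minimal sketch of the general idea of ordering clips from pairwise coherence scores. It assumes a hypothetical matrix `coherence[i][j]` that scores how well clip `j` follows clip `i` (standing in for the output of a learned model), and it substitutes a plain greedy chain for both the RNN and the submodular ranking step. The function name `greedy_compose` and the example scores are illustrative and do not come from the paper.

```python
from typing import List, Sequence


def greedy_compose(coherence: Sequence[Sequence[float]], start: int = 0) -> List[int]:
    """Order clips greedily by pairwise coherence.

    coherence[i][j] is an assumed score for placing clip j right after clip i.
    This is an illustrative stand-in, not the method described in the paper.
    """
    n = len(coherence)
    order = [start]
    remaining = set(range(n)) - {start}
    while remaining:
        last = order[-1]
        # Iterate in index order so ties go to the lower clip index.
        nxt = max(sorted(remaining), key=lambda j: coherence[last][j])
        order.append(nxt)
        remaining.remove(nxt)
    return order


# Hypothetical scores for four clips.
scores = [
    [0.0, 0.2, 0.9, 0.1],
    [0.1, 0.0, 0.3, 0.8],
    [0.2, 0.7, 0.0, 0.4],
    [0.5, 0.1, 0.2, 0.0],
]
print(greedy_compose(scores))  # [0, 2, 1, 3]
```

A greedy chain only looks one step ahead, which is one reason a global reordering step, such as the ranking optimization mentioned in the abstract, may be useful on top of local coherence scores.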
|Title of host publication||Proceedings - 2018 IEEE Winter Conference on Applications of Computer Vision, WACV 2018|
|Publisher||Institute of Electrical and Electronics Engineers Inc.|
|Number of pages||9|
|Publication status||Published - 2018 May 3|
|Event||18th IEEE Winter Conference on Applications of Computer Vision, WACV 2018 - Lake Tahoe, United States|
Duration: 2018 Mar 12 → 2018 Mar 15
Bibliographical note
Funding Information:
Acknowledgements: This work is supported in part by NSFC (No. 61572099 and 61522203), NSF CAREER (No. 1149783), 973 Program (No. 2014CB347600), NSF of Jiangsu Province (No. BK20140058), the National Key R&D Program of China (No. 2016YFB1001001), and gifts from Adobe and Nvidia.
All Science Journal Classification (ASJC) codes
- Computer Science Applications
- Computer Vision and Pattern Recognition