Abstract
We define a new multi-modal compliance problem: determining whether the human activity in a given video complies with an associated text instruction. Solutions to the compliance problem could enable automatic compliance checking and efficient feedback in many real-world settings. To this end, we introduce the Video-Text Compliance (VTC) dataset, which contains videos of atomic activities along with text instructions and compliance labels. The VTC dataset is constructed using an auto-augmentation technique, preserves privacy, and contains over 1.2 million frames. Finally, we present ComplianceNet, a novel end-to-end trainable compliance network that improves on the baseline accuracy by 27.5% on average when trained on the VTC dataset. We plan to release the VTC dataset to the community for future research.
Original language | English |
---|---|
Title of host publication | Proceedings - 2019 International Conference on Computer Vision Workshop, ICCVW 2019 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 1503-1512 |
Number of pages | 10 |
ISBN (Electronic) | 9781728150239 |
DOIs | |
Publication status | Published - 2019 Oct |
Event | 17th IEEE/CVF International Conference on Computer Vision Workshop, ICCVW 2019 - Seoul, Korea, Republic of. Duration: 2019 Oct 27 → 2019 Oct 28 |
Publication series
Name | Proceedings - 2019 International Conference on Computer Vision Workshop, ICCVW 2019 |
---|---|
Conference
Conference | 17th IEEE/CVF International Conference on Computer Vision Workshop, ICCVW 2019 |
---|---|
Country/Territory | Korea, Republic of |
City | Seoul |
Period | 2019 Oct 27 → 2019 Oct 28 |
Bibliographical note
Publisher Copyright: © 2019 IEEE.
All Science Journal Classification (ASJC) codes
- Computer Science Applications
- Computer Vision and Pattern Recognition