Video-text compliance: Activity verification based on natural language instructions

Mayoore Jaiswal, Frank Liu, Anupama Jagannathan, Anne Gattiker, Inseok Hwang, Jinho Lee, Matthew Tong, Sahil Dureja, Soham Shah, Peter Hofstee, Valerie Chen, Suvadip Paul, Rogerio Feris

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We define a new multi-modal compliance problem, which is to determine if the human activity in a given video is in compliance with an associated text instruction. Solutions to the compliance problem could enable automatic compliance checking and efficient feedback in many real-world settings. To this end, we introduce the Video-Text Compliance (VTC) dataset, which contains videos of atomic activities, along with text instructions and compliance labels. The VTC dataset is constructed by an auto-augmentation technique, preserves privacy, and contains over 1.2 million frames. Finally, we present ComplianceNet, a novel end-to-end trainable compliance network that improves the baseline accuracy by 27.5% on average when trained on the VTC dataset. We plan to release the VTC dataset to the community for future research.

Original language: English
Title of host publication: Proceedings - 2019 International Conference on Computer Vision Workshop, ICCVW 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1503-1512
Number of pages: 10
ISBN (Electronic): 9781728150239
Publication status: Published - 2019 Oct
Event: 17th IEEE/CVF International Conference on Computer Vision Workshop, ICCVW 2019 - Seoul, Korea, Republic of
Duration: 2019 Oct 27 → 2019 Oct 28

Publication series

Name: Proceedings - 2019 International Conference on Computer Vision Workshop, ICCVW 2019

Conference

Conference: 17th IEEE/CVF International Conference on Computer Vision Workshop, ICCVW 2019
Country/Territory: Korea, Republic of
City: Seoul
Period: 19/10/27 → 19/10/28

Bibliographical note

Publisher Copyright:
© 2019 IEEE.

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Computer Vision and Pattern Recognition
