Deep Video Quality Assessor: From Spatio-Temporal Visual Sensitivity to a Convolutional Neural Aggregation Network

Woojae Kim, Jongyoo Kim, Sewoong Ahn, Jinwoo Kim, Sanghoon Lee

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

Incorporating spatio-temporal human visual perception into video quality assessment (VQA) remains a formidable issue. Previous statistical or computational models of spatio-temporal perception have limitations when applied to general VQA algorithms. In this paper, we propose a novel full-reference (FR) VQA framework named Deep Video Quality Assessor (DeepVQA) to quantify spatio-temporal visual perception via a convolutional neural network (CNN) and a convolutional neural aggregation network (CNAN). Our framework learns the spatio-temporal sensitivity behavior in accordance with the subjective score. In addition, to handle the temporal variation of distortions, we propose a novel temporal pooling method using an attention model. In the experiment, we show that DeepVQA achieves state-of-the-art prediction accuracy of more than 0.9 correlation, which is ~5% higher than that of conventional methods on the LIVE and CSIQ video databases.
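The attention-based temporal pooling the abstract mentions can be illustrated, in spirit, by a minimal sketch: per-frame quality scores are aggregated into a single video-level score using softmax-normalized attention weights. The function names and the way logits are supplied here are illustrative assumptions, not the paper's actual CNAN implementation (where the attention weights are produced by a learned network).

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of attention logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def attention_temporal_pooling(frame_scores, attention_logits):
    """Aggregate per-frame quality scores into one video-level score.

    `attention_logits` stands in for the output of a learned attention
    model: frames with higher logits contribute more to the final score,
    unlike plain average pooling, which weights all frames equally.
    """
    weights = softmax(attention_logits)
    return sum(w * s for w, s in zip(weights, frame_scores))

# With equal logits this reduces to the temporal average; skewed logits
# let perceptually salient (e.g. heavily distorted) frames dominate.
video_score = attention_temporal_pooling([0.9, 0.5, 0.8], [0.1, 2.0, 0.3])
```

The design point is that temporal pooling becomes data-driven: rather than hand-picking a pooling rule (mean, worst-case, percentile), the attention weights are trained jointly with the quality predictor against subjective scores.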

Original language: English
Title of host publication: Computer Vision – ECCV 2018 - 15th European Conference, 2018, Proceedings
Editors: Martial Hebert, Vittorio Ferrari, Cristian Sminchisescu, Yair Weiss
Publisher: Springer Verlag
Pages: 224-241
Number of pages: 18
ISBN (Print): 9783030012458
DOI: 10.1007/978-3-030-01246-5_14
Publication status: Published - 2018 Jan 1
Event: 15th European Conference on Computer Vision, ECCV 2018 - Munich, Germany
Duration: 2018 Sep 8 – 2018 Sep 14

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11205 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 15th European Conference on Computer Vision, ECCV 2018
Country: Germany
City: Munich
Period: 18/9/8 – 18/9/14

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Kim, W., Kim, J., Ahn, S., Kim, J., & Lee, S. (2018). Deep Video Quality Assessor: From Spatio-Temporal Visual Sensitivity to a Convolutional Neural Aggregation Network. In M. Hebert, V. Ferrari, C. Sminchisescu, & Y. Weiss (Eds.), Computer Vision – ECCV 2018 - 15th European Conference, 2018, Proceedings (pp. 224-241). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11205 LNCS). Springer Verlag. https://doi.org/10.1007/978-3-030-01246-5_14