Convergence-aware neural network training

Hyungjun Oh, Yongseung Yu, Giha Ryu, Gunjoo Ahn, Yuri Jeong, Yongjun Park, Jiwon Seo

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

2 Citations (Scopus)

Abstract

Training a deep neural network (DNN) is expensive, requiring a large amount of computation time. While the training overhead is high, not all computation in DNN training is equal. Some parameters converge faster, and thus their gradient computation may contribute little to the parameter update; near stationary points, a subset of parameters may change very little. In this paper we exploit parameter convergence to optimize gradient computation in DNN training. We design a lightweight monitoring technique to track parameter convergence; we prune the gradient computation stochastically for a group of semantically related parameters, exploiting their convergence correlations. These techniques are efficiently implemented in existing GPU kernels. In our evaluation the optimization techniques substantially and robustly improve the training throughput for four DNN models on three public datasets.
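The idea of monitoring convergence per parameter group and stochastically skipping gradient computation can be illustrated with a toy sketch. This is not the authors' GPU-kernel implementation: the function names (`skip_probability`, `train`), the moving-average monitor, the `threshold` and `max_skip` values, and the quadratic toy loss are all assumptions made for illustration only.

```python
import random


def skip_probability(ema_update, threshold=1e-3, max_skip=0.9):
    """Probability of skipping a group's gradient computation.

    Hypothetical rule (not from the paper): groups whose recent updates
    fall below `threshold` are skipped more often, but never with a
    probability above `max_skip`, so they are still revisited.
    """
    if ema_update >= threshold:
        return 0.0
    return max_skip * (1.0 - ema_update / threshold)


def train(curvatures, params, steps=200, lr=0.1, decay=0.9, seed=0):
    """Gradient descent on the toy loss sum_g 0.5 * curvatures[g] * ||params[g]||^2.

    Returns the final parameters and how many times each group's
    gradient was actually computed.
    """
    rng = random.Random(seed)
    params = [list(group) for group in params]
    ema = [None] * len(params)      # per-group moving average of |update|
    computed = [0] * len(params)
    for _ in range(steps):
        for g, group in enumerate(params):
            p_skip = 0.0 if ema[g] is None else skip_probability(ema[g])
            if rng.random() < p_skip:
                continue            # pruned: no gradient, no update this step
            grad = [curvatures[g] * w for w in group]
            computed[g] += 1
            updates = [lr * dw for dw in grad]
            params[g] = [w - u for w, u in zip(group, updates)]
            # Lightweight monitor: track the mean absolute update of the group.
            mag = sum(abs(u) for u in updates) / len(updates)
            ema[g] = mag if ema[g] is None else decay * ema[g] + (1 - decay) * mag
    return params, computed


# Group 0 has high curvature and converges quickly; group 1 converges slowly.
final_params, grad_counts = train(curvatures=[5.0, 0.05],
                                  params=[[1.0, 1.0], [1.0, 1.0]])
print(grad_counts)
```

In this sketch the fast-converging group ends up with fewer gradient evaluations than the slow one, which is the source of the saved computation; a skipped group keeps its old monitor value, so the cap on the skip probability ensures it is eventually re-evaluated.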

Original language: English
Title of host publication: 2020 57th ACM/IEEE Design Automation Conference, DAC 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781450367257
Publication status: Published - 2020 Jul
Event: 57th ACM/IEEE Design Automation Conference, DAC 2020 - Virtual, San Francisco, United States
Duration: 2020 Jul 20 to 2020 Jul 24

Publication series

Name: Proceedings - Design Automation Conference
Volume: 2020-July
ISSN (Print): 0738-100X

Conference

Conference: 57th ACM/IEEE Design Automation Conference, DAC 2020
Country/Territory: United States
City: Virtual, San Francisco
Period: 2020 Jul 20 to 2020 Jul 24

Bibliographical note

Funding Information:
This work is supported by Samsung Research, Samsung Electronics Co., Ltd., by the National Research Foundation of Korea (NRF) grant (No. 2018R1D1A1B07050609), and by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2013-0-00109, WiseKB: Big data based self-evolving knowledge base and reasoning platform). We thank Jinwon Lee for the preliminary experiments. The corresponding authors are Jiwon Seo and Yongjun Park.

Publisher Copyright:
© 2020 IEEE.

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Control and Systems Engineering
  • Electrical and Electronic Engineering
  • Modelling and Simulation
