Server load and network-aware adaptive deep learning inference offloading for edge platforms

Jungmo Ahn, Youngki Lee, Jeongseob Ahn, Jeong Gil Ko

Research output: Contribution to journal › Article › peer-review

Abstract

This work presents DIAMOND, a deep neural network computation offloading scheme that combines a lightweight client-to-server latency profiling component with a server inference time estimation module to accurately assess the expected latency of a deep learning model inference. Latency predictions for both the network and the server are used together to make dynamic (partial) model offloading decisions at the client at run time. Compared to previous work, DIAMOND aims to minimize network latency estimation overhead and accounts for the concurrent processing nature of state-of-the-art deep learning inference server designs. Our extensive evaluations with an NVIDIA Jetson Nano client connected to an NVIDIA Triton server show that DIAMOND completes inference operations with noticeably lower computational/energy overhead and latency than previously proposed model offloading approaches. Furthermore, our results show that DIAMOND adapts well to practical server load and network dynamics.
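The abstract describes a client-side decision: weigh the predicted on-device latency against the profiled network latency plus the estimated server inference time, and choose a (partial) offloading split point accordingly. The following is a minimal, hypothetical Python sketch of such a latency-driven split selection. It is not the authors' implementation; the function, variable names, and the simple additive latency model are illustrative assumptions only.

```python
def choose_split_point(local_ms, server_ms, act_bytes, rtt_ms, uplink_bps):
    """Pick the layer index at which to hand off inference to the server.

    local_ms[i]  : predicted on-device latency of layer i (ms)
    server_ms[i] : predicted server-side latency of layer i under current load (ms)
    act_bytes[i] : bytes to upload when offloading starts at layer i
                   (act_bytes[0] is the raw model input size)
    rtt_ms       : profiled client-to-server round-trip time (ms)
    uplink_bps   : profiled uplink bandwidth (bits per second)

    Returns (split, latency): layers [0, split) run on the client and
    layers [split, n) run on the server; split == n means fully local.
    """
    n = len(local_ms)
    best_split, best_latency = n, sum(local_ms)  # baseline: run everything locally

    for split in range(n):
        client_part = sum(local_ms[:split])
        # Upload the intermediate activation, then wait for the server result.
        transfer = rtt_ms + (act_bytes[split] * 8 / uplink_bps) * 1000.0
        server_part = sum(server_ms[split:])
        total = client_part + transfer + server_part
        if total < best_latency:
            best_split, best_latency = split, total

    return best_split, best_latency


if __name__ == "__main__":
    # Toy 4-layer model with made-up numbers, purely for illustration.
    local_ms = [12.0, 30.0, 25.0, 8.0]
    server_ms = [3.0, 7.0, 6.0, 2.0]          # assumed to include queueing under load
    act_bytes = [150_000, 600_000, 80_000, 20_000]
    print(choose_split_point(local_ms, server_ms, act_bytes,
                             rtt_ms=15.0, uplink_bps=20_000_000))
```

In this toy setup, a heavily loaded server inflates the server_ms terms and a slow uplink inflates the transfer term, which pushes the chosen split toward running more layers locally; this is the kind of adaptation to server load and network dynamics that the abstract attributes to DIAMOND.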

Original language: English
Article number: 100644
Journal: Internet of Things (Netherlands)
Volume: 21
DOIs
Publication status: Published - 2023 Apr

Bibliographical note

Funding Information:
This work was supported by the Ministry of Science and ICT's NRF Basic Science Research Program (2021R1A2C4002380), IITP (IITP-2022-0-00240), ITRC Program supervised by IITP (IITP-2021-2020-0-01461 and IITP-2021-2021-0-02051), by the Ministry of Culture, Sports and Tourism and Korea Creative Content Agency (R2021040018), and by the Ministry of Trade, Industry and Energy and KIAT through the International Cooperative R&D program under Grant (P0016150).

Publisher Copyright:
© 2022 Elsevier B.V.

All Science Journal Classification (ASJC) codes

  • Software
  • Computer Science (miscellaneous)
  • Information Systems
  • Engineering (miscellaneous)
  • Hardware and Architecture
  • Computer Science Applications
  • Artificial Intelligence
  • Management of Technology and Innovation
