Abstract
This work presents DIAMOND, a deep neural network computation offloading scheme that combines a lightweight client-to-server latency profiling component with a server inference time estimation module to accurately estimate the expected latency of a deep learning model inference. The client uses the latency predictions for both the network and the server jointly to make dynamic (partial) model offloading decisions at run time. Compared to previous work, DIAMOND aims to minimize the overhead of network latency estimation and accounts for the concurrent processing nature of state-of-the-art deep learning inference server designs. Our extensive evaluations with an NVIDIA Jetson Nano client connected to an NVIDIA Triton server show that DIAMOND completes inference operations with noticeably lower computational and energy overhead and lower latency than previously proposed model offloading approaches. Furthermore, our results show that DIAMOND adapts well to practical server load and network dynamics.
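As a rough illustration of the kind of decision the abstract describes, the sketch below picks the model split point with the lowest estimated end-to-end latency, given per-split estimates of client time, server time, and transfer size. It is not DIAMOND's actual algorithm: the names `SplitCandidate`, `transfer_ms`, and `choose_split`, the simple round-trip-plus-bandwidth network model, and all numbers are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SplitCandidate:
    """One candidate partition point (all fields are hypothetical estimates)."""
    client_ms: float    # estimated time for the layers that run on the client
    server_ms: float    # estimated time for the layers that run on the server
    payload_bytes: int  # size of the tensor sent to the server (0 = fully local)


def transfer_ms(payload_bytes: int, rtt_ms: float, bandwidth_mbps: float) -> float:
    """Assumed network model: one round trip plus serialization time."""
    if payload_bytes == 0:
        return 0.0  # nothing is offloaded, so no network cost
    bits = payload_bytes * 8
    return rtt_ms + bits / (bandwidth_mbps * 1000.0)  # 1 Mbps = 1000 bits per ms


def choose_split(candidates: List[SplitCandidate],
                 rtt_ms: float,
                 bandwidth_mbps: float) -> Tuple[int, float]:
    """Return (index, estimated latency in ms) of the cheapest candidate."""
    if not candidates:
        raise ValueError("at least one candidate is required")
    best_index, best_latency = 0, float("inf")
    for index, c in enumerate(candidates):
        total = (c.client_ms
                 + transfer_ms(c.payload_bytes, rtt_ms, bandwidth_mbps)
                 + c.server_ms)
        if total < best_latency:
            best_index, best_latency = index, total
    return best_index, best_latency


# Hypothetical estimates: full offload, a mid-model split, fully local.
candidates = [
    SplitCandidate(client_ms=0.0, server_ms=20.0, payload_bytes=600_000),
    SplitCandidate(client_ms=30.0, server_ms=10.0, payload_bytes=100_000),
    SplitCandidate(client_ms=120.0, server_ms=0.0, payload_bytes=0),
]
fast_link = choose_split(candidates, rtt_ms=10.0, bandwidth_mbps=100.0)
slow_link = choose_split(candidates, rtt_ms=10.0, bandwidth_mbps=1.0)
```

Under these made-up numbers, the mid-model split is chosen on the fast link and fully local execution on the slow link, which mirrors the idea that the decision should track network dynamics. In a real system the `server_ms` estimates would also need to reflect the current server load.
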
Original language | English |
---|---|
Article number | 100644 |
Journal | Internet of Things (Netherlands) |
Volume | 21 |
Publication status | Published - 2023 Apr |
Bibliographical note
Funding Information: This work was supported by the Ministry of Science and ICT's NRF Basic Science Research Program (2021R1A2C4002380), IITP (IITP-2022-0-00240), the ITRC Program supervised by IITP (IITP-2021-2020-0-01461 and IITP-2021-2021-0-02051), by the Ministry of Culture, Sports and Tourism and Korea Creative Content Agency (R2021040018), and by the Ministry of Trade, Industry and Energy and KIAT through the International Cooperative R&D program under Grant (P0016150).
Publisher Copyright:
© 2022 Elsevier B.V.
All Science Journal Classification (ASJC) codes
- Software
- Computer Science (miscellaneous)
- Information Systems
- Engineering (miscellaneous)
- Hardware and Architecture
- Computer Science Applications
- Artificial Intelligence
- Management of Technology and Innovation