DCS: A fast and scalable device-centric server architecture

Jaehyung Ahn, Dongup Kwon, Youngsok Kim, Mohammadamin Ajdari, Jaewon Lee, Jangwoo Kim

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

7 Citations (Scopus)

Abstract

Conventional servers have achieved high performance by employing fast CPUs to run compute-intensive workloads, while making operating systems manage relatively slow I/O devices through memory accesses and interrupts. However, as the emerging workloads are becoming heavily data-intensive and the emerging devices (e.g., NVM storage, high-bandwidth NICs, and GPUs) come to enable low-latency and high-bandwidth device operations, the traditional host-centric server architectures fail to deliver high performance due to their inefficient device handling mechanisms. Furthermore, without resolving the architecture inefficiency, the performance loss will continue to increase as the emerging devices become faster. In this paper, we propose DCS, a novel device-centric server architecture to fully exploit the potential of the emerging devices so that the server performance nicely scales with the performance of the devices. The key idea of DCS is to orchestrate the devices to directly communicate with each other while selectively bypassing the host. The host becomes responsible for only a few device-related operations (e.g., filesystem lookup). In this way, DCS achieves high I/O performance by direct inter-device communications and high computation performance by fully utilizing the host-side resources. To implement DCS, we introduce DCS Engine, a custom hardware device to orchestrate devices via standard I/O protocols (i.e., PCIe and NVMe), along with its device driver and user-level library. We show that our FPGA-based DCS prototype significantly improves the performance of emerging server workloads and that the architecture will nicely scale with the performance of the devices.

Original language: English
Title of host publication: Proceedings - 48th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2015
Publisher: IEEE Computer Society
Pages: 559-571
Number of pages: 13
ISBN (Electronic): 9781450340342
DOI: https://doi.org/10.1145/2830772.2830794
Publication status: Published - 2015 Dec 5
Event: 48th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2015 - Waikiki, United States
Duration: 2015 Dec 5 → 2015 Dec 9

Publication series

Name: Proceedings of the Annual International Symposium on Microarchitecture, MICRO
Volume: 05-09-December-2015
ISSN (Print): 1072-4451

Other

Other: 48th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2015
Country: United States
City: Waikiki
Period: 15/12/5 → 15/12/9

Fingerprint

Servers
Bandwidth
Program processors
Field programmable gate arrays (FPGA)
Engines
Hardware
Network protocols
Data storage equipment
Communication

All Science Journal Classification (ASJC) codes

  • Hardware and Architecture

Cite this

Ahn, J., Kwon, D., Kim, Y., Ajdari, M., Lee, J., & Kim, J. (2015). DCS: A fast and scalable device-centric server architecture. In Proceedings - 48th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2015 (pp. 559-571). (Proceedings of the Annual International Symposium on Microarchitecture, MICRO; Vol. 05-09-December-2015). IEEE Computer Society. https://doi.org/10.1145/2830772.2830794
Ahn, Jaehyung; Kwon, Dongup; Kim, Youngsok; Ajdari, Mohammadamin; Lee, Jaewon; Kim, Jangwoo. / DCS: A fast and scalable device-centric server architecture. Proceedings - 48th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2015. IEEE Computer Society, 2015. pp. 559-571 (Proceedings of the Annual International Symposium on Microarchitecture, MICRO).
@inproceedings{ecbe314635f746cabd3c00a3e1eb162b,
title = "DCS: A fast and scalable device-centric server architecture",
abstract = "Conventional servers have achieved high performance by employing fast CPUs to run compute-intensive workloads, while making operating systems manage relatively slow I/O devices through memory accesses and interrupts. However, as the emerging workloads are becoming heavily data-intensive and the emerging devices (e.g., NVM storage, high-bandwidth NICs, and GPUs) come to enable low-latency and high-bandwidth device operations, the traditional host-centric server architectures fail to deliver high performance due to their inefficient device handling mechanisms. Furthermore, without resolving the architecture inefficiency, the performance loss will continue to increase as the emerging devices become faster. In this paper, we propose DCS, a novel device-centric server architecture to fully exploit the potential of the emerging devices so that the server performance nicely scales with the performance of the devices. The key idea of DCS is to orchestrate the devices to directly communicate with each other while selectively bypassing the host. The host becomes responsible for only few device-related operations (e.g., filesystem lookup). In this way, DCS achieves high I/O performance by direct inter-device communications and high computation performance by fully utilizing the host-side resources. To implement DCS, we introduce DCS Engine, a custom hardware device to orchestrate devices via standard I/O protocols (i.e., PCIe and NVMe), along with its device driver and user-level library. We show that our FPGA-based DCS prototype significantly improves the performance of emerging server workloads and the architecture will nicely scale with the performance of the devices.",
author = "Jaehyung Ahn and Dongup Kwon and Youngsok Kim and Mohammadamin Ajdari and Jaewon Lee and Jangwoo Kim",
year = "2015",
month = "12",
day = "5",
doi = "10.1145/2830772.2830794",
language = "English",
series = "Proceedings of the Annual International Symposium on Microarchitecture, MICRO",
publisher = "IEEE Computer Society",
pages = "559--571",
booktitle = "Proceedings - 48th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2015",
address = "United States",

}

Ahn, J, Kwon, D, Kim, Y, Ajdari, M, Lee, J & Kim, J 2015, DCS: A fast and scalable device-centric server architecture. in Proceedings - 48th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2015. Proceedings of the Annual International Symposium on Microarchitecture, MICRO, vol. 05-09-December-2015, IEEE Computer Society, pp. 559-571, 48th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2015, Waikiki, United States, 15/12/5. https://doi.org/10.1145/2830772.2830794

DCS: A fast and scalable device-centric server architecture. / Ahn, Jaehyung; Kwon, Dongup; Kim, Youngsok; Ajdari, Mohammadamin; Lee, Jaewon; Kim, Jangwoo.

Proceedings - 48th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2015. IEEE Computer Society, 2015. p. 559-571 (Proceedings of the Annual International Symposium on Microarchitecture, MICRO; Vol. 05-09-December-2015).

TY - GEN
T1 - DCS
T2 - A fast and scalable device-centric server architecture
AU - Ahn, Jaehyung
AU - Kwon, Dongup
AU - Kim, Youngsok
AU - Ajdari, Mohammadamin
AU - Lee, Jaewon
AU - Kim, Jangwoo
PY - 2015/12/5
Y1 - 2015/12/5
N2 - Conventional servers have achieved high performance by employing fast CPUs to run compute-intensive workloads, while making operating systems manage relatively slow I/O devices through memory accesses and interrupts. However, as the emerging workloads are becoming heavily data-intensive and the emerging devices (e.g., NVM storage, high-bandwidth NICs, and GPUs) come to enable low-latency and high-bandwidth device operations, the traditional host-centric server architectures fail to deliver high performance due to their inefficient device handling mechanisms. Furthermore, without resolving the architecture inefficiency, the performance loss will continue to increase as the emerging devices become faster. In this paper, we propose DCS, a novel device-centric server architecture to fully exploit the potential of the emerging devices so that the server performance nicely scales with the performance of the devices. The key idea of DCS is to orchestrate the devices to directly communicate with each other while selectively bypassing the host. The host becomes responsible for only few device-related operations (e.g., filesystem lookup). In this way, DCS achieves high I/O performance by direct inter-device communications and high computation performance by fully utilizing the host-side resources. To implement DCS, we introduce DCS Engine, a custom hardware device to orchestrate devices via standard I/O protocols (i.e., PCIe and NVMe), along with its device driver and user-level library. We show that our FPGA-based DCS prototype significantly improves the performance of emerging server workloads and the architecture will nicely scale with the performance of the devices.
UR - http://www.scopus.com/inward/record.url?scp=84959919144&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84959919144&partnerID=8YFLogxK
U2 - 10.1145/2830772.2830794
DO - 10.1145/2830772.2830794
M3 - Conference contribution
AN - SCOPUS:84959919144
T3 - Proceedings of the Annual International Symposium on Microarchitecture, MICRO
SP - 559
EP - 571
BT - Proceedings - 48th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2015
PB - IEEE Computer Society
ER -

Ahn J, Kwon D, Kim Y, Ajdari M, Lee J, Kim J. DCS: A fast and scalable device-centric server architecture. In Proceedings - 48th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2015. IEEE Computer Society. 2015. p. 559-571. (Proceedings of the Annual International Symposium on Microarchitecture, MICRO). https://doi.org/10.1145/2830772.2830794