Disk-based matrix completion for memory limited devices

Dongha Lee, Jinoh Oh, Christos Faloutsos, Byungju Kim, Hwanjo Yu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

More and more data need to be processed or analyzed on mobile devices for efficiency or privacy reasons, but performing machine learning tasks on large data within the devices is challenging because of their limited memory resources. For this reason, disk-based machine learning methods, which utilize storage resources without holding all the data in memory, have been actively researched. This paper proposes D-MC2, a novel disk-based matrix completion method that (1) supports incremental data update (i.e., data insertion and deletion) and (2) spills both data and model to disk when necessary; these functionalities are not supported by existing methods. First, D-MC2 builds a two-layered index to efficiently support incremental data update; there is a tradeoff between model learning and data update costs, and our two-layered index optimizes the two costs simultaneously. Second, we develop a window-based stochastic gradient descent (SGD) scheduler to efficiently support the dual spilling; a huge amount of disk I/O is incurred when the model is larger than memory, and our new scheduler substantially reduces it. Our evaluation results show that D-MC2 is significantly more scalable and faster than other disk-based competitors in a limited-memory environment. In terms of the co-optimization, D-MC2 outperforms the baselines that optimize only one of the two costs by up to 48x. Furthermore, the window-based scheduler makes training 12.4x faster than a naive scheduler.

Original language: English
Title of host publication: CIKM 2018 - Proceedings of the 27th ACM International Conference on Information and Knowledge Management
Editors: Norman Paton, Selcuk Candan, Haixun Wang, James Allan, Rakesh Agrawal, Alexandros Labrinidis, Alfredo Cuzzocrea, Mohammed Zaki, Divesh Srivastava, Andrei Broder, Assaf Schuster
Publisher: Association for Computing Machinery
Pages: 1093-1102
Number of pages: 10
ISBN (Electronic): 9781450360142
Publication status: Published - 2018 Oct 17
Event: 27th ACM International Conference on Information and Knowledge Management, CIKM 2018 - Torino, Italy
Duration: 2018 Oct 22 to 2018 Oct 26

Publication series

Name: International Conference on Information and Knowledge Management, Proceedings

Other

Other: 27th ACM International Conference on Information and Knowledge Management, CIKM 2018
Country/Territory: Italy
City: Torino
Period: 18/10/22 to 18/10/26

Bibliographical note

Funding Information:
This research was supported by 1) the NRF grant funded by the MSIT (No. 2016R1E1A1A01942642), 2) the IITP grant funded by the MSIT (No. 2018-0-00584), 3) “Basic Science Research Program” through the NRF funded by the MSIT (No. 2017M3C4A7063570), and 4) the MSIT under the “ICT Consilience Creative Program” (IITP-2018-2011-1-00783).

Publisher Copyright:
© 2018 Association for Computing Machinery.

All Science Journal Classification (ASJC) codes

  • Decision Sciences(all)
  • Business, Management and Accounting(all)
