Abstract
More and more data need to be processed or analyzed on mobile devices for efficiency or privacy reasons, but performing machine learning tasks on large data within such devices is challenging because of their limited memory. For this reason, disk-based machine learning methods, which use storage resources instead of holding all the data in memory, have been actively researched. This paper proposes D-MC2, a novel disk-based matrix completion method that (1) supports incremental data updates (i.e., data insertion and deletion) and (2) spills both data and model to disk when necessary; these functionalities are not supported by existing methods. First, D-MC2 builds a two-layered index to support incremental data updates efficiently; there is a tradeoff between the model learning cost and the data update cost, and the two-layered index optimizes both costs simultaneously. Second, we develop a window-based stochastic gradient descent (SGD) scheduler to support the dual spilling efficiently; a huge amount of disk I/O is incurred when the model is larger than memory, and the new scheduler substantially reduces it. Our evaluation results show that D-MC2 is significantly more scalable and faster than other disk-based competitors in a limited-memory environment. In terms of the co-optimization, D-MC2 outperforms baselines that optimize only one of the two costs by up to 48x. Furthermore, the window-based scheduler makes training 12.4x faster than a naive scheduler.
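To make the setting concrete, the following is a minimal sketch of matrix completion by SGD in which the observed entries are processed one row block at a time, loosely mimicking a method that loads only part of the data from storage at once. It is not D-MC2: the two-layered index and the window-based scheduler described in the abstract are not reproduced here. The function names (`sgd_matrix_completion`, `rmse`), the block partitioning by row, and all hyperparameter values are assumptions chosen for illustration.

```python
import math
import random


def sgd_matrix_completion(ratings, n_rows, n_cols, rank=2, lr=0.05,
                          reg=0.01, epochs=200, block_rows=2, seed=0):
    """Approximate a sparse matrix as P @ Q^T with SGD, one row block at a time.

    ratings: list of (row, col, value) triples for the observed entries.
    All parameter names and defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_cols)]

    # Group observed entries by row block. A disk-based method would keep
    # each block on storage and load it only when it is scheduled; here the
    # blocks simply live in a dictionary.
    blocks = {}
    for i, j, v in ratings:
        blocks.setdefault(i // block_rows, []).append((i, j, v))

    for _ in range(epochs):
        for block_id in sorted(blocks):
            entries = blocks[block_id]
            rng.shuffle(entries)  # randomize the order within the block
            for i, j, v in entries:
                pred = sum(P[i][k] * Q[j][k] for k in range(rank))
                err = v - pred
                for k in range(rank):
                    p, q = P[i][k], Q[j][k]
                    # Gradient step on squared error with L2 regularization.
                    P[i][k] += lr * (err * q - reg * p)
                    Q[j][k] += lr * (err * p - reg * q)
    return P, Q


def rmse(ratings, P, Q):
    """Root-mean-square error of the factorization on the given entries."""
    total = 0.0
    for i, j, v in ratings:
        pred = sum(pk * qk for pk, qk in zip(P[i], Q[j]))
        total += (v - pred) ** 2
    return math.sqrt(total / len(ratings))
```

Calling `sgd_matrix_completion(ratings, n_rows, n_cols)` on a small list of `(row, col, value)` triples returns the two factor matrices, and `rmse` reports how well they fit the observed entries. In this sketch both factor matrices stay in memory for the whole run; the abstract's point is that D-MC2 also handles the case where the model itself must be spilled to disk.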
Original language | English |
---|---|
Title of host publication | CIKM 2018 - Proceedings of the 27th ACM International Conference on Information and Knowledge Management |
Editors | Norman Paton, Selcuk Candan, Haixun Wang, James Allan, Rakesh Agrawal, Alexandros Labrinidis, Alfredo Cuzzocrea, Mohammed Zaki, Divesh Srivastava, Andrei Broder, Assaf Schuster |
Publisher | Association for Computing Machinery |
Pages | 1093-1102 |
Number of pages | 10 |
ISBN (Electronic) | 9781450360142 |
Publication status | Published - 2018 Oct 17 |
Event | 27th ACM International Conference on Information and Knowledge Management, CIKM 2018 - Torino, Italy; Duration: 2018 Oct 22 → 2018 Oct 26 |
Publication series
Name | International Conference on Information and Knowledge Management, Proceedings |
---|
Other
Other | 27th ACM International Conference on Information and Knowledge Management, CIKM 2018 |
---|---|
Country/Territory | Italy |
City | Torino |
Period | 2018 Oct 22 → 2018 Oct 26 |
Bibliographical note
Funding Information: This research was supported by 1) the NRF grant funded by the MSIT (No. 2016R1E1A1A01942642), 2) the IITP grant funded by the MSIT (No. 2018-0-00584), 3) “Basic Science Research Program” through the NRF funded by the MSIT (No. 2017M3C4A7063570), and 4) the MSIT under the “ICT Consilience Creative Program” (IITP-2018-2011-1-00783).
Publisher Copyright:
© 2018 Association for Computing Machinery.
All Science Journal Classification (ASJC) codes
- Decision Sciences (all)
- Business, Management and Accounting (all)