Details
| Field | Value |
|---|---|
| Original language | English |
| Article number | 3377454 |
| Journal | ACM Computing Surveys (CSUR) |
| Volume | 53 |
| Issue number | 2 |
| Publication status | Published - 20 Mar 2020 |
| Externally published | Yes |
Abstract

The demand for artificial intelligence has grown significantly over the past decade, and this growth has been fueled by advances in machine learning techniques and the ability to leverage hardware acceleration. However, to increase the quality of predictions and render machine learning solutions feasible for more complex applications, a substantial amount of training data is required. Although small machine learning models can be trained with modest amounts of data, the input for training larger models such as neural networks grows exponentially with the number of parameters. Since the demand for processing training data has outpaced the increase in computation power of computing machinery, there is a need for distributing the machine learning workload across multiple machines, and turning the centralized into a distributed system. These distributed systems present new challenges: first and foremost, the efficient parallelization of the training process and the creation of a coherent model. This article provides an extensive overview of the current state-of-the-art in the field by outlining the challenges and opportunities of distributed machine learning over conventional (centralized) machine learning, discussing the techniques used for distributed machine learning, and providing an overview of the systems that are available.
Keywords
- Distributed machine learning, distributed systems
ASJC Scopus subject areas
- Mathematics (all)
- Theoretical Computer Science
- General Computer Science
Cite this
In: ACM Computing Surveys (CSUR), Vol. 53, No. 2, 3377454, 20.03.2020.
Research output: Contribution to journal › Review article › Research › peer-reviewed
TY - JOUR
T1 - A Survey on Distributed Machine Learning
AU - Verbraeken, Joost
AU - Wolting, Matthijs
AU - Katzy, Jonathan
AU - Kloppenburg, Jeroen
AU - Verbelen, Tim
AU - Rellermeyer, Jan
N1 - Publisher Copyright: © 2020 ACM.
PY - 2020/3/20
Y1 - 2020/3/20
N2 - The demand for artificial intelligence has grown significantly over the past decade, and this growth has been fueled by advances in machine learning techniques and the ability to leverage hardware acceleration. However, to increase the quality of predictions and render machine learning solutions feasible for more complex applications, a substantial amount of training data is required. Although small machine learning models can be trained with modest amounts of data, the input for training larger models such as neural networks grows exponentially with the number of parameters. Since the demand for processing training data has outpaced the increase in computation power of computing machinery, there is a need for distributing the machine learning workload across multiple machines, and turning the centralized into a distributed system. These distributed systems present new challenges: first and foremost, the efficient parallelization of the training process and the creation of a coherent model. This article provides an extensive overview of the current state-of-the-art in the field by outlining the challenges and opportunities of distributed machine learning over conventional (centralized) machine learning, discussing the techniques used for distributed machine learning, and providing an overview of the systems that are available.
AB - The demand for artificial intelligence has grown significantly over the past decade, and this growth has been fueled by advances in machine learning techniques and the ability to leverage hardware acceleration. However, to increase the quality of predictions and render machine learning solutions feasible for more complex applications, a substantial amount of training data is required. Although small machine learning models can be trained with modest amounts of data, the input for training larger models such as neural networks grows exponentially with the number of parameters. Since the demand for processing training data has outpaced the increase in computation power of computing machinery, there is a need for distributing the machine learning workload across multiple machines, and turning the centralized into a distributed system. These distributed systems present new challenges: first and foremost, the efficient parallelization of the training process and the creation of a coherent model. This article provides an extensive overview of the current state-of-the-art in the field by outlining the challenges and opportunities of distributed machine learning over conventional (centralized) machine learning, discussing the techniques used for distributed machine learning, and providing an overview of the systems that are available.
KW - Distributed machine learning
KW - distributed systems
UR - http://www.scopus.com/inward/record.url?scp=85087906333&partnerID=8YFLogxK
U2 - 10.1145/3377454
DO - 10.1145/3377454
M3 - Review article
VL - 53
JO - ACM Computing Surveys (CSUR)
JF - ACM Computing Surveys (CSUR)
SN - 1557-7341
IS - 2
M1 - 3377454
ER -