
A deep q-learning-based optimization of the inventory control in a linear process chain

Publication: Contribution to journal › Article › Research › Peer reviewed

Authors

  • Marc-André Dittrich
  • Silas Fohlmeister

Details

Original language: English
Pages (from-to): 35-43
Number of pages: 9
Journal: Production Engineering
Volume: 15
Issue number: 1
Early online date: 23 Nov 2020
Publication status: Published - Feb 2021

Abstract

Due to growing globalized markets and the resulting globalization of production networks across different companies, inventory and order optimization is becoming increasingly important in the context of process chains. Thus, an adaptive and continuously self-optimizing inventory control on a global level is necessary to overcome the resulting challenges. Advances in sensor and communication technology allow companies to realize a global data exchange to achieve a holistic inventory control. Based on deep q-learning, a method for a self-optimizing inventory control is developed. Here, the decision process is based on an artificial neural network. Its input is modeled as a state vector that describes the current stocks and orders within the process chain. The output represents a control vector that controls orders for each individual station. Furthermore, a reward function, which is based on the resulting storage and late order costs, is implemented for simulation-based decision optimization. One of the main challenges of implementing deep q-learning is the hyperparameter optimization for the training process, which is investigated in this paper. The results show a significant sensitivity to the learning rate α and the exploration rate ε. Based on optimized hyperparameters, the potential of the developed methodology could be shown by significantly reducing the total costs compared to the initial state and by achieving stable control behavior for a process chain containing up to 10 stations.
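
The abstract describes the control loop in prose; the following Python/PyTorch code is a minimal sketch of that loop under stated assumptions, not the authors' implementation. It shows the state vector (stocks and open orders per station), a Q-network whose per-station argmax forms the control vector of order quantities, a reward built from storage and late-order costs, and the learning rate α and exploration rate ε whose sensitivity the paper investigates. The chain length, demand model, cost rates, order levels, and network architecture are all illustrative assumptions.

# A minimal sketch (not the authors' implementation) of a deep q-learning
# inventory controller for a linear process chain. Chain length, demand
# model, cost rates, order levels, and network size are illustrative
# assumptions; only the overall structure follows the abstract.
import random
import numpy as np
import torch
import torch.nn as nn

N_STATIONS = 4                       # assumed chain length (the paper scales up to 10)
ORDER_LEVELS = [0, 5, 10]            # assumed discrete order quantities per station
HOLDING_COST, LATE_COST = 1.0, 5.0   # assumed storage and late-order cost rates

class ChainEnv:
    """Toy linear process chain with per-station stock and backlog."""
    def reset(self):
        self.stock = np.full(N_STATIONS, 10.0)
        self.backlog = np.zeros(N_STATIONS)
        return self._state()

    def _state(self):
        # State vector: current stocks and open (late) orders per station.
        return np.concatenate([self.stock, self.backlog]).astype(np.float32)

    def step(self, orders):
        demand = np.random.poisson(5, N_STATIONS)  # assumed stochastic demand
        self.stock += np.asarray(orders, dtype=float)
        due = demand + self.backlog
        self.backlog = np.maximum(due - self.stock, 0.0)
        self.stock = np.maximum(self.stock - due, 0.0)
        # Reward: negative sum of storage and late-order costs.
        cost = HOLDING_COST * self.stock.sum() + LATE_COST * self.backlog.sum()
        return self._state(), float(-cost)

# One Q-value head per (station, order level); the per-station argmax
# yields the control vector described in the abstract.
qnet = nn.Sequential(
    nn.Linear(2 * N_STATIONS, 64), nn.ReLU(),
    nn.Linear(64, N_STATIONS * len(ORDER_LEVELS)),
)
ALPHA, EPSILON, GAMMA = 1e-3, 0.1, 0.95  # α and ε are the rates the paper studies
opt = torch.optim.SGD(qnet.parameters(), lr=ALPHA)

env = ChainEnv()
state = env.reset()
for step in range(2000):
    q = qnet(torch.from_numpy(state)).view(N_STATIONS, len(ORDER_LEVELS))
    if random.random() < EPSILON:    # ε-greedy exploration
        idx = torch.randint(len(ORDER_LEVELS), (N_STATIONS,))
    else:
        idx = q.argmax(dim=1)
    orders = [ORDER_LEVELS[i] for i in idx.tolist()]
    next_state, reward = env.step(orders)
    with torch.no_grad():            # one-step q-learning target per station head
        q_next = qnet(torch.from_numpy(next_state)).view(N_STATIONS, -1)
        target = reward + GAMMA * q_next.max(dim=1).values
    pred = q.gather(1, idx.view(-1, 1)).squeeze(1)
    loss = nn.functional.mse_loss(pred, target)
    opt.zero_grad()
    loss.backward()
    opt.step()
    state = next_state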

ASJC Scopus subject areas

Cite this

A deep q-learning-based optimization of the inventory control in a linear process chain. / Dittrich, Marc-André; Fohlmeister, Silas.
In: Production Engineering, Vol. 15, No. 1, 02.2021, p. 35-43.

Publication: Contribution to journal › Article › Research › Peer reviewed

Dittrich, M-A & Fohlmeister, S 2021, 'A deep q-learning-based optimization of the inventory control in a linear process chain', Production Engineering, vol. 15, no. 1, pp. 35-43. https://doi.org/10.1007/s11740-020-01000-8
Dittrich MA, Fohlmeister S. A deep q-learning-based optimization of the inventory control in a linear process chain. Production Engineering. 2021 Feb;15(1):35-43. Epub 2020 Nov 23. doi: 10.1007/s11740-020-01000-8
Dittrich, Marc-André ; Fohlmeister, Silas. / A deep q-learning-based optimization of the inventory control in a linear process chain. In: Production Engineering. 2021 ; Vol. 15, No. 1. pp. 35-43.
BibTeX
@article{25aefbe0e2c14829ada608a54946f123,
title = "A deep q-learning-based optimization of the inventory control in a linear process chain",
abstract = "Due to growing globalized markets and the resulting globalization of production networks across different companies, inventory and order optimization is becoming increasingly important in the context of process chains. Thus, an adaptive and continuously self-optimizing inventory control on a global level is necessary to overcome the resulting challenges. Advances in sensor and communication technology allow companies to realize a global data exchange to achieve a holistic inventory control. Based on deep q-learning, a method for a self-optimizing inventory control is developed. Here, the decision process is based on an artificial neural network. Its input is modeled as a state vector that describes the current stocks and orders within the process chain. The output represents a control vector that controls orders for each individual station. Furthermore, a reward function, which is based on the resulting storage and late order costs, is implemented for simulation-based decision optimization. One of the main challenges of implementing deep q-learning is the hyperparameter optimization for the training process, which is investigated in this paper. The results show a significant sensitivity to the learning rate α and the exploration rate ε. Based on optimized hyperparameters, the potential of the developed methodology could be shown by significantly reducing the total costs compared to the initial state and by achieving stable control behavior for a process chain containing up to 10 stations.",
keywords = "Deep q-learning, Inventory control, Learning parameters, Process chain, Self-optimizing control",
author = "Marc-Andr{\'e} Dittrich and Silas Fohlmeister",
note = "Funding Information: Open Access funding enabled and organized by Projekt DEAL. The presented investigations were conducted within the research project DE 447/181-1. We would like to thank the German Research Foundation (DFG) for the support of this project. In addition, we would like to thank Prof. Dr.-Ing. Berend Denkena for his valuable comments and his support.",
year = "2021",
month = feb,
doi = "10.1007/s11740-020-01000-8",
language = "English",
volume = "15",
pages = "35--43",
number = "1",
journal = "Production Engineering",
issn = "0944-6524",
}

RIS

TY - JOUR

T1 - A deep q-learning-based optimization of the inventory control in a linear process chain

AU - Dittrich, Marc-André

AU - Fohlmeister, Silas

N1 - Funding Information: Open Access funding enabled and organized by Projekt DEAL. The presented investigations were conducted within the research project DE 447/181-1. We would like to thank the German Research Foundation (DFG) for the support of this project. In addition, we would like to thank Prof. Dr.-Ing. Berend Denkena for his valuable comments and his support.

PY - 2021/2

Y1 - 2021/2

N2 - Due to growing globalized markets and the resulting globalization of production networks across different companies, inventory and order optimization is becoming increasingly important in the context of process chains. Thus, an adaptive and continuously self-optimizing inventory control on a global level is necessary to overcome the resulting challenges. Advances in sensor and communication technology allow companies to realize a global data exchange to achieve a holistic inventory control. Based on deep q-learning, a method for a self-optimizing inventory control is developed. Here, the decision process is based on an artificial neural network. Its input is modeled as a state vector that describes the current stocks and orders within the process chain. The output represents a control vector that controls orders for each individual station. Furthermore, a reward function, which is based on the resulting storage and late order costs, is implemented for simulation-based decision optimization. One of the main challenges of implementing deep q-learning is the hyperparameter optimization for the training process, which is investigated in this paper. The results show a significant sensitivity to the learning rate α and the exploration rate ε. Based on optimized hyperparameters, the potential of the developed methodology could be shown by significantly reducing the total costs compared to the initial state and by achieving stable control behavior for a process chain containing up to 10 stations.

AB - Due to growing globalized markets and the resulting globalization of production networks across different companies, inventory and order optimization is becoming increasingly important in the context of process chains. Thus, an adaptive and continuously self-optimizing inventory control on a global level is necessary to overcome the resulting challenges. Advances in sensor and communication technology allow companies to realize a global data exchange to achieve a holistic inventory control. Based on deep q-learning, a method for a self-optimizing inventory control is developed. Here, the decision process is based on an artificial neural network. Its input is modeled as a state vector that describes the current stocks and orders within the process chain. The output represents a control vector that controls orders for each individual station. Furthermore, a reward function, which is based on the resulting storage and late order costs, is implemented for simulation-based decision optimization. One of the main challenges of implementing deep q-learning is the hyperparameter optimization for the training process, which is investigated in this paper. The results show a significant sensitivity to the learning rate α and the exploration rate ε. Based on optimized hyperparameters, the potential of the developed methodology could be shown by significantly reducing the total costs compared to the initial state and by achieving stable control behavior for a process chain containing up to 10 stations.

KW - Deep q-learning

KW - Inventory control

KW - Learning parameters

KW - Process chain

KW - Self-optimizing control

UR - http://www.scopus.com/inward/record.url?scp=85096453040&partnerID=8YFLogxK

U2 - 10.1007/s11740-020-01000-8

DO - 10.1007/s11740-020-01000-8

M3 - Article

VL - 15

SP - 35

EP - 43

JO - Production Engineering

JF - Production Engineering

SN - 0944-6524

IS - 1

ER -