Machine learning for early stage building energy prediction: Increment and enrichment

Publication: Contribution to journal › Article › Research › Peer-reviewed

Authors

  • Manav Mahan Singh
  • Sundaravelpandian Singaravel
  • Philipp Florian Geyer

External organisations

  • KU Leuven
  • Technische Universität Berlin

Details

Original language: English
Article number: 117787
Journal: Applied energy
Volume: 304
Early online date: 10 Sep 2021
Publication status: Published - 15 Dec 2021
Externally published: Yes

Abstract

Collecting data for machine learning (ML) development is a resource-intensive task that necessitates identifying an efficient data collection approach. This study focuses on ML models that provide quick energy results by dramatically reducing computational demand. The generalisation of such models for multiple building shapes is vital to early-stage energy prediction. Therefore, this article examines which approach of collecting new training samples improves generalisation more - increment of samples in a similar data range or enrichment with samples exhibiting novelty in shape. The first training dataset collects samples from a box-shaped building energy model (BEM). Distribution analysis suggests that they fill only a small portion of the design space. Using the same BEM, the increment approach collects samples that fill the same portion. In contrast, using three differently shaped BEMs, the enrichment approach collects samples well-distributed in the design space. The distribution of samples in a training dataset is quantified to assess their potential to improve generalisation. Using the same number of training samples, the enrichment approach fills the design space better than the increment, reducing the generalisation error (root-mean-square-error) by 58%, compared to 38% after the increment. Hence, the article suggests analysing the distribution of existing and prospective samples to identify an efficient data collection approach having a higher potential to improve generalisation. The developed method will be useful to save expensive data collection resources by focussing on a limited number of samples.
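
The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not say how the sample distribution is quantified, so `occupied_fraction` below is a hypothetical stand-in that counts the share of grid cells in a normalised design space containing at least one sample. The two-dimensional design space, the bounds, the sample counts and the synthetic data are all assumptions made for illustration. `relative_reduction` shows the arithmetic behind statements such as "reducing the RMSE by 58%".

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root-mean-square error between reference and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def relative_reduction(error_before, error_after):
    """Percentage by which an error metric dropped after adding samples."""
    return 100.0 * (error_before - error_after) / error_before


def occupied_fraction(samples, lower, upper, bins=5):
    """Share of grid cells in the design space holding at least one sample.

    Hypothetical coverage measure (an assumption, not taken from the paper).
    """
    samples = np.asarray(samples, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    # Scale every design parameter to [0, 1], then map it to a cell index.
    unit = (samples - lower) / (upper - lower)
    cells = np.clip((unit * bins).astype(int), 0, bins - 1)
    n_occupied = len(np.unique(cells, axis=0))
    return n_occupied / bins ** samples.shape[1]


# Synthetic stand-in data: two design parameters, each scaled to [0, 1].
rng = np.random.default_rng(0)
lower, upper = np.zeros(2), np.ones(2)

# Initial dataset: samples from one shape, confined to a small region.
base = rng.uniform(0.0, 0.3, size=(200, 2))
# Increment: more samples from the same region.
increment = np.vstack([base, rng.uniform(0.0, 0.3, size=(200, 2))])
# Enrichment: the same number of new samples, spread over the whole space.
enrichment = np.vstack([base, rng.uniform(0.0, 1.0, size=(200, 2))])

cov_inc = occupied_fraction(increment, lower, upper)
cov_enr = occupied_fraction(enrichment, lower, upper)
print(f"coverage after increment:  {cov_inc:.2f}")
print(f"coverage after enrichment: {cov_enr:.2f}")
```

In this toy setup the increment set cannot occupy more than 4 of the 25 cells, however many samples are added, while the enrichment set reaches cells the initial data never touched. A coverage figure of this kind could be computed for prospective samples before any simulation is run, which is the sort of pre-collection check the abstract recommends.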

Cite this

Machine learning for early stage building energy prediction: Increment and enrichment. / Singh, Manav Mahan; Singaravel, Sundaravelpandian; Geyer, Philipp Florian.
In: Applied energy, Vol. 304, 117787, 15.12.2021.

Singh MM, Singaravel S, Geyer PF. Machine learning for early stage building energy prediction: Increment and enrichment. Applied energy. 2021 Dec 15;304:117787. Epub 2021 Sep 10. doi: 10.1016/j.apenergy.2021.117787
Singh, Manav Mahan ; Singaravel, Sundaravelpandian ; Geyer, Philipp Florian. / Machine learning for early stage building energy prediction : Increment and enrichment. In: Applied energy. 2021 ; Vol. 304.
@article{d4d8c291105a48a8ad98375a2e01795b,
title = "Machine learning for early stage building energy prediction: Increment and enrichment",
abstract = "Collecting data for machine learning (ML) development is a resource-intensive task that necessitates identifying an efficient data collection approach. This study focuses on ML models that provide quick energy results by dramatically reducing computational demand. The generalisation of such models for multiple building shapes is vital to early-stage energy prediction. Therefore, this article examines which approach of collecting new training samples improves generalisation more - increment of samples in a similar data range or enrichment with samples exhibiting novelty in shape. The first training dataset collects samples from a box-shaped building energy model (BEM). Distribution analysis suggests that they fill only a small portion of the design space. Using the same BEM, the increment approach collects samples that fill the same portion. In contrast, using three differently shaped BEMs, the enrichment approach collects samples well-distributed in the design space. The distribution of samples in a training dataset is quantified to assess their potential to improve generalisation. Using the same number of training samples, the enrichment approach fills the design space better than the increment, reducing the generalisation error (root-mean-square-error) by 58%, compared to 38% after the increment. Hence, the article suggests analysing the distribution of existing and prospective samples to identify an efficient data collection approach having a higher potential to improve generalisation. The developed method will be useful to save expensive data collection resources by focussing on a limited number of samples.",
keywords = "Data analysis, Data collection, Energy performance, Generalisation error, Training data",
author = "Singh, {Manav Mahan} and Sundaravelpandian Singaravel and Geyer, {Philipp Florian}",
note = "Funding Information: The authors want to acknowledge the support of Deutsche Forschungsgemeinschaft (DFG), Germany, for funding the research through the grant GE1652/3-1 within the research unit FOR 2363. The computational resources and services used in this work were provided by the VSC (Flemish Supercomputer Center), funded by the Research Foundation - Flanders (FWO) and the Flemish Government – department EWI, Belgium. We would like to express our sincere gratitude to the Institute of Energy Efficient and Sustainable Design and Building, Technical University, Munich (TUM) and Ferdinand Tausendpfund GmbH \& Co. KG for providing energy consumption and design data of Tausendpfund building. The datasets related to this article can be found at http://dx.doi.org/10.17632/9jvh8ckjbw.3, an open-source online data repository hosted at Mendeley Data [59].",
year = "2021",
month = dec,
day = "15",
doi = "10.1016/j.apenergy.2021.117787",
language = "English",
volume = "304",
journal = "Applied energy",
issn = "0306-2619",
publisher = "Elsevier BV",

}

TY - JOUR

T1 - Machine learning for early stage building energy prediction

T2 - Increment and enrichment

AU - Singh, Manav Mahan

AU - Singaravel, Sundaravelpandian

AU - Geyer, Philipp Florian

N1 - Funding Information: The authors want to acknowledge the support of Deutsche Forschungsgemeinschaft (DFG), Germany, for funding the research through the grant GE1652/3-1 within the research unit FOR 2363. The computational resources and services used in this work were provided by the VSC (Flemish Supercomputer Center), funded by the Research Foundation - Flanders (FWO) and the Flemish Government – department EWI, Belgium. We would like to express our sincere gratitude to the Institute of Energy Efficient and Sustainable Design and Building, Technical University, Munich (TUM) and Ferdinand Tausendpfund GmbH & Co. KG for providing energy consumption and design data of Tausendpfund building. The datasets related to this article can be found at http://dx.doi.org/10.17632/9jvh8ckjbw.3, an open-source online data repository hosted at Mendeley Data [59].

PY - 2021/12/15

Y1 - 2021/12/15

AB - Collecting data for machine learning (ML) development is a resource-intensive task that necessitates identifying an efficient data collection approach. This study focuses on ML models that provide quick energy results by dramatically reducing computational demand. The generalisation of such models for multiple building shapes is vital to early-stage energy prediction. Therefore, this article examines which approach of collecting new training samples improves generalisation more - increment of samples in a similar data range or enrichment with samples exhibiting novelty in shape. The first training dataset collects samples from a box-shaped building energy model (BEM). Distribution analysis suggests that they fill only a small portion of the design space. Using the same BEM, the increment approach collects samples that fill the same portion. In contrast, using three differently shaped BEMs, the enrichment approach collects samples well-distributed in the design space. The distribution of samples in a training dataset is quantified to assess their potential to improve generalisation. Using the same number of training samples, the enrichment approach fills the design space better than the increment, reducing the generalisation error (root-mean-square-error) by 58%, compared to 38% after the increment. Hence, the article suggests analysing the distribution of existing and prospective samples to identify an efficient data collection approach having a higher potential to improve generalisation. The developed method will be useful to save expensive data collection resources by focussing on a limited number of samples.

KW - Data analysis

KW - Data collection

KW - Energy performance

KW - Generalisation error

KW - Training data

UR - http://www.scopus.com/inward/record.url?scp=85115664490&partnerID=8YFLogxK

U2 - 10.1016/j.apenergy.2021.117787

DO - 10.1016/j.apenergy.2021.117787

M3 - Article

AN - SCOPUS:85115664490

VL - 304

JO - Applied energy

JF - Applied energy

SN - 0306-2619

M1 - 117787

ER -