Toward a Solution for an Energy Knowledge Graph

Dušan Popadić; Enrique Iglesias; Ahmad Sakor; Valentina Janev; Maria Esther Vidal

doi:10.1007/978-981-19-7126-6_1

Details

Original language	English
Title of host publication	Semantic Intelligence - Select Proceedings of ISIC 2022
Editors	Sarika Jain, Sven Groppe, Bharat K. Bhargava
Place of Publication	Singapore
Pages	3-12
Number of pages	10
ISBN (electronic)	978-981-19-7126-6
Publication status	Published - 1 Apr 2023
Event	2nd International Semantic Intelligence Conference, ISIC 2022 - Savannah, United States
Duration	17 May 2022 → 19 May 2022

Publication series

Name	Lecture Notes in Electrical Engineering
Volume	964
ISSN (Print)	1876-1100
ISSN (electronic)	1876-1119

Abstract

Data integration demands data management techniques that efficiently overcome interoperability issues and provide a harmonized view of both data and their meaning (i.e., metadata). This paper addresses the challenges of energy data management and integration and proposes a process for creating a knowledge graph, motivated by the needs of stakeholders from Serbia and by the integration of a large number of different renewable energy sources (RES) with the proprietary SCADA system of the Institute Mihajlo Pupin. The Energy Knowledge Graph (KG) has been built by reusing the energy-based semantic data model and the SDM-RDFizer, an open-source interpreter of the W3C Recommendation R2RML and its extension, the RDF Mapping Language (RML). The data connectors implemented by the SDM-RDFizer plan the execution of the mapping rules and the loading of the dataset into an RDF triple store to speed up knowledge base creation. The Energy KG has been deployed on a Smart Grid Architecture Model (SGAM)-compliant platform hosted at the Institute Mihajlo Pupin.
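
The materialized approach summarized in the abstract (rows from a relational source turned into RDF triples by mapping rules) can be illustrated with a minimal sketch. It does not use SDM-RDFizer or its API: an in-memory SQLite database stands in for the MySQL source, and a hand-written template plays the role of one triples map. The table and column names follow the query in the paper's appendix, and the base IRI is the one given there; the subject template, the expansion of the seas: prefix, the choice of rdfs:label and WGS84 properties, and the demo row are all assumptions made for illustration.

```python
import sqlite3

# Base IRI as given in the paper's appendix; everything else below is an
# illustrative assumption, not the authors' actual mapping.
BASE = "https://projekat-artemis.rs/"
SEAS = "https://w3id.org/seas/"  # assumed expansion of the seas: prefix
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"


def build_demo_db():
    """Create an in-memory stand-in for the relational source (hypothetical rows)."""
    con = sqlite3.connect(":memory:")
    con.row_factory = sqlite3.Row  # allows access to columns by alias
    con.executescript(
        """
        CREATE TABLE weather_locations (id INTEGER PRIMARY KEY, lat REAL, lon REAL, city TEXT);
        CREATE TABLE plants (id INTEGER PRIMARY KEY, name TEXT, weather_location_id INTEGER);
        INSERT INTO weather_locations VALUES (1, 44.8, 20.5, 'Belgrade');
        INSERT INTO plants VALUES (10, 'Demo PV plant', 1);
        """
    )
    return con


def literal(value, datatype=None):
    """Serialize a Python value as an N-Triples literal, escaping \\ and "."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{text}"^^<{datatype}>' if datatype else f'"{text}"'


def materialize(con):
    """Apply one triples-map-like rule to each row and return N-Triples lines."""
    query = """
        SELECT plants.id AS plant_id, plants.name AS plant_name,
               weather_locations.lat AS lat, weather_locations.lon AS lon
        FROM plants
        JOIN weather_locations ON plants.weather_location_id = weather_locations.id
        ORDER BY plants.id
    """
    triples = []
    for row in con.execute(query):
        subject = f"<{BASE}Plant/{row['plant_id']}>"  # hypothetical subject template
        triples.append(f"{subject} <{RDF_TYPE}> <{SEAS}SolarArray> .")
        triples.append(f"{subject} <{RDFS_LABEL}> {literal(row['plant_name'])} .")
        triples.append(f"{subject} <{GEO}lat> {literal(row['lat'], XSD_DECIMAL)} .")
        triples.append(f"{subject} <{GEO}long> {literal(row['lon'], XSD_DECIMAL)} .")
    return triples


if __name__ == "__main__":
    for line in materialize(build_demo_db()):
        print(line)
```

The resulting N-Triples lines could then be bulk-loaded into a triple store. A declarative RML mapping executed by an engine separates this logic from code, which is the design the paper follows.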

Keywords

Application, Energy, Knowledge graph, Mapping rules, Services

ASJC Scopus subject areas

Engineering(all)
Industrial and Manufacturing Engineering

Sustainable Development Goals

SDG 7 - Affordable and Clean Energy

Cite this

Toward a Solution for an Energy Knowledge Graph. / Popadić, Dušan; Iglesias, Enrique; Sakor, Ahmad et al.
Semantic Intelligence - Select Proceedings of ISIC 2022. ed. / Sarika Jain; Sven Groppe; Bharat K. Bhargava. Singapore, 2023. p. 3-12 (Lecture Notes in Electrical Engineering; Vol. 964).

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review

Popadić, D, Iglesias, E, Sakor, A, Janev, V & Vidal, ME 2023, Toward a Solution for an Energy Knowledge Graph. in S Jain, S Groppe & BK Bhargava (eds), Semantic Intelligence - Select Proceedings of ISIC 2022. Lecture Notes in Electrical Engineering, vol. 964, Singapore, pp. 3-12, 2nd International Semantic Intelligence Conference, ISIC 2022, Savannah, Georgia, United States, 17 May 2022. https://doi.org/10.1007/978-981-19-7126-6_1

Popadić, D., Iglesias, E., Sakor, A., Janev, V., & Vidal, M. E. (2023). Toward a Solution for an Energy Knowledge Graph. In S. Jain, S. Groppe, & B. K. Bhargava (Eds.), Semantic Intelligence - Select Proceedings of ISIC 2022 (pp. 3-12). (Lecture Notes in Electrical Engineering; Vol. 964). https://doi.org/10.1007/978-981-19-7126-6_1

Popadić D, Iglesias E, Sakor A, Janev V, Vidal ME. Toward a Solution for an Energy Knowledge Graph. In Jain S, Groppe S, Bhargava BK, editors, Semantic Intelligence - Select Proceedings of ISIC 2022. Singapore. 2023. p. 3-12. (Lecture Notes in Electrical Engineering). doi: 10.1007/978-981-19-7126-6_1

Popadić, Dušan ; Iglesias, Enrique ; Sakor, Ahmad et al. / Toward a Solution for an Energy Knowledge Graph. Semantic Intelligence - Select Proceedings of ISIC 2022. editor / Sarika Jain ; Sven Groppe ; Bharat K. Bhargava. Singapore, 2023. pp. 3-12 (Lecture Notes in Electrical Engineering).

@inproceedings{3a31189dbd3f406cbc13324f43f37fce,

title = "Toward a Solution for an Energy Knowledge Graph",

abstract = "Data integration demands data management techniques that efficiently overcome interoperability issues and provide a harmonized view of both data and their meaning (i.e., metadata). This paper addresses the challenges of energy data management and integration and proposes a process for creating a knowledge graph, motivated by the needs of stakeholders from Serbia and by the integration of a large number of different renewable energy sources (RES) with the proprietary SCADA system of the Institute Mihajlo Pupin. The Energy Knowledge Graph (KG) has been built by reusing the energy-based semantic data model and the SDM-RDFizer, an open-source interpreter of the W3C Recommendation R2RML and its extension, the RDF Mapping Language (RML). The data connectors implemented by the SDM-RDFizer plan the execution of the mapping rules and the loading of the dataset into an RDF triple store to speed up knowledge base creation. The Energy KG has been deployed on a Smart Grid Architecture Model (SGAM)-compliant platform hosted at the Institute Mihajlo Pupin.",

keywords = "Application, Energy, Knowledge graph, Mapping rules, Services",

author = "Du{\v s}an Popadi{\'c} and Enrique Iglesias and Ahmad Sakor and Valentina Janev and Vidal, {Maria Esther}",

note = "Funding Information: DCAT: The Data Catalog Vocabulary (DCAT, https://www.w3.org/TR/vocab-dcat-2) provides a common understanding of the classes and properties that describe a catalog of datasets and data services. DCAT is expressed in RDF and provides a unified representation of catalog properties that is understandable by both humans and machines. DCAT also includes classes from other vocabularies, e.g., foaf:Agent, skos:Concept, or skos:ConceptScheme.
2.3 Methodology
The work has been divided into the following phases:
Requirement Analysis phase: The authors defined the business questions that the knowledge graph should answer.
Design phase: Relevant concepts are selected for modeling. Then, data connectors toward the SCADA database and the messaging mechanisms are specified.
Specification phase: The knowledge graph is specified in terms of RML rules.
KGs in Action phase: The authors automate the semantic pipeline and develop exploration GUIs.
In this paper, the authors focus on the last two activities, namely, the implementation of the semantic pipeline and the exploration of the knowledge graph (Fig. 2).
Fig. 2 Four-step methodology: Requirements Analysis (business questions, data sources, data exchange needs); KGs Design (semantic models, data connectors, message specification and security); KGs Specification (ontology formalization, mapping rules, APIs specification); KGs in Action (GUIs implementation, semantic pipeline automation, APIs implementation).
3 Knowledge Graph Creation: The Semantic Pipeline
This section describes the process of knowledge graph creation and highlights the main challenges tackled in the work reported in this article. The process (injection, transformation, and integration), also known as a semantic pipeline, is given in Fig. 3.
There are two types of knowledge graph creation strategies: the Materialized Knowledge Graph Creation Process, i.e., data warehousing, where the data are loaded and stored in RDF format in a physical database, e.g., a Virtuoso RDF triplestore; and the Virtual Knowledge Graph Creation Process (i.e., Data Lake), where the data remain in the sources (in raw format) and are accessed as needed at query time. We follow the first approach in order to (1) experiment with mechanisms for efficient search and visualization of energy data at different levels of granularity and (2) provide mechanisms for explainability and interpretability of the results of the analytical services.
The correspondences among energy data sources and semantic models are described using two mapping languages, namely, the Relational to RDF Mapping Language (R2RML) [15] and the RDF Mapping Language (RML) [16]. Executing the R2RML and RML mapping rules creates a knowledge graph expressed in RDF. Mapping rules are expressed as triples maps. Each triples map refers to a single logical source, which can be an SQL table or view, or the data gathered by executing an SQL query against the input database. In our case, the mapping rules are applied to transform static data about plants, generation units, and weather stations (see Appendix). These data include geographical location, control area membership, and similar data that do not change frequently. The following examples of mapping rules focus on PV plants.
Since some of the data already exist in a MySQL database, they are converted to RDF using the RML-compliant engine SDM-RDFizer [17]; it executes R2RML and RML mapping rules and transforms raw data in various formats (CSV, JSON, RDB, and XML) into an RDF knowledge graph. SDM-RDFizer resorts to data structures and physical operators to scale up to large datasets and efficiently execute pipelines of knowledge graph creation.
Fig. 3 Semantic pipeline
Table 1 The SCADA KG statistics
Total number of RDF triples: 18,278,850
Number of classes: 83
Number of distinct properties: 156
Number of class/subclass pairs: 12
Number of different timestamps for timestamped data: 1,108,298
Apart from static information about power plants and the grid, measured values from power plants are also collected. The data collected through the SCADA system are available in real time through a MySQL database; they include power production forecasts, power production measurements, and weather information (e.g., air temperature, wind direction, and solar panel temperature). The SCADA knowledge graph (KG) is created by executing the mapping rules on top of the MySQL database. At the time of this submission, the SCADA KG comprises more than 18M RDF triples, with instances of 83 classes. These classes are described in terms of 156 properties and more than 1M timestamps. Table 1 reports the characteristics of the current version of the SCADA KG.
4 Knowledge Graph Exploitation
This section presents the services implemented on top of the SCADA KG; they allow for the exploration of the integrated data and their descriptions with the energy semantic data models. SPARQL, the W3C recommendation query language, is used to express basic queries against the SCADA KG.
4.1 Energy Analytics Dashboard
Since the SCADA KG shall work in synergy with various AI-based analytic services and help users understand results, a visualization tool (EAD, the energy analytics dashboard) has been developed. The tool works on top of the SCADA KG, fetches data from arbitrary SPARQL endpoints, and supports different analysis/visualization options. It allows users to select the data of interest, compare time series (e.g., forecasted load and actual load at that time), and visualize summary statistics on a geographical map. It has been implemented as a web application in JavaScript with the help of the jQuery library. It uses the Highcharts library for visualization (https://github.com/highcharts/highcharts) and the Leaflet library for interacting with geo data (https://leafletjs.com/). Figure 4 depicts the dashboard and its connection with the pipeline of knowledge graph creation described in Sect. 3.
Fig. 4 Semantic pipeline and KG exploration
4.2 Alignment with EU Initiatives
In order to inherently address interoperability, the CEN-CENELEC-ETSI Smart Grid Reference Architecture [10] framework defines five interoperability layers, where the information layer specifies the business context and the semantic understanding. Hence, in future energy smart grids, the technologies described herein are not optional, but mandatory. Different energy services marketplaces are currently under development; at their core, they include components such as vocabulary management tools and dataset/service registries. If the production of all PV plants in Serbia can be reached via a SPARQL query, then with one click we can answer the question “Show the total energy produced by PV plants in Serbia”.
SELECT DISTINCT ?solararray SUM(?value) as ?totalPower
WHERE {
  ?solararray a seas:SolarArray .
  ?solararray art:country <https://projekat-artemis.rs/Country/RS>.
  ?panel seas:isMemberOf ?solararray .
  ?panel a seas:SolarPanel .
  ?panel seas:producedElectricPower ?activePowerProperty .
}
5 Conclusions
The reusability of energy services is limited due to the different representations of data used by different stakeholders in the energy value chain. Therefore, this paper proposes an approach for building a knowledge graph that enables semantic interoperability. The semantic data models from the energy sector and the internal SCADA information model are currently used as an information hub materialized in a knowledge graph. It provides the basis for developing and integrating services in the Energy Data Spaces. Additionally, this layer provides the basis for the explainability of machine learning services/analytical applications installed in the smart ecosystem. Future work includes activities that will connect the PUPIN platform with the PLATOON marketplace, thus creating opportunities for broader exploitation of the PUPIN analytical services.
Acknowledgements
This research has received funding from EU H2020 Research Program (GA No. 872592, GA No. 952140) and the Republic of Serbia (MPN, No. 451-03-9/2021-14/200034; Innov. Fund, Artemis, No. 6527051).
Appendix
@base <https://projekat-artemis.rs/> .
<#ARTEMIS_DB> a d2rq:Database;
<#PUPIN_PVPlantMapping> a rr:TriplesMap;
  rml:logicalSource [
    rml:source <#ARTEMIS_DB>;
    rr:sqlVersion rr:SQL2008;
    rml:query {"}{"}{"}
      SELECT DISTINCT plants.id AS plant_id, plants.name AS plant_name,
        weather_locations.lat AS lat, weather_locations.lon AS lon, weather_locations.city AS city,
        assets.asset_name AS asset_name, country.country_code AS ccode,
        eic_functions.eic_type_function_acronym AS eic_func_acronym,
        organization.organization_short_name AS organization_short_name,
        organization.organization_name AS organization_name, assets.id AS asset_id
      FROM `plants`
        JOIN weather_locations ON plants.weather_location_id = weather_locations.id
        JOIN assets ON plants.asset_id = assets.id
        JOIN organization ON assets.organization_id = organization.id
      WHERE
    {"}{"}{"}
  ];
; 2nd International Semantic Intelligence Conference, ISIC 2022 ; Conference date: 17-05-2022 Through 19-05-2022",

year = "2023",

month = apr,

day = "1",

doi = "10.1007/978-981-19-7126-6_1",

language = "English",

isbn = "9789811971259",

series = "Lecture Notes in Electrical Engineering",

pages = "3--12",

editor = "Sarika Jain and Sven Groppe and Bhargava, {Bharat K.}",

booktitle = "Semantic Intelligence - Select Proceedings of ISIC 2022",

}

TY - GEN

T1 - Toward a Solution for an Energy Knowledge Graph

AU - Popadić, Dušan

AU - Iglesias, Enrique

AU - Sakor, Ahmad

AU - Janev, Valentina

AU - Vidal, Maria Esther

N1 - Funding Information: This research has received funding from EU H2020 Research Program (GA No. 872592, GA No. 952140) and the Republic of Serbia (MPN, No. 451-03-9/2021-14/200034; Innov. Fund, Artemis, No. 6527051).

PY - 2023/4/1

Y1 - 2023/4/1

N2 - Data integration demands data management techniques that efficiently overcome interoperability issues and provide a harmonized view of both data and their meaning (i.e., metadata). This paper addresses the challenges of energy data management and integration and proposes a process for creating a knowledge graph, motivated by the needs of stakeholders from Serbia and by the integration of a large number of different renewable energy sources (RES) with the proprietary SCADA system of the Institute Mihajlo Pupin. The Energy Knowledge Graph (KG) has been built by reusing the energy-based semantic data model and the SDM-RDFizer, an open-source interpreter of the W3C Recommendation R2RML and its extension, the RDF Mapping Language (RML). The data connectors implemented by the SDM-RDFizer plan the execution of the mapping rules and the loading of the dataset into an RDF triple store to speed up knowledge base creation. The Energy KG has been deployed on a Smart Grid Architecture Model (SGAM)-compliant platform hosted at the Institute Mihajlo Pupin.

KW - Application

KW - Energy

KW - Knowledge graph

KW - Mapping rules

KW - Services

UR - http://www.scopus.com/inward/record.url?scp=85152540385&partnerID=8YFLogxK

U2 - 10.1007/978-981-19-7126-6_1

DO - 10.1007/978-981-19-7126-6_1

M3 - Conference contribution

AN - SCOPUS:85152540385

SN - 9789811971259

T3 - Lecture Notes in Electrical Engineering

SP - 3

EP - 12

BT - Semantic Intelligence - Select Proceedings of ISIC 2022

A2 - Jain, Sarika

A2 - Groppe, Sven

A2 - Bhargava, Bharat K.

CY - Singapore

T2 - 2nd International Semantic Intelligence Conference, ISIC 2022

Y2 - 17 May 2022 through 19 May 2022

ER -
