MANDO: Multi-Level Heterogeneous Graph Embeddings for Fine-Grained Detection of Smart Contract Vulnerabilities

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review

Authors

  • Hoang H. Nguyen
  • Nhat Minh Nguyen
  • Chunyao Xie
  • Zahra Ahmadi
  • Daniel Kudendo
  • Thanh Nam Doan
  • Lingxiao Jiang

Research Organisations

External Research Organisations

  • Singapore Management University

Details

Original language: English
Title of host publication: 2022 IEEE 9th International Conference on Data Science and Advanced Analytics
Subtitle of host publication: (DSAA)
Editors: Joshua Zhexue Huang, Yi Pan, Barbara Hammer, Muhammad Khurram Khan, Xing Xie, Laizhong Cui, Yulin He
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (electronic): 9781665473309
ISBN (print): 978-1-6654-7331-6
Publication status: Published - 2022
Event: 9th IEEE International Conference on Data Science and Advanced Analytics, DSAA 2022 - Shenzhen, China
Duration: 13 Oct 2022 to 16 Oct 2022

Abstract

Learning heterogeneous graphs consisting of different types of nodes and edges enhances the results of homogeneous graph techniques. An interesting example of such graphs is control-flow graphs representing possible software code execution flows. As such graphs represent more semantic information of code, developing techniques and tools for such graphs can be highly beneficial for detecting vulnerabilities in software for its reliability. However, existing heterogeneous graph techniques are still insufficient in handling complex graphs where the number of different types of nodes and edges is large and variable. This paper concentrates on the Ethereum smart contracts as a sample of software codes represented by heterogeneous contract graphs built upon both control-flow graphs and call graphs containing different types of nodes and links. We propose MANDO, a new heterogeneous graph representation to learn such heterogeneous contract graphs' structures. MANDO extracts customized meta-paths, which compose relational connections between different types of nodes and their neighbors. Moreover, it develops a multi-metapath heterogeneous graph attention network to learn multi-level embeddings of different types of nodes and their metapaths in the heterogeneous contract graphs, which can capture the code semantics of smart contracts more accurately and facilitate both fine-grained line-level and coarse-grained contract-level vulnerability detection. Our extensive evaluation of large smart contract datasets shows that MANDO improves the vulnerability detection results of other techniques at the coarse-grained contract level. More importantly, it is the first learning-based approach capable of identifying vulnerabilities at the fine-grained line-level, and significantly improves the traditional code analysis-based vulnerability detection approaches by 11.35% to 70.81% in terms of F1-score.
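
As a rough illustration of the meta-path idea described in the abstract, the sketch below enumerates instances of a typed path in a toy heterogeneous graph. It is not the authors' implementation: the node types (FUNCTION, IF, EXPRESSION, END_IF), the edge types (contains, true, false, next, calls), the toy contract graph, and the helper names build_adjacency and metapath_instances are all hypothetical, chosen only to show how a meta-path constrains which sequences of typed nodes and edges count as a match.

```python
from collections import defaultdict

# Toy heterogeneous graph: every node has a type and every edge has a type.
# All names and types here are illustrative assumptions, not taken from MANDO.
node_type = {
    "f_withdraw": "FUNCTION",
    "s1": "IF",
    "s2": "EXPRESSION",
    "s3": "END_IF",
    "f_send": "FUNCTION",
}
edges = [
    ("f_withdraw", "contains", "s1"),   # function to its first statement
    ("s1", "true", "s2"),               # control flow, branch taken
    ("s1", "false", "s3"),              # control flow, branch skipped
    ("s2", "next", "s3"),               # control flow, sequential
    ("f_withdraw", "calls", "f_send"),  # call-graph edge
]


def build_adjacency(edge_list):
    """Map each source node to its outgoing (edge_type, destination) pairs."""
    adjacency = defaultdict(list)
    for src, etype, dst in edge_list:
        adjacency[src].append((etype, dst))
    return adjacency


def metapath_instances(metapath, types, adjacency):
    """Return all node sequences matching [T0, e1, T1, e2, T2, ...].

    Even positions of `metapath` are node types, odd positions are edge types.
    """
    wanted_nodes = metapath[0::2]
    wanted_edges = metapath[1::2]
    # Start from every node whose type matches the first node type.
    paths = [[n] for n, t in types.items() if t == wanted_nodes[0]]
    # Extend each partial path one typed hop at a time.
    for etype, ntype in zip(wanted_edges, wanted_nodes[1:]):
        paths = [
            path + [dst]
            for path in paths
            for e, dst in adjacency[path[-1]]
            if e == etype and types[dst] == ntype
        ]
    return paths


adj = build_adjacency(edges)
branch_paths = metapath_instances(
    ["IF", "true", "EXPRESSION", "next", "END_IF"], node_type, adj
)
call_paths = metapath_instances(["FUNCTION", "calls", "FUNCTION"], node_type, adj)
print(branch_paths)
print(call_paths)
```

In an attention-based model of the kind the abstract describes, the neighbors reached through each such meta-path would be aggregated separately before being combined; this sketch covers only the path-matching step.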

Keywords

    Ethereum blockchain, graph embedding, graph neural networks, heterogeneous graphs, smart contracts, vulnerability detection

Cite this

MANDO: Multi-Level Heterogeneous Graph Embeddings for Fine-Grained Detection of Smart Contract Vulnerabilities. / Nguyen, Hoang H.; Nguyen, Nhat Minh; Xie, Chunyao et al.
2022 IEEE 9th International Conference on Data Science and Advanced Analytics: (DSAA). ed. / Joshua Zhexue Huang; Yi Pan; Barbara Hammer; Muhammad Khurram Khan; Xing Xie; Laizhong Cui; Yulin He. Institute of Electrical and Electronics Engineers Inc., 2022.

Nguyen, HH, Nguyen, NM, Xie, C, Ahmadi, Z, Kudendo, D, Doan, TN & Jiang, L 2022, MANDO: Multi-Level Heterogeneous Graph Embeddings for Fine-Grained Detection of Smart Contract Vulnerabilities. in JZ Huang, Y Pan, B Hammer, MK Khan, X Xie, L Cui & Y He (eds), 2022 IEEE 9th International Conference on Data Science and Advanced Analytics: (DSAA). Institute of Electrical and Electronics Engineers Inc., 9th IEEE International Conference on Data Science and Advanced Analytics, DSAA 2022, Shenzhen, China, 13 Oct 2022. https://doi.org/10.1109/DSAA54385.2022.10032337
Nguyen, H. H., Nguyen, N. M., Xie, C., Ahmadi, Z., Kudendo, D., Doan, T. N., & Jiang, L. (2022). MANDO: Multi-Level Heterogeneous Graph Embeddings for Fine-Grained Detection of Smart Contract Vulnerabilities. In J. Z. Huang, Y. Pan, B. Hammer, M. K. Khan, X. Xie, L. Cui, & Y. He (Eds.), 2022 IEEE 9th International Conference on Data Science and Advanced Analytics: (DSAA). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/DSAA54385.2022.10032337
Nguyen HH, Nguyen NM, Xie C, Ahmadi Z, Kudendo D, Doan TN et al. MANDO: Multi-Level Heterogeneous Graph Embeddings for Fine-Grained Detection of Smart Contract Vulnerabilities. In Huang JZ, Pan Y, Hammer B, Khan MK, Xie X, Cui L, He Y, editors, 2022 IEEE 9th International Conference on Data Science and Advanced Analytics: (DSAA). Institute of Electrical and Electronics Engineers Inc. 2022 doi: 10.1109/DSAA54385.2022.10032337
Nguyen, Hoang H. ; Nguyen, Nhat Minh ; Xie, Chunyao et al. / MANDO : Multi-Level Heterogeneous Graph Embeddings for Fine-Grained Detection of Smart Contract Vulnerabilities. 2022 IEEE 9th International Conference on Data Science and Advanced Analytics: (DSAA). editor / Joshua Zhexue Huang ; Yi Pan ; Barbara Hammer ; Muhammad Khurram Khan ; Xing Xie ; Laizhong Cui ; Yulin He. Institute of Electrical and Electronics Engineers Inc., 2022.
@inproceedings{97532956609f44a784b8a930ca0a9144,
title = "MANDO: Multi-Level Heterogeneous Graph Embeddings for Fine-Grained Detection of Smart Contract Vulnerabilities",
abstract = "Learning heterogeneous graphs consisting of different types of nodes and edges enhances the results of homogeneous graph techniques. An interesting example of such graphs is control-flow graphs representing possible software code execution flows. As such graphs represent more semantic information of code, developing techniques and tools for such graphs can be highly beneficial for detecting vulnerabilities in software for its reliability. However, existing heterogeneous graph techniques are still insufficient in handling complex graphs where the number of different types of nodes and edges is large and variable. This paper concentrates on the Ethereum smart contracts as a sample of software codes represented by heterogeneous contract graphs built upon both control-flow graphs and call graphs containing different types of nodes and links. We propose MANDO, a new heterogeneous graph representation to learn such heterogeneous contract graphs' structures. MANDO extracts customized meta-paths, which compose relational connections between different types of nodes and their neighbors. Moreover, it develops a multi-metapath heterogeneous graph attention network to learn multi-level embeddings of different types of nodes and their metapaths in the heterogeneous contract graphs, which can capture the code semantics of smart contracts more accurately and facilitate both fine-grained line-level and coarse-grained contract-level vulnerability detection. Our extensive evaluation of large smart contract datasets shows that MANDO improves the vulnerability detection results of other techniques at the coarse-grained contract level. More importantly, it is the first learning-based approach capable of identifying vulnerabilities at the fine-grained line-level, and significantly improves the traditional code analysis-based vulnerability detection approaches by 11.35% to 70.81% in terms of F1-score.",
keywords = "Ethereum blockchain, graph embedding, graph neural networks, heterogeneous graphs, smart contracts, vulnerability detection",
author = "Nguyen, {Hoang H.} and Nguyen, {Nhat Minh} and Chunyao Xie and Zahra Ahmadi and Daniel Kudendo and Doan, {Thanh Nam} and Lingxiao Jiang",
note = "Funding Information: This work was supported by the European Union's Horizon 2020 research and innovation program under grant agreement No. 833635 (project ROXANNE: Real-time network, text, and speaker analytics for combating organized crime, 2019-2022) and by the Singapore Ministry of Education (MOE) Academic Research Fund (AcRF) Tier 1 grant ; 9th IEEE International Conference on Data Science and Advanced Analytics, DSAA 2022 ; Conference date: 13-10-2022 Through 16-10-2022",
year = "2022",
doi = "10.1109/DSAA54385.2022.10032337",
language = "English",
isbn = "978-1-6654-7331-6",
editor = "Huang, {Joshua Zhexue} and Yi Pan and Barbara Hammer and Khan, {Muhammad Khurram} and Xing Xie and Laizhong Cui and Yulin He",
booktitle = "2022 IEEE 9th International Conference on Data Science and Advanced Analytics",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
address = "United States",

}

TY - GEN

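T1 - MANDO: Multi-Level Heterogeneous Graph Embeddings for Fine-Grained Detection of Smart Contract Vulnerabilities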

T2 - 9th IEEE International Conference on Data Science and Advanced Analytics, DSAA 2022

AU - Nguyen, Hoang H.

AU - Nguyen, Nhat Minh

AU - Xie, Chunyao

AU - Ahmadi, Zahra

AU - Kudendo, Daniel

AU - Doan, Thanh Nam

AU - Jiang, Lingxiao

N1 - Funding Information: This work was supported by the European Union's Horizon 2020 research and innovation program under grant agreement No. 833635 (project ROXANNE: Real-time network, text, and speaker analytics for combating organized crime, 2019-2022) and by the Singapore Ministry of Education (MOE) Academic Research Fund (AcRF) Tier 1 grant

PY - 2022

Y1 - 2022

N2 - Learning heterogeneous graphs consisting of different types of nodes and edges enhances the results of homogeneous graph techniques. An interesting example of such graphs is control-flow graphs representing possible software code execution flows. As such graphs represent more semantic information of code, developing techniques and tools for such graphs can be highly beneficial for detecting vulnerabilities in software for its reliability. However, existing heterogeneous graph techniques are still insufficient in handling complex graphs where the number of different types of nodes and edges is large and variable. This paper concentrates on the Ethereum smart contracts as a sample of software codes represented by heterogeneous contract graphs built upon both control-flow graphs and call graphs containing different types of nodes and links. We propose MANDO, a new heterogeneous graph representation to learn such heterogeneous contract graphs' structures. MANDO extracts customized meta-paths, which compose relational connections between different types of nodes and their neighbors. Moreover, it develops a multi-metapath heterogeneous graph attention network to learn multi-level embeddings of different types of nodes and their metapaths in the heterogeneous contract graphs, which can capture the code semantics of smart contracts more accurately and facilitate both fine-grained line-level and coarse-grained contract-level vulnerability detection. Our extensive evaluation of large smart contract datasets shows that MANDO improves the vulnerability detection results of other techniques at the coarse-grained contract level. More importantly, it is the first learning-based approach capable of identifying vulnerabilities at the fine-grained line-level, and significantly improves the traditional code analysis-based vulnerability detection approaches by 11.35% to 70.81% in terms of F1-score.

AB - Learning heterogeneous graphs consisting of different types of nodes and edges enhances the results of homogeneous graph techniques. An interesting example of such graphs is control-flow graphs representing possible software code execution flows. As such graphs represent more semantic information of code, developing techniques and tools for such graphs can be highly beneficial for detecting vulnerabilities in software for its reliability. However, existing heterogeneous graph techniques are still insufficient in handling complex graphs where the number of different types of nodes and edges is large and variable. This paper concentrates on the Ethereum smart contracts as a sample of software codes represented by heterogeneous contract graphs built upon both control-flow graphs and call graphs containing different types of nodes and links. We propose MANDO, a new heterogeneous graph representation to learn such heterogeneous contract graphs' structures. MANDO extracts customized meta-paths, which compose relational connections between different types of nodes and their neighbors. Moreover, it develops a multi-metapath heterogeneous graph attention network to learn multi-level embeddings of different types of nodes and their metapaths in the heterogeneous contract graphs, which can capture the code semantics of smart contracts more accurately and facilitate both fine-grained line-level and coarse-grained contract-level vulnerability detection. Our extensive evaluation of large smart contract datasets shows that MANDO improves the vulnerability detection results of other techniques at the coarse-grained contract level. More importantly, it is the first learning-based approach capable of identifying vulnerabilities at the fine-grained line-level, and significantly improves the traditional code analysis-based vulnerability detection approaches by 11.35% to 70.81% in terms of F1-score.

KW - Ethereum blockchain

KW - graph embedding

KW - graph neural networks

KW - heterogeneous graphs

KW - smart contracts

KW - vulnerability detection

UR - http://www.scopus.com/inward/record.url?scp=85143075291&partnerID=8YFLogxK

U2 - 10.1109/DSAA54385.2022.10032337

DO - 10.1109/DSAA54385.2022.10032337

M3 - Conference contribution

AN - SCOPUS:85143075291

SN - 978-1-6654-7331-6

BT - 2022 IEEE 9th International Conference on Data Science and Advanced Analytics

A2 - Huang, Joshua Zhexue

A2 - Pan, Yi

A2 - Hammer, Barbara

A2 - Khan, Muhammad Khurram

A2 - Xie, Xing

A2 - Cui, Laizhong

A2 - He, Yulin

PB - Institute of Electrical and Electronics Engineers Inc.

Y2 - 13 October 2022 through 16 October 2022

ER -