MANDO-HGT: Heterogeneous Graph Transformers for Smart Contract Vulnerability Detection

Research output: Chapter in book/report/conference proceeding; Conference contribution; Research; peer-reviewed

Authors

  • Hoang H. Nguyen
  • Nhat Minh Nguyen
  • Chunyao Xie
  • Zahra Ahmadi
  • Daniel Kudendo
  • Thanh Nam Doan
  • Lingxiao Jiang

Research Organisations

External Research Organisations

  • Singapore Management University

Details

Original language: English
Title of host publication: 2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR)
Subtitle of host publication: MSR
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 334-346
Number of pages: 13
ISBN (electronic): 9798350311846
ISBN (print): 979-8-3503-1185-3
Publication status: Published - 2023
Event: 20th IEEE/ACM International Conference on Mining Software Repositories, MSR 2023 - Melbourne, Australia
Duration: 15 May 2023 to 16 May 2023

Abstract

Smart contracts in blockchains have been increasingly used for high-value business applications. It is essential to check smart contracts' reliability before and after deployment. Although various program analysis and deep learning techniques have been proposed to detect vulnerabilities in either Ethereum smart contract source code or bytecode, their detection accuracy and scalability are still limited. This paper presents a novel framework named MANDO-HGT for detecting smart contract vulnerabilities. Given Ethereum smart contracts, either in source code or bytecode form, and vulnerable or clean, MANDO-HGT custom-builds heterogeneous contract graphs (HCGs) to represent control-flow and/or function-call information of the code. It then adapts heterogeneous graph transformers (HGTs) with customized meta relations for graph nodes and edges to learn their embeddings and train classifiers for detecting various vulnerability types in the nodes and graphs of the contracts more accurately. We have collected more than 55K Ethereum smart contracts from various data sources and verified the labels for 423 buggy and 2,742 clean contracts to evaluate MANDO-HGT. Our empirical results show that MANDO-HGT can significantly improve the detection accuracy of other state-of-the-art vulnerability detection techniques that are based on either machine learning or conventional analysis techniques. The accuracy improvements in terms of F1-score range from 0.7% to more than 76% at either the coarse-grained contract level or the fine-grained line level for various vulnerability types in either source code or bytecode. Our method is general and can be retrained easily for different vulnerability types without the need for manually defined vulnerability patterns.
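
As a rough illustration of the graph representation the abstract describes, the sketch below builds a tiny typed graph and counts its meta relations, taking a meta relation to be the triple (source node type, edge type, target node type) as in the heterogeneous graph transformer literature. It is not the authors' implementation: the class `HeteroGraph`, the helper `build_example`, and all node and edge type names (`FUNCTION`, `ENTRY_POINT`, `IF`, `EXPRESSION`, `CALL`, `NEXT`, `TRUE`, `FALSE`) are hypothetical and chosen only for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# A meta relation is the triple (source node type, edge type, target node type).
MetaRelation = Tuple[str, str, str]


@dataclass
class HeteroGraph:
    """Toy heterogeneous graph: every node and every edge carries a type."""

    node_types: Dict[str, str] = field(default_factory=dict)
    edges: List[Tuple[str, str, str]] = field(default_factory=list)

    def add_node(self, node_id: str, node_type: str) -> None:
        self.node_types[node_id] = node_type

    def add_edge(self, src: str, edge_type: str, dst: str) -> None:
        # Reject edges whose endpoints were never declared.
        if src not in self.node_types or dst not in self.node_types:
            raise KeyError(f"unknown endpoint in edge {src!r} -> {dst!r}")
        self.edges.append((src, edge_type, dst))

    def meta_relations(self) -> Counter:
        """Count how many edges instantiate each meta relation."""
        return Counter(
            (self.node_types[s], e, self.node_types[d]) for s, e, d in self.edges
        )


def build_example() -> HeteroGraph:
    # Hypothetical type names, not taken from the paper.
    g = HeteroGraph()
    g.add_node("withdraw", "FUNCTION")
    g.add_node("send", "FUNCTION")
    g.add_node("entry", "ENTRY_POINT")
    g.add_node("check", "IF")
    g.add_node("transfer", "EXPRESSION")
    g.add_node("revert", "EXPRESSION")
    g.add_edge("withdraw", "CALL", "send")    # function-call edge
    g.add_edge("entry", "NEXT", "check")      # control-flow edges
    g.add_edge("check", "TRUE", "transfer")
    g.add_edge("check", "FALSE", "revert")
    return g


# Print each distinct meta relation with the number of edges that use it.
for relation, count in sorted(build_example().meta_relations().items()):
    print(relation, count)
```

In a transformer over such a graph, attention and message parameters would typically be indexed by these triples, so that a `CALL` edge between two functions is treated differently from a `TRUE` branch between statements. How MANDO-HGT customizes its meta relations is described in the paper itself and is not reproduced here.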

Keywords

    bytecode, graph transformer, heterogeneous graph learning, smart contracts, source code, vulnerability detection

Cite this

MANDO-HGT: Heterogeneous Graph Transformers for Smart Contract Vulnerability Detection. / Nguyen, Hoang H.; Nguyen, Nhat Minh; Xie, Chunyao et al.
2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR): MSR. Institute of Electrical and Electronics Engineers Inc., 2023. p. 334-346.

Nguyen, HH, Nguyen, NM, Xie, C, Ahmadi, Z, Kudendo, D, Doan, TN & Jiang, L 2023, MANDO-HGT: Heterogeneous Graph Transformers for Smart Contract Vulnerability Detection. in 2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR): MSR. Institute of Electrical and Electronics Engineers Inc., pp. 334-346, 20th IEEE/ACM International Conference on Mining Software Repositories, MSR 2023, Melbourne, Australia, 15 May 2023. https://doi.org/10.1109/MSR59073.2023.00052
Nguyen, H. H., Nguyen, N. M., Xie, C., Ahmadi, Z., Kudendo, D., Doan, T. N., & Jiang, L. (2023). MANDO-HGT: Heterogeneous Graph Transformers for Smart Contract Vulnerability Detection. In 2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR): MSR (pp. 334-346). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/MSR59073.2023.00052
Nguyen HH, Nguyen NM, Xie C, Ahmadi Z, Kudendo D, Doan TN et al. MANDO-HGT: Heterogeneous Graph Transformers for Smart Contract Vulnerability Detection. In 2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR): MSR. Institute of Electrical and Electronics Engineers Inc. 2023. p. 334-346 doi: 10.1109/MSR59073.2023.00052
Nguyen, Hoang H.; Nguyen, Nhat Minh; Xie, Chunyao et al. / MANDO-HGT: Heterogeneous Graph Transformers for Smart Contract Vulnerability Detection. 2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR): MSR. Institute of Electrical and Electronics Engineers Inc., 2023. pp. 334-346
@inproceedings{50ce5e90b3ca4f3ebc4574ee1473e8b4,
title = "MANDO-HGT: Heterogeneous Graph Transformers for Smart Contract Vulnerability Detection",
abstract = "Smart contracts in blockchains have been increasingly used for high-value business applications. It is essential to check smart contracts' reliability before and after deployment. Although various program analysis and deep learning techniques have been proposed to detect vulnerabilities in either Ethereum smart contract source code or bytecode, their detection accuracy and scalability are still limited. This paper presents a novel framework named MANDO-HGT for detecting smart contract vulnerabilities. Given Ethereum smart contracts, either in source code or bytecode form, and vulnerable or clean, MANDO-HGT custom-builds heterogeneous contract graphs (HCGs) to represent control-flow and/or function-call information of the code. It then adapts heterogeneous graph transformers (HGTs) with customized meta relations for graph nodes and edges to learn their embeddings and train classifiers for detecting various vulnerability types in the nodes and graphs of the contracts more accurately. We have collected more than 55K Ethereum smart contracts from various data sources and verified the labels for 423 buggy and 2,742 clean contracts to evaluate MANDO-HGT. Our empirical results show that MANDO-HGT can significantly improve the detection accuracy of other state-of-the-art vulnerability detection techniques that are based on either machine learning or conventional analysis techniques. The accuracy improvements in terms of F1-score range from 0.7% to more than 76% at either the coarse-grained contract level or the fine-grained line level for various vulnerability types in either source code or bytecode. Our method is general and can be retrained easily for different vulnerability types without the need for manually defined vulnerability patterns.",
keywords = "bytecode, graph transformer, heterogeneous graph learning, smart contracts, source code, vulnerability detection",
author = "Nguyen, {Hoang H.} and Nguyen, {Nhat Minh} and Chunyao Xie and Zahra Ahmadi and Daniel Kudendo and Doan, {Thanh Nam} and Lingxiao Jiang",
note = "Funding Information: Acknowledgments. This work was supported by the European Union{\textquoteright}s Horizon 2020 research and innovation program under grant agreement No. 833635 (project ROXANNE: Real-time network, text, and speaker analytics for combating organized crime, 2019-2022) and by the Singapore Ministry of Education (MOE) Academic Research Fund (AcRF) Tier 1 grant and the Lee Kong Chian Fellowship. Any opinions, findings and conclusions, or recommendations expressed in this material are those of the authors and do not reflect the views of any of the grantors. We also thank all the anonymous reviewers for their insightful feedback on our paper. ; 20th IEEE/ACM International Conference on Mining Software Repositories, MSR 2023 ; Conference date: 15-05-2023 Through 16-05-2023",
year = "2023",
doi = "10.1109/MSR59073.2023.00052",
language = "English",
isbn = "979-8-3503-1185-3",
pages = "334--346",
booktitle = "2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR)",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
address = "United States",

}

TY - GEN

T1 - MANDO-HGT

T2 - 20th IEEE/ACM International Conference on Mining Software Repositories, MSR 2023

AU - Nguyen, Hoang H.

AU - Nguyen, Nhat Minh

AU - Xie, Chunyao

AU - Ahmadi, Zahra

AU - Kudendo, Daniel

AU - Doan, Thanh Nam

AU - Jiang, Lingxiao

N1 - Funding Information: Acknowledgments. This work was supported by the European Union’s Horizon 2020 research and innovation program under grant agreement No. 833635 (project ROXANNE: Real-time network, text, and speaker analytics for combating organized crime, 2019-2022) and by the Singapore Ministry of Education (MOE) Academic Research Fund (AcRF) Tier 1 grant and the Lee Kong Chian Fellowship. Any opinions, findings and conclusions, or recommendations expressed in this material are those of the authors and do not reflect the views of any of the grantors. We also thank all the anonymous reviewers for their insightful feedback on our paper.

PY - 2023

Y1 - 2023

AB - Smart contracts in blockchains have been increasingly used for high-value business applications. It is essential to check smart contracts' reliability before and after deployment. Although various program analysis and deep learning techniques have been proposed to detect vulnerabilities in either Ethereum smart contract source code or bytecode, their detection accuracy and scalability are still limited. This paper presents a novel framework named MANDO-HGT for detecting smart contract vulnerabilities. Given Ethereum smart contracts, either in source code or bytecode form, and vulnerable or clean, MANDO-HGT custom-builds heterogeneous contract graphs (HCGs) to represent control-flow and/or function-call information of the code. It then adapts heterogeneous graph transformers (HGTs) with customized meta relations for graph nodes and edges to learn their embeddings and train classifiers for detecting various vulnerability types in the nodes and graphs of the contracts more accurately. We have collected more than 55K Ethereum smart contracts from various data sources and verified the labels for 423 buggy and 2,742 clean contracts to evaluate MANDO-HGT. Our empirical results show that MANDO-HGT can significantly improve the detection accuracy of other state-of-the-art vulnerability detection techniques that are based on either machine learning or conventional analysis techniques. The accuracy improvements in terms of F1-score range from 0.7% to more than 76% at either the coarse-grained contract level or the fine-grained line level for various vulnerability types in either source code or bytecode. Our method is general and can be retrained easily for different vulnerability types without the need for manually defined vulnerability patterns.

KW - bytecode

KW - graph transformer

KW - heterogeneous graph learning

KW - smart contracts

KW - source code

KW - vulnerability detection

UR - http://www.scopus.com/inward/record.url?scp=85166351291&partnerID=8YFLogxK

U2 - 10.1109/MSR59073.2023.00052

DO - 10.1109/MSR59073.2023.00052

M3 - Conference contribution

AN - SCOPUS:85166351291

SN - 979-8-3503-1185-3

SP - 334

EP - 346

BT - 2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR)

PB - Institute of Electrical and Electronics Engineers Inc.

Y2 - 15 May 2023 through 16 May 2023

ER -