Details
Original language | English |
---|---|
Title of host publication | 2022 IEEE 9th International Conference on Data Science and Advanced Analytics |
Subtitle of host publication | (DSAA) |
Editors | Joshua Zhexue Huang, Yi Pan, Barbara Hammer, Muhammad Khurram Khan, Xing Xie, Laizhong Cui, Yulin He |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
ISBN (electronic) | 978-1-6654-7330-9 |
ISBN (print) | 978-1-6654-7331-6 |
Publication status | Published - 2022 |
Event | 9th IEEE International Conference on Data Science and Advanced Analytics, DSAA 2022 - Shenzhen, China; Duration: 13 Oct 2022 → 16 Oct 2022 |
Abstract
Learning heterogeneous graphs consisting of different types of nodes and edges enhances the results of homogeneous graph techniques. An interesting example of such graphs is control-flow graphs representing possible software code execution flows. As such graphs represent more semantic information of code, developing techniques and tools for such graphs can be highly beneficial for detecting vulnerabilities in software and thereby supporting its reliability. However, existing heterogeneous graph techniques are still insufficient in handling complex graphs where the number of different types of nodes and edges is large and variable. This paper concentrates on Ethereum smart contracts as a sample of software code represented by heterogeneous contract graphs built upon both control-flow graphs and call graphs containing different types of nodes and links. We propose MANDO, a new heterogeneous graph representation to learn such heterogeneous contract graphs' structures. MANDO extracts customized metapaths, which compose relational connections between different types of nodes and their neighbors. Moreover, it develops a multi-metapath heterogeneous graph attention network to learn multi-level embeddings of different types of nodes and their metapaths in the heterogeneous contract graphs, which can capture the code semantics of smart contracts more accurately and facilitate both fine-grained line-level and coarse-grained contract-level vulnerability detection. Our extensive evaluation on large smart contract datasets shows that MANDO improves the vulnerability detection results of other techniques at the coarse-grained contract level. More importantly, it is the first learning-based approach capable of identifying vulnerabilities at the fine-grained line level, and it significantly improves on traditional code analysis-based vulnerability detection approaches by 11.35% to 70.81% in terms of F1-score.
Keywords
- Ethereum blockchain, graph embedding, graph neural networks, heterogeneous graphs, smart contracts, vulnerability detection
ASJC Scopus subject areas
- Computer Science(all)
  - Artificial Intelligence
  - Computer Vision and Pattern Recognition
  - Hardware and Architecture
  - Information Systems
- Decision Sciences(all)
  - Information Systems and Management
Cite this
2022 IEEE 9th International Conference on Data Science and Advanced Analytics: (DSAA). ed. / Joshua Zhexue Huang; Yi Pan; Barbara Hammer; Muhammad Khurram Khan; Xing Xie; Laizhong Cui; Yulin He. Institute of Electrical and Electronics Engineers Inc., 2022.
Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review
TY - GEN
T1 - MANDO
T2 - 9th IEEE International Conference on Data Science and Advanced Analytics, DSAA 2022
AU - Nguyen, Hoang H.
AU - Nguyen, Nhat Minh
AU - Xie, Chunyao
AU - Ahmadi, Zahra
AU - Kudenko, Daniel
AU - Doan, Thanh Nam
AU - Jiang, Lingxiao
N1 - Funding Information: This work was supported by the European Union's Horizon 2020 research and innovation program under grant agreement No. 833635 (project ROXANNE: Real-time network, text, and speaker analytics for combating organized crime, 2019-2022) and by the Singapore Ministry of Education (MOE) Academic Research Fund (AcRF) Tier 1 grant
PY - 2022
Y1 - 2022
KW - Ethereum blockchain
KW - graph embedding
KW - graph neural networks
KW - heterogeneous graphs
KW - smart contracts
KW - vulnerability detection
UR - http://www.scopus.com/inward/record.url?scp=85143075291&partnerID=8YFLogxK
U2 - 10.1109/DSAA54385.2022.10032337
DO - 10.1109/DSAA54385.2022.10032337
M3 - Conference contribution
AN - SCOPUS:85143075291
SN - 978-1-6654-7331-6
BT - 2022 IEEE 9th International Conference on Data Science and Advanced Analytics
A2 - Huang, Joshua Zhexue
A2 - Pan, Yi
A2 - Hammer, Barbara
A2 - Khan, Muhammad Khurram
A2 - Xie, Xing
A2 - Cui, Laizhong
A2 - He, Yulin
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 13 October 2022 through 16 October 2022
ER -