Details
Original language | English |
---|---|
Title of host publication | ECAI 2024 - 27th European Conference on Artificial Intelligence, Including 13th Conference on Prestigious Applications of Intelligent Systems, PAIS 2024 |
Editors | Ulle Endriss, Francisco S. Melo, Kerstin Bach, Alberto Bugarin-Diz, Jose M. Alonso-Moral, Senen Barro, Fredrik Heintz |
Pages | 4602-4609 |
Number of pages | 8 |
ISBN (electronic) | 9781643685489 |
Publication status | Published - 19 Oct 2024 |
Event | 27th European Conference on Artificial Intelligence, ECAI 2024 - Santiago de Compostela, Spain. Duration: 19 Oct 2024 → 24 Oct 2024 |
Publication series
Name | Frontiers in Artificial Intelligence and Applications |
---|---|
Volume | 392 |
ISSN (Print) | 0922-6389 |
ISSN (electronic) | 1879-8314 |
Abstract
Entity matching, also known as user identity linkage, is a critical task in data integration. While established techniques primarily target large-scale networks, in several applications small networks pose challenges due to limited training data and sparsity. This study addresses entity matching in criminology, where small networks are common and the number of known matching nodes is limited. To support this research, we exploit a multimodal dataset from a security-related project. It consists of a network of intercepted telephone calls (ROXSD data) and a network of social forum interactions (ROXHOOD data), collected in a simulated environment that nonetheless follows a real investigation scenario. To improve accuracy and efficiency, we propose a novel approach for entity matching across these two small networks using node attributes. Existing techniques often focus solely on topological consistency between two networks and overlook valuable information such as node attributes, which makes them vulnerable to structural changes. Inspired by the success of deep learning, we present UGC-DeepLink, an end-to-end semi-supervised learning framework that leverages user-generated content. UGC-DeepLink encodes network nodes into vector representations that capture both local and global network structure, and uses deep neural networks to align anchor nodes. A dual learning paradigm and the policy gradient method transfer knowledge and update the linkage. Additionally, node attributes, such as call contents and texts exchanged on the forum, improve the ranking of matching nodes. Experimental results on ROXSD and ROXHOOD demonstrate that UGC-DeepLink outperforms baselines and state-of-the-art methods in identity-match ranking. The code and dataset are available at https://github.com/erichoang/UGC-DeepLink.
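The idea that node attributes can sharpen the ranking of candidate matches can be illustrated with a minimal sketch. This is not UGC-DeepLink's implementation (the linked repository holds that); it assumes each node already has a precomputed structural embedding and an attribute embedding (for example, derived from call contents or forum texts), blends their cosine similarities with a hypothetical weight `alpha`, ranks every target-network node for each source node, and scores the rankings with Hits@k and mean reciprocal rank, two metrics commonly used for identity-linkage ranking. The function names, the linear blending rule, and the toy vectors are illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(
    src_struct: Vector,
    src_attr: Vector,
    tgt_struct: Dict[str, Vector],
    tgt_attr: Dict[str, Vector],
    alpha: float = 0.5,
) -> List[Tuple[str, float]]:
    """Rank every target-network node for one source node.

    Hypothetical scoring rule: alpha * structural cosine
    + (1 - alpha) * attribute cosine. Not UGC-DeepLink's learned model.
    """
    scored = [
        (
            node,
            alpha * cosine(src_struct, tgt_struct[node])
            + (1.0 - alpha) * cosine(src_attr, tgt_attr[node]),
        )
        for node in tgt_struct
    ]
    # Highest score first; ties broken by node id so results are deterministic.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored


def hits_and_mrr(
    rankings: Dict[str, List[str]], gold: Dict[str, str], k: int = 1
) -> Tuple[float, float]:
    """Hits@k and mean reciprocal rank over known anchor (gold) pairs."""
    hits = 0
    reciprocal_sum = 0.0
    for src, true_tgt in gold.items():
        rank = rankings[src].index(true_tgt) + 1  # 1-based rank of the true match
        if rank <= k:
            hits += 1
        reciprocal_sum += 1.0 / rank
    return hits / len(gold), reciprocal_sum / len(gold)


if __name__ == "__main__":
    # Invented toy data: all nodes share the same structural vector,
    # so only the attribute vectors can separate the candidates.
    src = {"p1": ([1.0, 0.0], [1.0, 0.0]), "p2": ([1.0, 0.0], [0.0, 1.0])}
    tgt_struct = {"f1": [1.0, 0.0], "f2": [1.0, 0.0]}
    tgt_attr = {"f1": [1.0, 0.0], "f2": [0.0, 1.0]}
    gold = {"p1": "f1", "p2": "f2"}

    for alpha in (1.0, 0.5):
        rankings = {
            s: [node for node, _ in rank_candidates(st, at, tgt_struct, tgt_attr, alpha)]
            for s, (st, at) in src.items()
        }
        print(alpha, hits_and_mrr(rankings, gold, k=1))
    # 1.0 (0.5, 0.75)
    # 0.5 (1.0, 1.0)
```

With `alpha=1.0` (structure only) every candidate ties, and the tie-break by node id misranks `p2`; adding the attribute term at `alpha=0.5` recovers both anchors. A fixed linear blend is only the simplest way to combine the two signals; the abstract describes a learned, semi-supervised alignment (dual learning with policy gradient), which this sketch does not attempt.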
ASJC Scopus subject areas
- Computer Science(all)
- Artificial Intelligence
Cite this
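Ahmadi, Zahra; Zhang, Zijian; Nguyen, Hoang H.; Burdisso, Sergio; Madikeri, Srikanth; Motlicek, Petr; Dikici, Erinc; Backfried, Gerhard; Kovac, Marek; Maly, Květoslav; Kudenko, Daniel / Entity Matching Across Small Networks Using Node Attributes. ECAI 2024 - 27th European Conference on Artificial Intelligence, Including 13th Conference on Prestigious Applications of Intelligent Systems, PAIS 2024. ed. / Ulle Endriss; Francisco S. Melo; Kerstin Bach; Alberto Bugarin-Diz; Jose M. Alonso-Moral; Senen Barro; Fredrik Heintz. 2024. p. 4602-4609 (Frontiers in Artificial Intelligence and Applications; Vol. 392).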
Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review
TY - GEN
T1 - Entity Matching Across Small Networks Using Node Attributes
AU - Ahmadi, Zahra
AU - Zhang, Zijian
AU - Nguyen, Hoang H.
AU - Burdisso, Sergio
AU - Madikeri, Srikanth
AU - Motlicek, Petr
AU - Dikici, Erinc
AU - Backfried, Gerhard
AU - Kovac, Marek
AU - Maly, Květoslav
AU - Kudenko, Daniel
N1 - Publisher Copyright: © 2024 The Authors.
PY - 2024/10/19
Y1 - 2024/10/19
UR - http://www.scopus.com/inward/record.url?scp=85216652739&partnerID=8YFLogxK
U2 - 10.3233/FAIA241054
DO - 10.3233/FAIA241054
M3 - Conference contribution
AN - SCOPUS:85216652739
T3 - Frontiers in Artificial Intelligence and Applications
SP - 4602
EP - 4609
BT - ECAI 2024 - 27th European Conference on Artificial Intelligence, Including 13th Conference on Prestigious Applications of Intelligent Systems, PAIS 2024
A2 - Endriss, Ulle
A2 - Melo, Francisco S.
A2 - Bach, Kerstin
A2 - Bugarin-Diz, Alberto
A2 - Alonso-Moral, Jose M.
A2 - Barro, Senen
A2 - Heintz, Fredrik
T2 - 27th European Conference on Artificial Intelligence, ECAI 2024
Y2 - 19 October 2024 through 24 October 2024
ER -