LG4AV: Combining Language Models and Graph Neural Networks for Author Verification

Maximilian Stubbemann; Gerd Stumme

doi:10.48550/arXiv.2109.01479

Details

Original language	English
Title of host publication	Advances in Intelligent Data Analysis XX
Subtitle	20th International Symposium on Intelligent Data Analysis, IDA 2022, Rennes, France, April 20–22, 2022, Proceedings
Editors	Tassadit Bouadi, Elisa Fromont, Eyke Hüllermeier
Place of publication	Cham
Pages	315-326
Number of pages	12
ISBN (electronic)	978-3-031-01333-1
Publication status	Published - 7 Apr 2022
Event	20th International Symposium on Intelligent Data Analysis, IDA 2022 - Rennes, France. Duration: 20 Apr 2022 → 22 Apr 2022

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	13205 LNCS
ISSN (Print)	0302-9743
ISSN (electronic)	1611-3349

Abstract

The verification of document authorships is important in various settings. Researchers are, for example, judged and compared by the amount and impact of their publications, and public figures are confronted with their posts on social media. Therefore, it is important that authorship information in frequently used data sets is correct. The question of whether a given document is written by a given author is commonly referred to as authorship verification (AV). While AV is a widely investigated problem in general, only a few works consider settings where the documents are short and written in a rather uniform style. This makes most approaches impractical for bibliometric data. Here, authorships of scientific publications have to be verified, often with just abstracts and titles available. To this end, we present LG4AV, which combines language models and graph neural networks for authorship verification. By directly feeding the available texts into a pre-trained transformer architecture, our model does not need any hand-crafted stylometric features that are not meaningful in scenarios where the writing style is, at least to some extent, standardized. By incorporating a graph neural network structure, our model can benefit from relations between authors that are meaningful with respect to the verification process.
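The idea of letting relations between authors inform the verification can be illustrated with a small toy sketch. This is not the LG4AV architecture, which the abstract does not specify in detail. It assumes that each author and each document is already represented by a fixed-size vector (in practice such vectors might come from a pre-trained transformer; here they are hand-written placeholders), performs a single unweighted mean aggregation over a hypothetical co-authorship graph, and scores a document against an author with a sigmoid over a dot product. All names and values below are illustrative assumptions.

```python
import math


def mean_neighbor_aggregate(features, coauthors):
    """Average each author's vector with the vectors of their co-authors.

    This is one unweighted propagation step over the graph, with no
    learned parameters (an assumption made to keep the sketch minimal).
    """
    aggregated = {}
    for author, vec in features.items():
        group = [vec] + [features[n] for n in coauthors.get(author, [])]
        aggregated[author] = [sum(col) / len(group) for col in zip(*group)]
    return aggregated


def verification_score(doc_vec, author_vec):
    """Sigmoid of the dot product between a document and an author vector."""
    dot = sum(d * a for d, a in zip(doc_vec, author_vec))
    return 1.0 / (1.0 + math.exp(-dot))


# Hypothetical toy data: placeholder vectors, not real text encodings.
features = {"alice": [1.0, 0.0], "bob": [0.0, 1.0], "carol": [-1.0, 0.0]}
coauthors = {"alice": ["bob"], "bob": ["alice"], "carol": []}
doc_vec = [0.0, 2.0]  # stand-in for an encoded title or abstract

plain = verification_score(doc_vec, features["alice"])
smoothed = mean_neighbor_aggregate(features, coauthors)
with_graph = verification_score(doc_vec, smoothed["alice"])
print(plain, with_graph)
```

In this toy setting the document resembles bob's vector, so alice's score rises once her representation is mixed with that of her co-author bob. A trained model would learn how much weight such neighborhood information deserves instead of using a plain mean.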

ASJC Scopus subject areas

Mathematics (all)
Theoretical Computer Science
Computer Science (all)
General Computer Science

Cite this

LG4AV: Combining Language Models and Graph Neural Networks for Author Verification. / Stubbemann, Maximilian; Stumme, Gerd.
Advances in Intelligent Data Analysis XX: 20th International Symposium on Intelligent Data Analysis, IDA 2022, Rennes, France, April 20–22, 2022, Proceedings. ed. / Tassadit Bouadi; Elisa Fromont; Eyke Hüllermeier. Cham, 2022. p. 315-326 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 13205 LNCS).

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › Peer review

Stubbemann, M & Stumme, G 2022, LG4AV: Combining Language Models and Graph Neural Networks for Author Verification. in T Bouadi, E Fromont & E Hüllermeier (eds), Advances in Intelligent Data Analysis XX: 20th International Symposium on Intelligent Data Analysis, IDA 2022, Rennes, France, April 20–22, 2022, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 13205 LNCS, Cham, pp. 315-326, 20th International Symposium on Intelligent Data Analysis, IDA 2022, Rennes, France, 20 Apr 2022. https://doi.org/10.48550/arXiv.2109.01479, https://doi.org/10.1007/978-3-031-01333-1_25

Stubbemann, M., & Stumme, G. (2022). LG4AV: Combining Language Models and Graph Neural Networks for Author Verification. In T. Bouadi, E. Fromont, & E. Hüllermeier (Eds.), Advances in Intelligent Data Analysis XX: 20th International Symposium on Intelligent Data Analysis, IDA 2022, Rennes, France, April 20–22, 2022, Proceedings (pp. 315-326). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 13205 LNCS). https://doi.org/10.48550/arXiv.2109.01479, https://doi.org/10.1007/978-3-031-01333-1_25

Stubbemann M, Stumme G. LG4AV: Combining Language Models and Graph Neural Networks for Author Verification. In Bouadi T, Fromont E, Hüllermeier E, editors, Advances in Intelligent Data Analysis XX: 20th International Symposium on Intelligent Data Analysis, IDA 2022, Rennes, France, April 20–22, 2022, Proceedings. Cham. 2022. p. 315-326. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.48550/arXiv.2109.01479, 10.1007/978-3-031-01333-1_25

Stubbemann, Maximilian ; Stumme, Gerd. / LG4AV : Combining Language Models and Graph Neural Networks for Author Verification. Advances in Intelligent Data Analysis XX: 20th International Symposium on Intelligent Data Analysis, IDA 2022, Rennes, France, April 20–22, 2022, Proceedings. ed. / Tassadit Bouadi ; Elisa Fromont ; Eyke Hüllermeier. Cham, 2022. p. 315-326 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

Download

@inproceedings{8d3668a2a5b243988f008363291b6647,
title = "LG4AV: Combining Language Models and Graph Neural Networks for Author Verification",
abstract = "The verification of document authorships is important in various settings. Researchers are for example judged and compared by the amount and impact of their publications and public figures are confronted by their posts on social media. Therefore, it is important that authorship information in frequently used data sets is correct. The question whether a given document is written by a given author is commonly referred to as authorship verification (AV). While AV is a widely investigated problem in general, only few works consider settings where the documents are short and written in a rather uniform style. This makes most approaches impractical for bibliometric data. Here, authorships of scientific publications have to be verified, often with just abstracts and titles available. To this point, we present LG4AV which combines language models and graph neural networks for authorship verification. By directly feeding the available texts in a pre-trained transformer architecture, our model does not need any hand-crafted stylometric features that are not meaningful in scenarios where the writing style is, at least to some extent, standardized. By the incorporation of a graph neural network structure, our model can benefit from relations between authors that are meaningful with respect to the verification process.",
keywords = "Authorship verification, Co-authorships, Graph neural networks, Language models",
author = "Maximilian Stubbemann and Gerd Stumme",
note = "Funding Information: Acknowledgment. This work is partially funded by the German Federal Ministry of Education and Research (BMBF) in its program “Quantitative Wissenschaftsforschung” as part of the REGIO project under grant 01PU17012A. We thank Dominik D{\"u}rrschnabel and Lena Stubbemann for fruitful discussions and comments on the manuscript. ; 20th International Symposium on Intelligent Data Analysis, IDA 2022 ; Conference date: 20-04-2022 Through 22-04-2022",
year = "2022",
month = apr,
day = "7",
doi = "10.48550/arXiv.2109.01479",
language = "English",
isbn = "9783031013324",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "315--326",
editor = "Tassadit Bouadi and Elisa Fromont and Eyke H{\"u}llermeier",
booktitle = "Advances in Intelligent Data Analysis XX",
}

Download

TY - GEN
T1 - LG4AV
T2 - 20th International Symposium on Intelligent Data Analysis, IDA 2022
AU - Stubbemann, Maximilian
AU - Stumme, Gerd
N1 - Funding Information: Acknowledgment. This work is partially funded by the German Federal Ministry of Education and Research (BMBF) in its program “Quantitative Wissenschaftsforschung” as part of the REGIO project under grant 01PU17012A. We thank Dominik Dürrschnabel and Lena Stubbemann for fruitful discussions and comments on the manuscript.
PY - 2022/4/7
Y1 - 2022/4/7
AB - The verification of document authorships is important in various settings. Researchers are for example judged and compared by the amount and impact of their publications and public figures are confronted by their posts on social media. Therefore, it is important that authorship information in frequently used data sets is correct. The question whether a given document is written by a given author is commonly referred to as authorship verification (AV). While AV is a widely investigated problem in general, only few works consider settings where the documents are short and written in a rather uniform style. This makes most approaches impractical for bibliometric data. Here, authorships of scientific publications have to be verified, often with just abstracts and titles available. To this point, we present LG4AV which combines language models and graph neural networks for authorship verification. By directly feeding the available texts in a pre-trained transformer architecture, our model does not need any hand-crafted stylometric features that are not meaningful in scenarios where the writing style is, at least to some extent, standardized. By the incorporation of a graph neural network structure, our model can benefit from relations between authors that are meaningful with respect to the verification process.

KW - Authorship verification
KW - Co-authorships
KW - Graph neural networks
KW - Language models
UR - http://www.scopus.com/inward/record.url?scp=85128708030&partnerID=8YFLogxK
U2 - 10.48550/arXiv.2109.01479
DO - 10.48550/arXiv.2109.01479
M3 - Conference contribution
AN - SCOPUS:85128708030
SN - 9783031013324
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 315
EP - 326
BT - Advances in Intelligent Data Analysis XX
A2 - Bouadi, Tassadit
A2 - Fromont, Elisa
A2 - Hüllermeier, Eyke
CY - Cham
Y2 - 20 April 2022 through 22 April 2022
ER -
