LG4AV: Combining Language Models and Graph Neural Networks for Author Verification

Research output: Chapter in book/report/conference proceeding; Conference contribution; Research; peer review

Authors

  • Maximilian Stubbemann
  • Gerd Stumme

Research Organisations

External Research Organisations

  • University of Kassel

Details

Original language: English
Title of host publication: Advances in Intelligent Data Analysis XX
Subtitle of host publication: 20th International Symposium on Intelligent Data Analysis, IDA 2022, Rennes, France, April 20–22, 2022, Proceedings
Editors: Tassadit Bouadi, Elisa Fromont, Eyke Hüllermeier
Place of Publication: Cham
Pages: 315-326
Number of pages: 12
ISBN (electronic): 978-3-031-01333-1
Publication status: Published - 7 Apr 2022
Event: 20th International Symposium on Intelligent Data Analysis, IDA 2022 - Rennes, France
Duration: 20 Apr 2022 to 22 Apr 2022

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13205 LNCS
ISSN (Print): 0302-9743
ISSN (electronic): 1611-3349

Abstract

The verification of document authorships is important in various settings. Researchers are for example judged and compared by the amount and impact of their publications and public figures are confronted by their posts on social media. Therefore, it is important that authorship information in frequently used data sets is correct. The question whether a given document is written by a given author is commonly referred to as authorship verification (AV). While AV is a widely investigated problem in general, only few works consider settings where the documents are short and written in a rather uniform style. This makes most approaches impractical for bibliometric data. Here, authorships of scientific publications have to be verified, often with just abstracts and titles available. To this point, we present LG4AV which combines language models and graph neural networks for authorship verification. By directly feeding the available texts in a pre-trained transformer architecture, our model does not need any hand-crafted stylometric features that are not meaningful in scenarios where the writing style is, at least to some extent, standardized. By the incorporation of a graph neural network structure, our model can benefit from relations between authors that are meaningful with respect to the verification process.

Keywords

    Authorship verification, Co-authorships, Graph neural networks, Language models

Cite this

LG4AV: Combining Language Models and Graph Neural Networks for Author Verification. / Stubbemann, Maximilian; Stumme, Gerd.
Advances in Intelligent Data Analysis XX: 20th International Symposium on Intelligent Data Analysis, IDA 2022, Rennes, France, April 20–22, 2022, Proceedings. ed. / Tassadit Bouadi; Elisa Fromont; Eyke Hüllermeier. Cham, 2022. p. 315-326 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 13205 LNCS).

Stubbemann, M & Stumme, G 2022, LG4AV: Combining Language Models and Graph Neural Networks for Author Verification. in T Bouadi, E Fromont & E Hüllermeier (eds), Advances in Intelligent Data Analysis XX: 20th International Symposium on Intelligent Data Analysis, IDA 2022, Rennes, France, April 20–22, 2022, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 13205 LNCS, Cham, pp. 315-326, 20th International Symposium on Intelligent Data Analysis, IDA 2022, Rennes, France, 20 Apr 2022. https://doi.org/10.48550/arXiv.2109.01479, https://doi.org/10.1007/978-3-031-01333-1_25
Stubbemann, M., & Stumme, G. (2022). LG4AV: Combining Language Models and Graph Neural Networks for Author Verification. In T. Bouadi, E. Fromont, & E. Hüllermeier (Eds.), Advances in Intelligent Data Analysis XX: 20th International Symposium on Intelligent Data Analysis, IDA 2022, Rennes, France, April 20–22, 2022, Proceedings (pp. 315-326). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 13205 LNCS). https://doi.org/10.48550/arXiv.2109.01479, https://doi.org/10.1007/978-3-031-01333-1_25
Stubbemann M, Stumme G. LG4AV: Combining Language Models and Graph Neural Networks for Author Verification. In Bouadi T, Fromont E, Hüllermeier E, editors, Advances in Intelligent Data Analysis XX: 20th International Symposium on Intelligent Data Analysis, IDA 2022, Rennes, France, April 20–22, 2022, Proceedings. Cham. 2022. p. 315-326. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.48550/arXiv.2109.01479, 10.1007/978-3-031-01333-1_25
Stubbemann, Maximilian ; Stumme, Gerd. / LG4AV : Combining Language Models and Graph Neural Networks for Author Verification. Advances in Intelligent Data Analysis XX: 20th International Symposium on Intelligent Data Analysis, IDA 2022, Rennes, France, April 20–22, 2022, Proceedings. editor / Tassadit Bouadi ; Elisa Fromont ; Eyke Hüllermeier. Cham, 2022. pp. 315-326 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{8d3668a2a5b243988f008363291b6647,
title = "LG4AV: Combining Language Models and Graph Neural Networks for Author Verification",
abstract = "The verification of document authorships is important in various settings. Researchers are for example judged and compared by the amount and impact of their publications and public figures are confronted by their posts on social media. Therefore, it is important that authorship information in frequently used data sets is correct. The question whether a given document is written by a given author is commonly referred to as authorship verification (AV). While AV is a widely investigated problem in general, only few works consider settings where the documents are short and written in a rather uniform style. This makes most approaches impractical for bibliometric data. Here, authorships of scientific publications have to be verified, often with just abstracts and titles available. To this point, we present LG4AV which combines language models and graph neural networks for authorship verification. By directly feeding the available texts in a pre-trained transformer architecture, our model does not need any hand-crafted stylometric features that are not meaningful in scenarios where the writing style is, at least to some extent, standardized. By the incorporation of a graph neural network structure, our model can benefit from relations between authors that are meaningful with respect to the verification process.",
keywords = "Authorship verification, Co-authorships, Graph neural networks, Language models",
author = "Maximilian Stubbemann and Gerd Stumme",
note = "Funding Information: Acknowledgment. This work is partially funded by the German Federal Ministry of Education and Research (BMBF) in its program “Quantitative Wissenschaftsforschung” as part of the REGIO project under grant 01PU17012A. We thank Dominik D{\"u}rrschnabel and Lena Stubbemann for fruitful discussions and comments on the manuscript. ; 20th International Symposium on Intelligent Data Analysis, IDA 2022 ; Conference date: 20-04-2022 Through 22-04-2022",
year = "2022",
month = apr,
day = "7",
doi = "10.48550/arXiv.2109.01479",
language = "English",
isbn = "9783031013324",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "315--326",
editor = "Tassadit Bouadi and Elisa Fromont and Eyke H{\"u}llermeier",
booktitle = "Advances in Intelligent Data Analysis XX",

}

TY - GEN

T1 - LG4AV

T2 - 20th International Symposium on Intelligent Data Analysis, IDA 2022

AU - Stubbemann, Maximilian

AU - Stumme, Gerd

N1 - Funding Information: Acknowledgment. This work is partially funded by the German Federal Ministry of Education and Research (BMBF) in its program “Quantitative Wissenschaftsforschung” as part of the REGIO project under grant 01PU17012A. We thank Dominik Dürrschnabel and Lena Stubbemann for fruitful discussions and comments on the manuscript.

PY - 2022/4/7

Y1 - 2022/4/7

N2 - The verification of document authorships is important in various settings. Researchers are for example judged and compared by the amount and impact of their publications and public figures are confronted by their posts on social media. Therefore, it is important that authorship information in frequently used data sets is correct. The question whether a given document is written by a given author is commonly referred to as authorship verification (AV). While AV is a widely investigated problem in general, only few works consider settings where the documents are short and written in a rather uniform style. This makes most approaches impractical for bibliometric data. Here, authorships of scientific publications have to be verified, often with just abstracts and titles available. To this point, we present LG4AV which combines language models and graph neural networks for authorship verification. By directly feeding the available texts in a pre-trained transformer architecture, our model does not need any hand-crafted stylometric features that are not meaningful in scenarios where the writing style is, at least to some extent, standardized. By the incorporation of a graph neural network structure, our model can benefit from relations between authors that are meaningful with respect to the verification process.

AB - The verification of document authorships is important in various settings. Researchers are for example judged and compared by the amount and impact of their publications and public figures are confronted by their posts on social media. Therefore, it is important that authorship information in frequently used data sets is correct. The question whether a given document is written by a given author is commonly referred to as authorship verification (AV). While AV is a widely investigated problem in general, only few works consider settings where the documents are short and written in a rather uniform style. This makes most approaches impractical for bibliometric data. Here, authorships of scientific publications have to be verified, often with just abstracts and titles available. To this point, we present LG4AV which combines language models and graph neural networks for authorship verification. By directly feeding the available texts in a pre-trained transformer architecture, our model does not need any hand-crafted stylometric features that are not meaningful in scenarios where the writing style is, at least to some extent, standardized. By the incorporation of a graph neural network structure, our model can benefit from relations between authors that are meaningful with respect to the verification process.

KW - Authorship verification

KW - Co-authorships

KW - Graph neural networks

KW - Language models

UR - http://www.scopus.com/inward/record.url?scp=85128708030&partnerID=8YFLogxK

U2 - 10.48550/arXiv.2109.01479

DO - 10.48550/arXiv.2109.01479

M3 - Conference contribution

AN - SCOPUS:85128708030

SN - 9783031013324

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 315

EP - 326

BT - Advances in Intelligent Data Analysis XX

A2 - Bouadi, Tassadit

A2 - Fromont, Elisa

A2 - Hüllermeier, Eyke

CY - Cham

Y2 - 20 April 2022 through 22 April 2022

ER -