DuetCS: Code Style Transfer through Generation and Retrieval

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review

Authors

  • Binger Chen
  • Ziawasch Abedjan

Research Organisations

External Research Organisations

  • Technische Universität Berlin

Details

Original language: English
Title of host publication: 2023 IEEE/ACM 45th International Conference on Software Engineering
Subtitle of host publication: ICSE 2023
Publisher: IEEE Computer Society
Pages: 2362-2373
Number of pages: 12
ISBN (electronic): 9781665457019
ISBN (print): 978-1-6654-5702-6
Publication status: Published - 2023
Event: 45th IEEE/ACM International Conference on Software Engineering - Melbourne, Australia
Duration: 14 May 2023 - 20 May 2023

Publication series

Name: Proceedings - International Conference on Software Engineering
ISSN (Print): 0270-5257

Abstract

Coding style has direct impact on code comprehension. Automatically transferring code style to user's preference or consistency can facilitate project cooperation and maintenance, as well as maximize the value of open-source code. Existing work on automating code stylization is either limited to code formatting or requires human supervision in pre-defining style checking and transformation rules. In this paper, we present unsupervised methods to assist automatic code style transfer for arbitrary code styles. The main idea is to leverage Big Code database to learn style and content embedding separately to generate or retrieve a piece of code with the same functionality and the desired target style. We carefully encode style and content features, so that a style embedding can be learned from arbitrary code. We explored the capabilities of novel attention-based style generation models and meta-learning and implemented our ideas in DUETCS. We complement the learning-based approach with a retrieval mode, which uses the same embeddings to directly search for the desired piece of code in Big Code. Our experiments show that DUETCS captures more style aspects than existing baselines.
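
The retrieval mode described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy two-dimensional vectors, the `min_content_sim` threshold, and the use of cosine similarity are all assumptions made for illustration. The sketch only shows the general idea of keeping candidates whose content embedding matches the query and then picking the one whose style embedding is closest to the target style.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve(query_content, target_style, corpus, min_content_sim=0.9):
    """Pick the snippet with matching content and the closest style.

    corpus is a list of (code, content_vec, style_vec) tuples; the vectors
    stand in for learned embeddings (hypothetical, hand-written here).
    Returns None when no candidate passes the content threshold.
    """
    best_code, best_style_sim = None, -2.0
    for code, content_vec, style_vec in corpus:
        # Step 1: discard candidates that do not share the query's functionality.
        if cosine(query_content, content_vec) < min_content_sim:
            continue
        # Step 2: among the remaining ones, prefer the closest target style.
        style_sim = cosine(target_style, style_vec)
        if style_sim > best_style_sim:
            best_code, best_style_sim = code, style_sim
    return best_code


# Toy corpus: two snippets with the same functionality but different styles,
# plus one unrelated snippet that happens to share the target style.
corpus = [
    ("for (int i=0;i<n;i++) sum+=a[i];", [1.0, 0.0], [1.0, 0.0]),
    ("for (int i = 0; i < n; i++) {\n    sum += a[i];\n}", [1.0, 0.1], [0.0, 1.0]),
    ("print_report(rows);", [0.0, 1.0], [0.0, 1.0]),
]

result = retrieve([1.0, 0.0], [0.0, 1.0], corpus)
print(result)
```

Filtering on content before ranking on style keeps the unrelated third snippet out of the result even though its style vector matches the target exactly; a real system would obtain both vectors from trained encoders rather than hand-written lists.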

Cite this

DuetCS: Code Style Transfer through Generation and Retrieval. / Chen, Binger; Abedjan, Ziawasch.
2023 IEEE/ACM 45th International Conference on Software Engineering: ICSE 2023. IEEE Computer Society, 2023. p. 2362-2373 (Proceedings - International Conference on Software Engineering).

Chen, B & Abedjan, Z 2023, DuetCS: Code Style Transfer through Generation and Retrieval. in 2023 IEEE/ACM 45th International Conference on Software Engineering: ICSE 2023. Proceedings - International Conference on Software Engineering, IEEE Computer Society, pp. 2362-2373, 45th IEEE/ACM International Conference on Software Engineering, Melbourne, Australia, 14 May 2023. https://doi.org/10.1109/ICSE48619.2023.00198
Chen, B., & Abedjan, Z. (2023). DuetCS: Code Style Transfer through Generation and Retrieval. In 2023 IEEE/ACM 45th International Conference on Software Engineering: ICSE 2023 (pp. 2362-2373). (Proceedings - International Conference on Software Engineering). IEEE Computer Society. https://doi.org/10.1109/ICSE48619.2023.00198
Chen B, Abedjan Z. DuetCS: Code Style Transfer through Generation and Retrieval. In 2023 IEEE/ACM 45th International Conference on Software Engineering: ICSE 2023. IEEE Computer Society. 2023. p. 2362-2373. (Proceedings - International Conference on Software Engineering). doi: 10.1109/ICSE48619.2023.00198
Chen, Binger ; Abedjan, Ziawasch. / DuetCS : Code Style Transfer through Generation and Retrieval. 2023 IEEE/ACM 45th International Conference on Software Engineering: ICSE 2023. IEEE Computer Society, 2023. pp. 2362-2373 (Proceedings - International Conference on Software Engineering).
BibTeX
@inproceedings{6c8789cd92504d788df063d0c3a240db,
title = "{DuetCS}: Code Style Transfer through Generation and Retrieval",
abstract = "Coding style has direct impact on code comprehension. Automatically transferring code style to user's preference or consistency can facilitate project cooperation and maintenance, as well as maximize the value of open-source code. Existing work on automating code stylization is either limited to code formatting or requires human supervision in pre-defining style checking and transformation rules. In this paper, we present unsupervised methods to assist automatic code style transfer for arbitrary code styles. The main idea is to leverage Big Code database to learn style and content embedding separately to generate or retrieve a piece of code with the same functionality and the desired target style. We carefully encode style and content features, so that a style embedding can be learned from arbitrary code. We explored the capabilities of novel attention-based style generation models and meta-learning and implemented our ideas in DUETCS. We complement the learning-based approach with a retrieval mode, which uses the same embeddings to directly search for the desired piece of code in Big Code. Our experiments show that DUETCS captures more style aspects than existing baselines.",
author = "Binger Chen and Ziawasch Abedjan",
note = "Funding Information: This work was funded by the German Ministry for Education and Research as BIFOLD - Berlin Institute for the Foundations of Learning and Data (ref. 01IS18025A and ref. 01IS18037A). ; 45th IEEE/ACM International Conference on Software Engineering, ICSE 2023 ; Conference date: 14-05-2023 Through 20-05-2023",
year = "2023",
doi = "10.1109/ICSE48619.2023.00198",
language = "English",
isbn = "978-1-6654-5702-6",
series = "Proceedings - International Conference on Software Engineering",
publisher = "IEEE Computer Society",
pages = "2362--2373",
booktitle = "2023 IEEE/ACM 45th International Conference on Software Engineering",
address = "United States",
}

RIS

TY - GEN

T1 - DuetCS: Code Style Transfer through Generation and Retrieval

T2 - 45th IEEE/ACM International Conference on Software Engineering

AU - Chen, Binger

AU - Abedjan, Ziawasch

N1 - Funding Information: This work was funded by the German Ministry for Education and Research as BIFOLD - Berlin Institute for the Foundations of Learning and Data (ref. 01IS18025A and ref. 01IS18037A).

PY - 2023

Y1 - 2023

N2 - Coding style has direct impact on code comprehension. Automatically transferring code style to user's preference or consistency can facilitate project cooperation and maintenance, as well as maximize the value of open-source code. Existing work on automating code stylization is either limited to code formatting or requires human supervision in pre-defining style checking and transformation rules. In this paper, we present unsupervised methods to assist automatic code style transfer for arbitrary code styles. The main idea is to leverage Big Code database to learn style and content embedding separately to generate or retrieve a piece of code with the same functionality and the desired target style. We carefully encode style and content features, so that a style embedding can be learned from arbitrary code. We explored the capabilities of novel attention-based style generation models and meta-learning and implemented our ideas in DUETCS. We complement the learning-based approach with a retrieval mode, which uses the same embeddings to directly search for the desired piece of code in Big Code. Our experiments show that DUETCS captures more style aspects than existing baselines.

AB - Coding style has direct impact on code comprehension. Automatically transferring code style to user's preference or consistency can facilitate project cooperation and maintenance, as well as maximize the value of open-source code. Existing work on automating code stylization is either limited to code formatting or requires human supervision in pre-defining style checking and transformation rules. In this paper, we present unsupervised methods to assist automatic code style transfer for arbitrary code styles. The main idea is to leverage Big Code database to learn style and content embedding separately to generate or retrieve a piece of code with the same functionality and the desired target style. We carefully encode style and content features, so that a style embedding can be learned from arbitrary code. We explored the capabilities of novel attention-based style generation models and meta-learning and implemented our ideas in DUETCS. We complement the learning-based approach with a retrieval mode, which uses the same embeddings to directly search for the desired piece of code in Big Code. Our experiments show that DUETCS captures more style aspects than existing baselines.

UR - http://www.scopus.com/inward/record.url?scp=85171738729&partnerID=8YFLogxK

U2 - 10.1109/ICSE48619.2023.00198

DO - 10.1109/ICSE48619.2023.00198

M3 - Conference contribution

AN - SCOPUS:85171738729

SN - 978-1-6654-5702-6

T3 - Proceedings - International Conference on Software Engineering

SP - 2362

EP - 2373

BT - 2023 IEEE/ACM 45th International Conference on Software Engineering

PB - IEEE Computer Society

Y2 - 14 May 2023 through 20 May 2023

ER -