RPT - Effective and Efficient Retrieval of Program Translations from Big Code

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review

Authors

  • Binger Chen
  • Ziawasch Abedjan

External Research Organisations

  • Technische Universität Berlin

Details

Original language: English
Title of host publication: 2021 IEEE/ACM 43rd International Conference on Software Engineering
Subtitle of host publication: Companion Proceedings (ICSE-Companion)
Pages: 252-253
Number of pages: 2
ISBN (electronic): 9781665412193
Publication status: Published - 2021

Publication series

Name: IEEE/ACM International Conference on Software Engineering Companion proceedings
ISSN (electronic): 2574-1926

Abstract

Program translation is in growing demand in software engineering. Manual program translation requires programming expertise in both the source and the target language. One way to automate this process is to make use of the big data of programs, i.e., Big Code. However, existing code retrieval techniques are not designed to cover cross-language code retrieval. Other data-driven approaches require human effort to construct cross-language parallel datasets for training translation models. In this paper, we present RPT, a novel code translation retrieval system. We propose a lightweight but informative program representation, which can be generalized to all imperative programming languages (PLs). Furthermore, we present our index structure and hierarchical filtering mechanism for efficient code retrieval from a Big Code database.
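
The abstract names an index structure and a hierarchical filtering mechanism but does not describe them. As a rough illustration of the general idea only, the sketch below shows a generic two-stage (coarse-to-fine) retrieval in Python. It is not RPT's representation or its index: the keyword-set representation, the inverted index, the Jaccard ranking, and all names (features, build_index, retrieve, the toy corpus) are assumptions made for this example.

```python
from collections import defaultdict

# Assumption: a toy "language-neutral" representation, namely the set of
# control-flow keywords a program uses. RPT's real representation is not
# given in this record.
SHARED_KEYWORDS = {"if", "else", "for", "while", "return"}


def features(tokens):
    """Reduce a token list to the set of shared control-flow keywords."""
    return frozenset(t for t in tokens if t in SHARED_KEYWORDS)


def build_index(corpus):
    """Build an inverted index: feature -> ids of programs containing it."""
    index = defaultdict(set)
    for pid, tokens in corpus.items():
        for f in features(tokens):
            index[f].add(pid)
    return index


def retrieve(query_tokens, corpus, index, k=10):
    """Return up to k program ids, best match first."""
    q = features(query_tokens)
    # Stage 1 (cheap): keep only programs that share at least one feature.
    candidates = set().union(*(index.get(f, set()) for f in q))

    # Stage 2 (costlier): rank the surviving candidates by Jaccard similarity.
    def jaccard(pid):
        p = features(corpus[pid])
        return len(q & p) / len(q | p)

    return sorted(candidates, key=lambda pid: (-jaccard(pid), pid))[:k]


# Hypothetical target-language corpus, already tokenized.
corpus = {
    "sum.cs": ["for", "if", "return"],
    "max.cs": ["if", "else", "return"],
    "hello.cs": ["print"],
}
index = build_index(corpus)
ranked = retrieve(["for", "if", "return"], corpus, index)
print(ranked)
```

The point of the two stages is that the inexpensive index lookup discards most of the database before any pairwise comparison is made, which is what keeps retrieval tractable as the corpus grows.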

Cite this

RPT - Effective and Efficient Retrieval of Program Translations from Big Code. / Chen, Binger; Abedjan, Ziawasch.
2021 IEEE/ACM 43rd International Conference on Software Engineering: Companion Proceedings (ICSE-Companion). 2021. p. 252-253 (IEEE/ACM International Conference on Software Engineering Companion proceedings).

Chen, B & Abedjan, Z 2021, RPT - Effective and Efficient Retrieval of Program Translations from Big Code. in 2021 IEEE/ACM 43rd International Conference on Software Engineering: Companion Proceedings (ICSE-Companion). IEEE/ACM International Conference on Software Engineering Companion proceedings, pp. 252-253. https://doi.org/10.1109/ICSE-COMPANION52605.2021.00117
Chen, B., & Abedjan, Z. (2021). RPT - Effective and Efficient Retrieval of Program Translations from Big Code. In 2021 IEEE/ACM 43rd International Conference on Software Engineering: Companion Proceedings (ICSE-Companion) (pp. 252-253). (IEEE/ACM International Conference on Software Engineering Companion proceedings). https://doi.org/10.1109/ICSE-COMPANION52605.2021.00117
Chen B, Abedjan Z. RPT - Effective and Efficient Retrieval of Program Translations from Big Code. In 2021 IEEE/ACM 43rd International Conference on Software Engineering: Companion Proceedings (ICSE-Companion). 2021. p. 252-253. (IEEE/ACM International Conference on Software Engineering Companion proceedings). doi: 10.1109/ICSE-COMPANION52605.2021.00117
Chen, Binger ; Abedjan, Ziawasch. / RPT - Effective and Efficient Retrieval of Program Translations from Big Code. 2021 IEEE/ACM 43rd International Conference on Software Engineering: Companion Proceedings (ICSE-Companion). 2021. pp. 252-253 (IEEE/ACM International Conference on Software Engineering Companion proceedings).
@inproceedings{43e787eb64454ee98e3239a19cdd6ed7,
title = "RPT - Effective and Efficient Retrieval of Program Translations from Big Code",
abstract = "Program translation is a growing demand in software engineering. Manual program translation requires programming expertise in source and target language. One way to automate this process is to make use of the big data of programs, i.e., Big Code. However, existing code retrieval techniques lack the design to cover cross-language code retrieval. Other data-driven approaches require human efforts in constructing cross-language parallel datasets to train translation models. In this paper, we present RPT, a novel code translation retrieval system. We propose a lightweight but informative program representation, which can be generalized to all imperative PLs. Furthermore, we present our index structure and hierarchical filtering mechanism for efficient code retrieval from a Big Code database.",
author = "Binger Chen and Ziawasch Abedjan",
note = "Funding Information: Experiments. We apply our approach on a Java to C# parallel dataset used in previous work [1], [4], [5]. We compare the effectiveness and efficiency of RPT with the state-of-the-art baselines 1pSMT, mppSMT, Tree2tree, and TransCoder. Our metric is program accuracy [1]: the percentage of the retrieved translations that are functionally the same as the ground truth in the dataset. The results in Table I show that RPT is competitive with all baselines despite the fact that RPT is fully unsupervised and does not reuse existing data without training any models. Moreover, we observe that for the failed cases the translations tend to appear in the retrieved top 10 list. Further, the efficiency of our retrieval-based system is shown to be comparable to other baselines. We also compare our index PBI to a simple path type index. PBI leads to a runtime improvement of two orders of magnitude on a 3.8GB database. Conclusion. We proposed RPT, a code-retrieval approach to support program translation with Big Code, which is competitive with existing translation methods. In future work, we will augment the retrieval system with program generation to overcome the limitations of the database. Data Availability. We published our code at https://github.com/BigDaMa/RPT. Acknowledgements. This work was funded by the German Ministry for Education and Research as BIFOLD - Berlin Institute for the Foundations of Learning and Data (ref. 01IS18025A and ref. 01IS18037A).",
year = "2021",
doi = "10.1109/ICSE-COMPANION52605.2021.00117",
language = "English",
isbn = "978-1-6654-1219-3",
series = "IEEE/ACM International Conference on Software Engineering Companion proceedings",
pages = "252--253",
booktitle = "2021 IEEE/ACM 43rd International Conference on Software Engineering",

}

TY - GEN

T1 - RPT - Effective and Efficient Retrieval of Program Translations from Big Code

AU - Chen, Binger

AU - Abedjan, Ziawasch

N1 - Funding Information: Experiments. We apply our approach on a Java to C# parallel dataset used in previous work [1], [4], [5]. We compare the effectiveness and efficiency of RPT with the state-of-the-art baselines 1pSMT, mppSMT, Tree2tree, and TransCoder. Our metric is program accuracy [1]: the percentage of the retrieved translations that are functionally the same as the ground truth in the dataset. The results in Table I show that RPT is competitive with all baselines despite the fact that RPT is fully unsupervised and does not reuse existing data without training any models. Moreover, we observe that for the failed cases the translations tend to appear in the retrieved top 10 list. Further, the efficiency of our retrieval-based system is shown to be comparable to other baselines. We also compare our index PBI to a simple path type index. PBI leads to a runtime improvement of two orders of magnitude on a 3.8GB database. Conclusion. We proposed RPT, a code-retrieval approach to support program translation with Big Code, which is competitive with existing translation methods. In future work, we will augment the retrieval system with program generation to overcome the limitations of the database. Data Availability. We published our code at https://github.com/BigDaMa/RPT. Acknowledgements. This work was funded by the German Ministry for Education and Research as BIFOLD - Berlin Institute for the Foundations of Learning and Data (ref. 01IS18025A and ref. 01IS18037A).

PY - 2021

Y1 - 2021

N2 - Program translation is a growing demand in software engineering. Manual program translation requires programming expertise in source and target language. One way to automate this process is to make use of the big data of programs, i.e., Big Code. However, existing code retrieval techniques lack the design to cover cross-language code retrieval. Other data-driven approaches require human efforts in constructing cross-language parallel datasets to train translation models. In this paper, we present RPT, a novel code translation retrieval system. We propose a lightweight but informative program representation, which can be generalized to all imperative PLs. Furthermore, we present our index structure and hierarchical filtering mechanism for efficient code retrieval from a Big Code database.

UR - http://www.scopus.com/inward/record.url?scp=85115722392&partnerID=8YFLogxK

U2 - 10.1109/ICSE-COMPANION52605.2021.00117

DO - 10.1109/ICSE-COMPANION52605.2021.00117

M3 - Conference contribution

SN - 978-1-6654-1219-3

T3 - IEEE/ACM International Conference on Software Engineering Companion proceedings

SP - 252

EP - 253

BT - 2021 IEEE/ACM 43rd International Conference on Software Engineering

ER -