Details
Original language | English |
---|---|
Title of host publication | 2021 IEEE/ACM 43rd International Conference on Software Engineering |
Subtitle of host publication | Companion Proceedings (ICSE-Companion) |
Pages | 252-253 |
Number of pages | 2 |
ISBN (electronic) | 9781665412193 |
Publication status | Published - 2021 |
Publication series
Name | IEEE/ACM International Conference on Software Engineering Companion proceedings |
---|---|
ISSN (electronic) | 2574-1926 |
Abstract
Demand for program translation is growing in software engineering. Manual program translation requires programming expertise in both the source and target languages. One way to automate this process is to make use of the big data of programs, i.e., Big Code. However, existing code retrieval techniques are not designed for cross-language code retrieval. Other data-driven approaches require human effort to construct cross-language parallel datasets for training translation models. In this paper, we present RPT, a novel code translation retrieval system. We propose a lightweight but informative program representation, which can be generalized to all imperative PLs. Furthermore, we present our index structure and hierarchical filtering mechanism for efficient code retrieval from a Big Code database.
ASJC Scopus subject areas
- Computer Science(all)
- Software
Cite this
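RPT - Effective and Efficient Retrieval of Program Translations from Big Code. / Chen, Binger; Abedjan, Ziawasch. 2021 IEEE/ACM 43rd International Conference on Software Engineering: Companion Proceedings (ICSE-Companion). 2021. p. 252-253 (IEEE/ACM International Conference on Software Engineering Companion proceedings).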
Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review
TY - GEN
T1 - RPT - Effective and Efficient Retrieval of Program Translations from Big Code
AU - Chen, Binger
AU - Abedjan, Ziawasch
N1 - Funding Information: Experiments. We apply our approach to a Java-to-C# parallel dataset used in previous work [1], [4], [5]. We compare the effectiveness and efficiency of RPT with the state-of-the-art baselines 1pSMT, mppSMT, Tree2tree, and TransCoder. Our metric is program accuracy [1]: the percentage of retrieved translations that are functionally the same as the ground truth in the dataset. The results in Table I show that RPT is competitive with all baselines even though it is fully unsupervised and only reuses existing data without training any models. Moreover, we observe that for the failed cases the translations tend to appear in the retrieved top-10 list. Further, the efficiency of our retrieval-based system is comparable to that of the baselines. We also compare our index PBI to a simple path type index. PBI improves runtime by two orders of magnitude on a 3.8 GB database. Conclusion. We proposed RPT, a code-retrieval approach that supports program translation with Big Code and is competitive with existing translation methods. In future work, we will augment the retrieval system with program generation to overcome the limitations of the database. Data Availability. We published our code at https://github.com/BigDaMa/RPT. Acknowledgements. This work was funded by the German Ministry for Education and Research as BIFOLD - Berlin Institute for the Foundations of Learning and Data (ref. 01IS18025A and ref. 01IS18037A).
PY - 2021
Y1 - 2021
AB - Demand for program translation is growing in software engineering. Manual program translation requires programming expertise in both the source and target languages. One way to automate this process is to make use of the big data of programs, i.e., Big Code. However, existing code retrieval techniques are not designed for cross-language code retrieval. Other data-driven approaches require human effort to construct cross-language parallel datasets for training translation models. In this paper, we present RPT, a novel code translation retrieval system. We propose a lightweight but informative program representation, which can be generalized to all imperative PLs. Furthermore, we present our index structure and hierarchical filtering mechanism for efficient code retrieval from a Big Code database.
UR - http://www.scopus.com/inward/record.url?scp=85115722392&partnerID=8YFLogxK
U2 - 10.1109/ICSE-COMPANION52605.2021.00117
DO - 10.1109/ICSE-COMPANION52605.2021.00117
M3 - Conference contribution
SN - 978-1-6654-1219-3
T3 - IEEE/ACM International Conference on Software Engineering Companion proceedings
SP - 252
EP - 253
BT - 2021 IEEE/ACM 43rd International Conference on Software Engineering
ER -