Details
Original language | English |
---|---|
Title of host publication | 2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE) |
Pages | 167-178 |
Number of pages | 12 |
ISBN (electronic) | 978-1-6654-0337-5 |
Publication status | Published - 2021 |
Publication series
Name | IEEE/ACM International Conference on Automated Software Engineering |
---|---|
ISSN (Print) | 1938-4300 |
ISSN (electronic) | 2643-1572 |
Abstract
Cross-language code retrieval is necessary in many real-world scenarios. A major application is program translation, e.g., porting codebases from an obsolete or deprecated language to a modern one or re-implementing existing projects in one's preferred programming language. Existing approaches based on translation models either require large amounts of training data and extra information or neglect significant characteristics of programs. Leveraging cross-language code retrieval to assist automatic program translation can make use of Big Code. However, existing code retrieval systems struggle to find the translation when only the features of the input program are available as the query. In this paper, we present BigPT, which performs interactive cross-language retrieval from Big Code based only on raw code and reuses the retrieved code to assist program translation. We build on existing work on cross-language code representation and propose a novel predictive transformation model based on auto-encoders. The model is trained on Big Code to generate a target-language representation, which is used as the query to retrieve the most relevant translations for a given program. Our query representation enables the user to easily update and correct the returned results to improve the retrieval process. Our experiments show that BigPT outperforms state-of-the-art baselines in terms of program accuracy. With our novel querying and retrieval mechanism, BigPT scales to large datasets and retrieves translations efficiently.
ASJC Scopus subject areas
- Computer Science(all)
- Artificial Intelligence
- Computer Science(all)
- Software
- Engineering(all)
- Safety, Risk, Reliability and Quality
- Mathematics(all)
- Control and Optimization
Cite this
2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE). 2021. p. 167-178 (IEEE/ACM International Conference on Automated Software Engineering).
Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review
TY - GEN
T1 - Interactive Cross-language Code Retrieval with Auto-Encoders
AU - Chen, Binger
AU - Abedjan, Ziawasch
N1 - Funding Information: This work was funded by the German Ministry for Education and Research as BIFOLD - Berlin Institute for the Foundations of Learning and Data (ref. 01IS18025A and ref. 01IS18037A).
PY - 2021
Y1 - 2021
UR - http://www.scopus.com/inward/record.url?scp=85125439140&partnerID=8YFLogxK
U2 - 10.1109/ASE51524.2021.9678929
DO - 10.1109/ASE51524.2021.9678929
M3 - Conference contribution
SN - 978-1-6654-4784-3
T3 - IEEE/ACM International Conference on Automated Software Engineering
SP - 167
EP - 178
BT - 2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE)
ER -