Interactive Cross-language Code Retrieval with Auto-Encoders

Binger Chen; Ziawasch Abedjan

doi:10.1109/ASE51524.2021.9678929

Details

Original language	English
Title of host publication	2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE)
Pages	167-178
Number of pages	12
ISBN (electronic)	978-1-6654-0337-5
Publication status	Published - 2021

Publication series

Name	IEEE/ACM International Conference on Automated Software Engineering
ISSN (Print)	1938-4300
ISSN (electronic)	2643-1572

Abstract

Cross-language code retrieval is necessary in many real-world scenarios. A major application is program translation, e.g., porting codebases from an obsolete or deprecated language to a modern one or re-implementing existing projects in one's preferred programming language. Existing approaches based on translation models either require large amounts of training data and extra information or neglect significant characteristics of programs. Leveraging cross-language code retrieval to assist automatic program translation can make use of Big Code. However, existing code retrieval systems struggle to find the translation when only the features of the input program are used as the query. In this paper, we present BigPT, a system for interactive cross-language retrieval from Big Code that relies only on raw code and reuses the retrieved code to assist program translation. We build on existing work on cross-language code representation and propose a novel predictive transformation model based on auto-encoders. The model is trained on Big Code to generate a target-language representation, which is then used as the query to retrieve the most relevant translations for a given program. Our query representation enables the user to easily update and correct the returned results to improve the retrieval process. Our experiments show that BigPT outperforms state-of-the-art baselines in terms of program accuracy. With our novel querying and retrieval mechanism, BigPT scales to large datasets and retrieves translations efficiently.
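The retrieval loop described in the abstract (map the input program into a target-language representation, use it as the query, let the user correct the results) can be illustrated with a toy sketch. This is not the BigPT implementation: the fixed matrix in `transform` merely stands in for the learned auto-encoder-based model, the three-dimensional vectors are made-up embeddings, and every name here (`transform`, `retrieve`, `refine`, the file names) is a hypothetical chosen for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def transform(source_vec, weights):
    """Map a source-language vector into the target-language space.

    Assumption: a plain matrix-vector product stands in for the
    learned predictive transformation model.
    """
    return [sum(w * x for w, x in zip(row, source_vec)) for row in weights]


def retrieve(query, corpus, k=1):
    """Return the names of the k corpus entries most similar to the query."""
    ranked = sorted(corpus.items(),
                    key=lambda item: cosine(query, item[1]),
                    reverse=True)
    return [name for name, _ in ranked[:k]]


def refine(query, accepted_vec, alpha=0.5):
    """Move the query toward a result the user marked as correct."""
    return [(1 - alpha) * q + alpha * a for q, a in zip(query, accepted_vec)]


# Toy target-language corpus with made-up embeddings.
corpus = {
    "sort_list.py": [1.0, 0.0, 0.0],
    "read_file.py": [0.0, 1.0, 0.0],
    "parse_json.py": [0.0, 0.0, 1.0],
}
# Hypothetical transformation that swaps the first two dimensions.
weights = [[0.0, 1.0, 0.0],
           [1.0, 0.0, 0.0],
           [0.0, 0.0, 1.0]]

query = transform([0.1, 0.9, 0.0], weights)
first_hit = retrieve(query, corpus, k=1)

# The user indicates that read_file.py is the intended translation.
query = refine(query, corpus["read_file.py"], alpha=0.8)
second_hit = retrieve(query, corpus, k=1)
```

In this sketch the user's correction is folded back into the query vector itself, so the next retrieval round ranks the corrected candidate first without any retraining.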

ASJC Scopus subject areas

Computer Science(all)	Artificial Intelligence
Computer Science(all)	Software
Engineering(all)	Safety, Risk, Reliability and Quality
Mathematics(all)	Control and Optimization

Cite this

Interactive Cross-language Code Retrieval with Auto-Encoders. / Chen, Binger; Abedjan, Ziawasch.
2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE). 2021. p. 167-178 (IEEE/ACM International Conference on Automated Software Engineering).

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review

Chen, B & Abedjan, Z 2021, Interactive Cross-language Code Retrieval with Auto-Encoders. in 2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE/ACM International Conference on Automated Software Engineering, pp. 167-178. https://doi.org/10.1109/ASE51524.2021.9678929

Chen, B., & Abedjan, Z. (2021). Interactive Cross-language Code Retrieval with Auto-Encoders. In 2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE) (pp. 167-178). (IEEE/ACM International Conference on Automated Software Engineering). https://doi.org/10.1109/ASE51524.2021.9678929

Chen B, Abedjan Z. Interactive Cross-language Code Retrieval with Auto-Encoders. In 2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE). 2021. p. 167-178. (IEEE/ACM International Conference on Automated Software Engineering). doi: 10.1109/ASE51524.2021.9678929

Chen, Binger ; Abedjan, Ziawasch. / Interactive Cross-language Code Retrieval with Auto-Encoders. 2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE). 2021. pp. 167-178 (IEEE/ACM International Conference on Automated Software Engineering).

Download

@inproceedings{cd77f53420ae49539d33f1563c19607d,

title = "Interactive Cross-language Code Retrieval with Auto-Encoders",

abstract = "Cross-language code retrieval is necessary in many real-world scenarios. A major application is program translation, e.g., porting codebases from an obsolete or deprecated language to a modern one or re-implementing existing projects in one's preferred programming language. Existing approaches based on translation models either require large amounts of training data and extra information or neglect significant characteristics of programs. Leveraging cross-language code retrieval to assist automatic program translation can make use of Big Code. However, existing code retrieval systems struggle to find the translation when only the features of the input program are used as the query. In this paper, we present BigPT, a system for interactive cross-language retrieval from Big Code that relies only on raw code and reuses the retrieved code to assist program translation. We build on existing work on cross-language code representation and propose a novel predictive transformation model based on auto-encoders. The model is trained on Big Code to generate a target-language representation, which is then used as the query to retrieve the most relevant translations for a given program. Our query representation enables the user to easily update and correct the returned results to improve the retrieval process. Our experiments show that BigPT outperforms state-of-the-art baselines in terms of program accuracy. With our novel querying and retrieval mechanism, BigPT scales to large datasets and retrieves translations efficiently.",

author = "Binger Chen and Ziawasch Abedjan",

note = "Funding Information: This work was funded by the German Ministry for Education and Research as BIFOLD - Berlin Institute for the Foundations of Learning and Data (ref. 01IS18025A and ref. 01IS18037A).",

year = "2021",

doi = "10.1109/ASE51524.2021.9678929",

language = "English",

isbn = "978-1-6654-4784-3",

series = "IEEE/ACM International Conference on Automated Software Engineering",

pages = "167--178",

booktitle = "2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE)",

}

Download

TY - GEN

T1 - Interactive Cross-language Code Retrieval with Auto-Encoders

AU - Chen, Binger

AU - Abedjan, Ziawasch

N1 - Funding Information: This work was funded by the German Ministry for Education and Research as BIFOLD - Berlin Institute for the Foundations of Learning and Data (ref. 01IS18025A and ref. 01IS18037A).

PY - 2021

Y1 - 2021

AB - Cross-language code retrieval is necessary in many real-world scenarios. A major application is program translation, e.g., porting codebases from an obsolete or deprecated language to a modern one or re-implementing existing projects in one's preferred programming language. Existing approaches based on translation models either require large amounts of training data and extra information or neglect significant characteristics of programs. Leveraging cross-language code retrieval to assist automatic program translation can make use of Big Code. However, existing code retrieval systems struggle to find the translation when only the features of the input program are used as the query. In this paper, we present BigPT, a system for interactive cross-language retrieval from Big Code that relies only on raw code and reuses the retrieved code to assist program translation. We build on existing work on cross-language code representation and propose a novel predictive transformation model based on auto-encoders. The model is trained on Big Code to generate a target-language representation, which is then used as the query to retrieve the most relevant translations for a given program. Our query representation enables the user to easily update and correct the returned results to improve the retrieval process. Our experiments show that BigPT outperforms state-of-the-art baselines in terms of program accuracy. With our novel querying and retrieval mechanism, BigPT scales to large datasets and retrieves translations efficiently.

UR - http://www.scopus.com/inward/record.url?scp=85125439140&partnerID=8YFLogxK

U2 - 10.1109/ASE51524.2021.9678929

DO - 10.1109/ASE51524.2021.9678929

M3 - Conference contribution

SN - 978-1-6654-4784-3

T3 - IEEE/ACM International Conference on Automated Software Engineering

SP - 167

EP - 178

BT - 2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE)

ER -
