
Recent Advances in Programming Language Pre-Trained Models (PL-PTMs)

Maintained by WANG Yue (wangyue2714@gmail.com). Last updated on 2021/12/17.

General PL-PTMs

Learning and Evaluating Contextual Embedding of Source Code, [code] ICML 2020 (CuBERT)

CodeBERT: A Pre-Trained Model for Programming and Natural Languages, [code] EMNLP 2020 Findings (CodeBERT)

GraphCodeBERT: Pre-training Code Representations with Data Flow, [code] ICLR 2021 (GraphCodeBERT)

Unified Pre-training for Program Understanding and Generation, [code] NAACL 2021 (PLBART)

Unsupervised Translation of Programming Languages, [code] NeurIPS 2020 (TransCoder)

Exploring Software Naturalness through Neural Language Models, arXiv 2020/06 (C-BERT)

PYMT5: multi-mode translation of natural language and PYTHON code with transformers, EMNLP 2020 (PYMT5)

Contrastive Code Representation Learning, [code] arXiv 2020/07 (ContraCode)

DOBF: A Deobfuscation Pre-Training Objective for Programming Languages, arXiv 2021/02 (DOBF)

Studying the Usage of Text-To-Text Transfer Transformer to Support Code-Related Tasks, [code] ICSE 2021

CodeTrans: Towards Cracking the Language of Silicone’s Code Through Self-Supervised Deep Learning and High Performance Computing, [code] arXiv 2021/04 (CodeTrans)

How could Neural Networks understand Programs?, [code] ICML 2021 (OSCAR)

CoTexT: Multi-task Learning with Code-Text Transformer, arXiv 2021/05 (CoTexT)

Disentangled Code Representation Learning for Multiple Programming Languages, ACL 2021 Findings (CODEDISEN)

SYNCOBERT: Syntax-Guided Multi-Modal Contrastive Pre-Training for Code Representation, arXiv 2021/09 (SYNCOBERT)

TreeBERT: A Tree-Based Pre-Trained Model for Programming Language, UAI 2021

CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation, EMNLP 2021 [code] [blog] [media] [slide] [poster] (CodeT5)
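
Several of the models above ship public checkpoints that load directly with the Hugging Face Transformers library. Below is a minimal sketch (not taken from any of the papers) that extracts contextual embeddings of a code snippet with CodeBERT; the checkpoint id microsoft/codebert-base reflects the authors' public release and should be verified on the Hub.

```python
# Minimal sketch: contextual embeddings of a code snippet with a pre-trained
# encoder such as CodeBERT. Assumes the `transformers` library and the
# publicly released "microsoft/codebert-base" checkpoint (verify the id).
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModel.from_pretrained("microsoft/codebert-base")

code = "def add(a, b):\n    return a + b"
inputs = tokenizer(code, return_tensors="pt", truncation=True)

with torch.no_grad():
    outputs = model(**inputs)

# outputs.last_hidden_state has shape (batch, seq_len, hidden); averaging
# over tokens gives one vector per snippet, usable for retrieval or probing.
snippet_embedding = outputs.last_hidden_state.mean(dim=1)
print(snippet_embedding.shape)  # e.g. torch.Size([1, 768])
```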

Task-specific PL-PTMs

Code Completion: Multi-task Learning based Pre-trained Language Model for Code Completion, ASE 2020 (CugLM)

Code Completion: IntelliCode Compose: Code Generation using Transformer, FSE 2020 (IntelliCode Compose)

Code Completion: Improving Code Autocompletion with Transfer Learning, arXiv 2021/05

Program Repair: Generating Bug-Fixes Using Pretrained Transformers, arXiv 2021/04 (DeepCode)

Program Repair: DeepDebug: Fixing Python Bugs Using Stack Traces, Backtranslation, and Code Skeletons, arXiv 2021/05 (DeepDebug)

Program Repair: TFix: Learning to Fix Coding Errors with a Text-to-Text Transformer, ICML 2021

Program Repair: CURE: Code-Aware Neural Machine Translation for Automatic Program Repair, ICSE 2021

Unit Test Generation: Unit Test Case Generation with Transformers and Focal Context, arXiv 2021/05

Code Generation: Evaluating Large Language Models Trained on Code, arXiv 2021/07 (Codex)

Code Generation: Program Synthesis with Large Language Models, arXiv 2021/08
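
These task-specific systems generally follow the same recipe: take a pre-trained code language model and fine-tune or prompt it for the downstream task. As a rough illustration of the code-completion setting, the sketch below greedily continues a Python prompt with a left-to-right code LM; the checkpoint id microsoft/CodeGPT-small-py (released with the CodeXGLUE completion baselines) is an assumption to verify, and any causal code LM can be substituted.

```python
# Minimal sketch of left-to-right code completion with a causal code LM.
# Assumes the `transformers` library; the checkpoint id below is an
# assumption and can be swapped for any other code LM.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/CodeGPT-small-py"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "def read_json(path):\n    import json\n    with open(path) as f:\n"
inputs = tokenizer(prompt, return_tensors="pt")

# Greedy decoding continues the code token by token from the prompt.
output_ids = model.generate(
    **inputs,
    max_new_tokens=32,
    do_sample=False,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```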

Other Deep Models for Code-related Tasks

Language-Agnostic Representation Learning of Source Code from Structure and Context, [code] ICLR 2021 (Code Transformer)

GN-Transformer: Fusing AST and Source Code information in Graph Networks, OpenReview 2020/09 (GN-Transformer)

Program Repair: Hoppity: Learning Graph Transformations to Detect and Fix Bugs in Programs, ICLR 2020 (HOPPITY)

Benchmarks & Datasets

CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation, [code] arXiv 2021/02

Project CodeNet: A Large-Scale AI for Code Dataset for Learning a Diversity of Coding Tasks, [code]

Measuring Coding Challenge Competence With APPS, arXiv 2021/05
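
For quick experiments, the CodeXGLUE tasks are also mirrored as Hugging Face datasets. The sketch below loads the Python code-to-text subset; the dataset id code_x_glue_ct_code_to_text and the code/docstring field names are assumptions based on that mirror and should be checked against the CodeXGLUE repository, where the canonical data lives.

```python
# Minimal sketch: inspecting one CodeXGLUE task via the `datasets` library.
# The dataset id and field names below are assumptions to verify against the
# CodeXGLUE repository / Hugging Face Hub mirror.
from datasets import load_dataset

dataset = load_dataset("code_x_glue_ct_code_to_text", "python")

print(dataset)                # train/validation/test splits
example = dataset["train"][0]
print(example["code"][:200])  # a Python function
print(example["docstring"])   # its reference natural-language summary
```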
