The Deep Program Understanding project aims to teach machines to understand complex algorithms, combining methods from the programming languages, software engineering, and machine learning communities.
We have open-sourced much of our work and implementations, including utilities and project-specific sample code. See our Publications and Downloads tabs for more details.
Learning to understand programs
Building “smart” software engineering tools requires learning to analyse and understand existing code and related artefacts such as documentation and online resources (e.g., StackOverflow). One of our primary concerns is integrating standard static analysis with machine learning to create learning-based program analyses that can be embedded in software engineering tools. Such tools can then be used to find bugs, automatically retrieve or produce relevant documentation, or verify programs.
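To make this concrete, a minimal sketch of the kind of graph representation such learning-based analyses consume, using Python's standard `ast` module. This is illustrative only: published systems (e.g., Learning to Represent Programs with Graphs) add further edge types such as NextToken and data-flow edges like LastRead/LastWrite, which are omitted here.

```python
import ast

def program_graph(source: str):
    """Build a simple program graph from Python source using AST
    child edges. A sketch of the graph representations fed to
    graph neural networks in learning-based program analyses."""
    tree = ast.parse(source)
    nodes, edges = [], []

    def visit(node, parent=None):
        idx = len(nodes)
        nodes.append(type(node).__name__)   # node label = AST node type
        if parent is not None:
            edges.append(("Child", parent, idx))
        for child in ast.iter_child_nodes(node):
            visit(child, idx)

    visit(tree)
    return nodes, edges

nodes, edges = program_graph("def f(x):\n    return x + 1\n")
print(nodes[0], len(nodes), len(edges))
```

A real pipeline would turn these labelled nodes and edges into tensors and train a graph neural network over them, for example to flag misused variables.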
Highlighted publications
- Self-Supervised Bug Detection and Repair (NeurIPS’21) | Code on GitHub
- Typilus: Neural Type Hints (PLDI’20)
- Learning to Represent Edits (ICLR’19) | Code on GitHub
- Learning to Represent Programs with Graphs (ICLR’18) | Code on GitHub
- A Survey of Machine Learning for Big Code and Naturalness (ACM Computing Surveys 2018)
Learning to generate programs
A core problem of machine learning is to learn algorithms that explain observed behaviour. This can take several forms, such as program synthesis from examples, in which the goal is to produce an interpretable program matching given input/output pairs, or programming by demonstration, in which a system learns to mimic sequences of actions.
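As a toy illustration of synthesis from examples, the sketch below enumerates sequences of primitive list operations and returns the first one consistent with the given input/output pairs. The primitive set and search strategy are hypothetical simplifications; systems like DeepCoder instead use a learned model to predict which primitives are likely and thereby prune this search.

```python
from itertools import product

# Hypothetical primitive set for illustration.
PRIMITIVES = {
    "inc": lambda xs: [x + 1 for x in xs],
    "double": lambda xs: [x * 2 for x in xs],
    "reverse": lambda xs: list(reversed(xs)),
    "sort": lambda xs: sorted(xs),
}

def run(prog, xs):
    """Apply a sequence of primitives to an input list."""
    for name in prog:
        xs = PRIMITIVES[name](xs)
    return xs

def synthesize(examples, max_len=3):
    """Enumerate programs up to max_len primitives and return the
    first one consistent with every input/output example."""
    for length in range(1, max_len + 1):
        for prog in product(PRIMITIVES, repeat=length):
            if all(run(prog, i) == o for i, o in examples):
                return prog
    return None

prog = synthesize([([3, 1, 2], [2, 4, 6])])
print(prog)
```

With a single example several programs may be consistent; adding more input/output pairs narrows the search to the intended behaviour.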
Highlighted publications
- Learning to Complete Code with Sketches (ICLR’22)
- Fast and Memory-Efficient Neural Code Completion (MSR’20)
- Generative Code Modeling with Graphs (ICLR’19) | Code on GitHub
- DeepCoder: Learning to Write Programs (ICLR’17) | Code on GitHub
- TerpreT: A Probabilistic Programming Language for Program Induction (Tech Report, 2016) | Code on GitHub
- Bimodal Modelling of Source Code and Natural Language (ICML’15)
Advancing the machine learning frontier
Structured data such as programs pose a challenge for machine learning methods. The combination of domain constraints, known semantics, and complex structure requires new machine learning methods and techniques. Our focus in this area is the analysis and generation of graphs, for which we have developed novel neural network architectures and generative procedures.
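The core computation in these graph architectures is message passing: each node aggregates transformed states from its neighbours and gates the result into its own state. A minimal NumPy sketch of one such propagation step is below; the single edge type and the simplified gate are assumptions for brevity, whereas the published gated graph neural networks use per-edge-type weights and a full GRU update.

```python
import numpy as np

def ggnn_step(h, edges, W, update):
    """One propagation step over a graph (sketch): each node sums
    linearly transformed neighbour states, then applies a gated
    state update."""
    messages = np.zeros_like(h)
    for src, dst in edges:            # message passing along edges
        messages[dst] += h[src] @ W
    return update(h, messages)

def gated_update(h, m):
    """Simplified gate: convex combination of old state and
    transformed message (a stand-in for a full GRU cell)."""
    z = 1.0 / (1.0 + np.exp(-(h + m)))  # update gate
    return (1.0 - z) * h + z * np.tanh(m)

rng = np.random.default_rng(0)
h = rng.standard_normal((3, 4))       # 3 nodes, state dimension 4
W = rng.standard_normal((4, 4))
edges = [(0, 1), (1, 2), (2, 0)]      # a directed 3-cycle
h2 = ggnn_step(h, edges, W, gated_update)
print(h2.shape)
```

Stacking several such steps lets information flow along longer paths in the graph, which is what allows these models to reason over program structure.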
Highlighted publications
- HEAT: Hyperedge Attention Networks
- Constrained Graph Variational Autoencoders for Molecule Design (NeurIPS’18) | Code on GitHub
- Graph Partition Neural Networks for Semi-Supervised Classification (ICLR’18 Workshop) | Code on GitHub
- Gated Graph Sequence Neural Networks (ICLR’16) | Code on GitHub