July 28, 2020 – June 29, 2021

Directions in ML: AutoML and Automating Algorithms

10:00 AM–11:00 AM PT

Location: Virtual

Structured Models for Automated Machine Learning


Speaker: Professor Madeleine Udell, Cornell University
Date: June 29, 2021 | 10:00 AM–11:00 AM PT

Abstract

Automated machine learning (AutoML) seeks algorithmic methods for finding the best machine learning pipeline and hyperparameters to fit a new dataset. The complexity of this problem is astounding: viewed as an optimization problem, it entails search over an exponentially large space, with discrete and continuous variables. An efficient solution requires a strong structural prior on the optimization landscape of this problem.

In this talk, we survey some of the most powerful techniques for AutoML on tabular datasets. We will focus in particular on techniques for meta-learning: how to quickly learn good models on a new dataset given good models for a large collection of datasets. We will see that remarkably simple structural priors, such as the low-dimensional structure used by the AutoML method Oboe, produce state-of-the-art results. The success of these simple models suggests that AutoML may be simpler than was previously understood.
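
To make the low-dimensional prior concrete, here is a minimal sketch of the collaborative-filtering idea behind Oboe (the matrix sizes, probe set, and random data below are illustrative stand-ins, not the method's actual implementation): factor a dataset-by-model error matrix offline, then embed a new dataset by cheaply probing a few models.

```python
# A sketch of low-rank meta-learning in the style of Oboe (not the authors'
# code): factor an offline error matrix, probe a few models on a new dataset,
# and infer its latent embedding to predict the error of every other model.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical offline meta-data: error of each of 50 models on 200 datasets.
E = rng.random((200, 50))

# Low-rank factorization via truncated SVD (rank k).
k = 5
U, s, Vt = np.linalg.svd(E, full_matrices=False)
M = (np.diag(s[:k]) @ Vt[:k]).T            # (50, k) latent model embeddings

# On a new dataset, cheaply evaluate a small probe set of models...
probe = [3, 17, 29, 41]
probe_errors = rng.random(len(probe))       # stand-in for real CV errors

# ...then solve a least-squares problem for the dataset's latent embedding.
d, *_ = np.linalg.lstsq(M[probe], probe_errors, rcond=None)

# Predicted error of every model on the new dataset; pick the best.
predicted = M @ d
print(f"predicted best model index: {int(np.argmin(predicted))}")
```

Because the embedding is low-dimensional, a handful of probe evaluations suffices to rank all candidate models on the new dataset.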

Biography

Madeleine Udell is Assistant Professor of Operations Research and Information Engineering and Richard and Sybil Smith Sesquicentennial Fellow at Cornell University. She studies optimization and machine learning for large-scale data analysis and control, with applications in marketing, demographic modeling, medical informatics, engineering system design, and automated machine learning. Her research in optimization centers on detecting and exploiting novel structures in optimization problems, with a particular focus on convex and low-rank problems. These structures lead the way to automatic proofs of optimality, better complexity guarantees, and faster, more memory-efficient algorithms. She has developed a number of open-source libraries for modeling and solving optimization problems, including Convex.jl, one of the top tools in the Julia language for technical computing.


Latent Stochastic Differential Equations: An Unexplored Model Class


Speaker: Professor David Duvenaud, University of Toronto
Date: May 25, 2021 | 10:00 AM–11:00 AM PT

Abstract

We show how to do gradient-based stochastic variational inference in stochastic differential equations (SDEs) in a way that allows the use of adaptive SDE solvers. This allows us to scalably fit a new family of richly parameterized distributions over irregularly sampled time series. We apply latent SDEs to motion-capture data and use them to demonstrate infinitely deep Bayesian neural networks. We also discuss the pros and cons of this barely explored model class, comparing it to Gaussian processes and neural processes.
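
As a rough illustration of the variational scheme (a toy fixed-step Euler–Maruyama sketch, not the paper's adaptive-solver implementation; the network sizes and toy setup are made up), one simulates the posterior SDE while accumulating a pathwise KL penalty between posterior and prior drifts:

```python
# Toy latent-SDE sketch: simulate the posterior SDE with Euler-Maruyama and
# accumulate the Girsanov-style KL integrand against a learned prior drift.
import torch
import torch.nn as nn

class LatentSDE(nn.Module):
    def __init__(self, dim=2, hidden=32):
        super().__init__()
        self.prior_drift = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, dim))
        self.post_drift = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, dim))
        self.log_sigma = nn.Parameter(torch.zeros(dim))   # diagonal diffusion

    def simulate(self, z0, n_steps=100, dt=0.01):
        """Euler-Maruyama under the posterior drift, tracking the KL integrand."""
        z, kl, path = z0, 0.0, [z0]
        sigma = self.log_sigma.exp()
        for _ in range(n_steps):
            f_post, f_prior = self.post_drift(z), self.prior_drift(z)
            u = (f_post - f_prior) / sigma                # drift correction
            kl = kl + 0.5 * (u ** 2).sum(-1) * dt
            z = z + f_post * dt + sigma * torch.randn_like(z) * (dt ** 0.5)
            path.append(z)
        return torch.stack(path), kl

sde = LatentSDE()
path, kl = sde.simulate(torch.zeros(8, 2))    # batch of 8 latent trajectories
# A full model adds a decoder p(x_t | z_t); the ELBO is then
# E[log-likelihood of observations] - E[kl].
print(path.shape, kl.shape)                   # (101, 8, 2), (8,)
```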

Biography

David Duvenaud is an assistant professor in computer science at the University of Toronto. His research focuses on continuous-time models, latent-variable models, and deep learning. He completed his postdoc at Harvard University and his Ph.D. at the University of Cambridge. David is a founding member of the Vector Institute for Artificial Intelligence and co-founded Invenia, an energy forecasting company.


Knowledge Distillation as Semiparametric Inference


Speaker: Dr. Lester Mackey, Microsoft Research
Date: April 28, 2021 | 10:00 AM–11:00 AM PT

Abstract

More accurate machine learning models often demand more computation and memory at test time, making them difficult to deploy on CPU- or memory-constrained devices. Knowledge distillation alleviates this burden by training a less expensive student model to mimic the expensive teacher model while maintaining most of the original accuracy. To explain and enhance this phenomenon, we cast knowledge distillation as a semiparametric inference problem with the optimal student model as the target, the unknown Bayes class probabilities as nuisance, and the teacher probabilities as a plug-in nuisance estimate. By adapting modern semiparametric tools, we derive new guarantees for the prediction error of standard distillation and develop two enhancements—cross-fitting and loss correction—to mitigate the impact of teacher overfitting and underfitting on student performance. We validate our findings empirically on both tabular and image data and observe consistent improvements from our knowledge distillation enhancements.
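
Here is a hedged sketch of the cross-fitting enhancement (scikit-learn models stand in for the teacher and student, and the weighted-duplication trick for fitting to soft labels is one illustrative choice, not the paper's exact objective): each training point is distilled against out-of-fold teacher probabilities, so the teacher never "teaches" on data it was fit to.

```python
# Cross-fit distillation sketch: soft labels come from teachers trained on
# the other folds, a plug-in estimate of the Bayes class probabilities that
# is less contaminated by teacher overfitting.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

soft_labels = np.zeros((len(X), 2))
for train_idx, heldout_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    teacher = RandomForestClassifier(n_estimators=200, random_state=0)
    teacher.fit(X[train_idx], y[train_idx])
    soft_labels[heldout_idx] = teacher.predict_proba(X[heldout_idx])

# Distill the cheaper student: approximate soft-label regression by
# duplicating each point per class with the teacher probability as weight.
X_rep = np.repeat(X, 2, axis=0)
y_rep = np.tile([0, 1], len(X))
w_rep = soft_labels.ravel()
student = LogisticRegression(max_iter=1000)
student.fit(X_rep, y_rep, sample_weight=w_rep)
print("student accuracy:", student.score(X, y))
```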

Biography

Lester is a statistical machine learning researcher at Microsoft Research New England and an adjunct professor at Stanford University. He received his Ph.D. in Computer Science (2012), his M.A. in Statistics (2011) from UC Berkeley, and his B.S.E. in Computer Science (2007) from Princeton University. Before joining Microsoft, Lester spent three wonderful years as an assistant professor of Statistics and, by courtesy, Computer Science at Stanford and one as a Simons Math+X postdoctoral fellow, working with Emmanuel Candes. Lester’s Ph.D. advisor was Mike Jordan, and his undergraduate research advisors were Maria Klawe and David Walker. He got his first taste of research at the Research Science Institute and learned to think deeply of simple things at the Ross Program. Lester’s current research interests include statistical machine learning, scalable algorithms, high-dimensional statistics, approximate inference, and probability. Lately, he’s been developing and analyzing scalable learning algorithms for healthcare, climate forecasting, approximate posterior inference, high-energy physics, recommender systems, and the social good.


Self-Tuning Networks: Amortizing the Hypergradient Computation for Hyperparameter Optimization


Speaker: Roger Grosse, University of Toronto
Date: March 31, 2021 | 10:00 AM–11:00 AM PT

Abstract

Optimization of many deep learning hyperparameters can be formulated as a bilevel optimization problem. While most black-box and gradient-based approaches require many independent training runs, we aim to adapt hyperparameters online as the network trains. The main challenge is to approximate the response Jacobian, which captures how the minimum of the inner objective changes as the hyperparameters are perturbed. To do this, we introduce the self-tuning network (STN), which fits a hypernetwork to approximate the best response function in the vicinity of the current hyperparameters. Differentiating through the hypernetwork lets us efficiently approximate the gradient of the validation loss with respect to the hyperparameters. We train the hypernetwork and hyperparameters jointly. Empirically, we can find hyperparameter settings competitive with Bayesian Optimization in a single run of training, and in some cases find hyperparameter schedules that outperform any fixed hyperparameter value.
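
A minimal sketch of the idea follows, assuming a linear best-response approximation w(λ) = W + λ·ΔW around the current hyperparameter (the toy task, class names, and single L2 hyperparameter are illustrative, not the paper's architecture):

```python
# Self-tuning sketch: alternate training-set steps on the weights (and the
# best-response direction dW) with validation-set steps on the hyperparameter,
# differentiating the validation loss through w(lambda).
import torch
import torch.nn as nn

class SelfTuningLinear(nn.Module):
    """Linear layer whose effective weights are w(lam) = W + lam * dW."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.W = nn.Parameter(torch.randn(d_out, d_in) * 0.1)
        self.dW = nn.Parameter(torch.zeros(d_out, d_in))  # best-response direction

    def forward(self, x, lam):
        return x @ (self.W + lam * self.dW).T

torch.manual_seed(0)
layer = SelfTuningLinear(10, 1)
log_lam = torch.zeros((), requires_grad=True)             # hyperparameter: L2 strength
opt_w = torch.optim.Adam(layer.parameters(), lr=1e-2)
opt_h = torch.optim.Adam([log_lam], lr=1e-2)

X_tr, y_tr = torch.randn(256, 10), torch.randn(256, 1)
X_va, y_va = torch.randn(64, 10), torch.randn(64, 1)

for step in range(500):
    # Inner step: train weights (and dW) on the regularized training loss.
    lam = log_lam.exp()
    train_loss = ((layer(X_tr, lam) - y_tr) ** 2).mean() + lam * layer.W.pow(2).sum()
    opt_w.zero_grad(); train_loss.backward(); opt_w.step()
    # Outer step: validation loss differentiated w.r.t. the hyperparameter.
    val_loss = ((layer(X_va, log_lam.exp()) - y_va) ** 2).mean()
    opt_h.zero_grad(); val_loss.backward(); opt_h.step()

print("learned L2 strength:", log_lam.exp().item())
```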

Biography

Roger Grosse is an Assistant Professor of Computer Science at the University of Toronto and a founding member of the Vector Institute for Artificial Intelligence. He received his Ph.D. in computer science from MIT and then spent two years as a postdoc at the University of Toronto. He holds a Canada Research Chair in Probabilistic Inference and Deep Learning, an Ontario MRIS Early Researcher Award, and a Canada CIFAR AI Chair.


Taking Advantage of Randomness in Expensive Optimization Problems


Speaker: Ryan Adams, Princeton University
Date: February 25, 2021 | 10:00 AM–11:00 AM PT

Abstract

Optimization is at the heart of machine learning, and gradient computation is central to many optimization techniques. Stochastic optimization, in particular, has taken center stage as the principal method of fitting many models, from deep neural networks to variational Bayesian posterior approximations. Generally, one uses data subsampling to efficiently construct unbiased gradient estimators for stochastic optimization, but this is only one possibility. In this talk, I will discuss two alternative approaches to constructing unbiased gradient estimates in machine learning problems. The first approach uses randomized truncation of objective functions defined as loops or limits. Such objectives arise in settings ranging from hyperparameter selection, to fitting parameters of differential equations, to variational inference using lower bounds on the log-marginal likelihood. The second approach revisits the Jacobian accumulation problem at the heart of automatic differentiation, observing that it is possible to collapse the linearized computational graph of, e.g., deep neural networks, in a randomized way such that less memory is used but little performance is lost. These projects are joint work with students Alex Beatson, Deniz Oktay, Joshua Aduol, and Nick McGreivy.
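
The first idea can be illustrated with a classic Russian-roulette estimator (a toy example on a Taylor series, not the talk's actual objectives): sample how many terms of the infinite sum to keep, and reweight each kept term by the inverse probability of reaching it, which keeps the estimate unbiased.

```python
# Randomized-truncation (Russian roulette) sketch: estimate exp(x), defined as
# the infinite sum of x^k / k!, without ever summing infinitely many terms.
import math
import numpy as np

rng = np.random.default_rng(0)

def roulette_estimate(x, p_continue=0.6, max_terms=60):
    """Unbiased estimate of exp(x): flip a coin to add each further term,
    and reweight term k by 1 / P(reach term k) = p_continue**-k."""
    total, k, reach_prob = 0.0, 0, 1.0
    while k < max_terms:
        total += (x**k / math.factorial(k)) / reach_prob
        if rng.random() > p_continue:     # stop with probability 1 - p_continue
            break
        k += 1
        reach_prob *= p_continue          # probability we got this far
    return total

x = 1.5
estimates = [roulette_estimate(x) for _ in range(20000)]
print(np.mean(estimates), "vs exact", math.exp(x))   # mean matches exp(x)
```

The same construction applies whenever the objective is a loop or a limit: truncate at a random depth and importance-weight the increments, trading variance for a finite, often much smaller, expected cost.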

Biography

Ryan Adams is interested in machine learning, artificial intelligence, and computational statistics, with applications across science and engineering. Ryan has broad interests but often works on probabilistic methods and approximate Bayesian inference. Ryan is the director of the Undergraduate Certificate in Statistics and Machine Learning. He co-founded Whetlab (sold to Twitter in 2015) and formerly co-hosted the Talking Machines podcast. Ryan was faculty at Harvard from 2011 to 2016 and was at Twitter and then Google Brain before joining the faculty at Princeton in 2018. He calls his group the Laboratory for Intelligent Probabilistic Systems (LIPS).


Automating ML Performance Metric Selection


Speaker: Professor Sanmi Koyejo, University of Illinois at Urbana-Champaign
Date: January 27, 2021 | 10:00 AM–11:00 AM PT

Abstract

From music recommendations to high-stakes medical treatment selection, complex decision-making tasks are increasingly automated as classification problems. Thus, there is a growing need for classifiers that accurately reflect complex decision-making goals. One often formalizes these learning goals via a performance metric, which, in turn, can be used to evaluate and compare classifiers. Yet, choosing the appropriate metric remains a challenging problem. This talk will outline metric elicitation as a formal strategy to address the metric selection problem. Metric elicitation automates the discovery of implicit preferences from an expert or an expert panel using relatively efficient and straightforward interactive queries. Beyond standard classification settings, I will also outline early work on metric selection for group-fair classification.
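
To give the flavor of such interactive queries, here is a toy simulation (our own illustration, assuming the hidden expert cost is a weighted combination of false positives and false negatives and is unimodal in the decision threshold; all names and data are made up): a binary search over pairwise classifier comparisons localizes the expert-optimal operating point in logarithmically many queries.

```python
# Toy metric-elicitation sketch: recover an expert's preferred operating point
# from pairwise "which classifier is better?" queries against a simulated oracle.
import numpy as np

rng = np.random.default_rng(0)
TRUE_W_FP = 0.73   # hidden expert weight on false positives (vs. false negatives)

# Synthetic scores and labels standing in for a fixed model's outputs.
labels = rng.integers(0, 2, 5000)
scores = np.clip(0.3 * labels + rng.normal(0.35, 0.25, 5000), 0, 1)

def rates(t):
    preds = scores >= t
    return np.mean(preds & (labels == 0)), np.mean(~preds & (labels == 1))

def oracle_prefers(t_a, t_b):
    """Simulated expert query: is the classifier at threshold t_a better?"""
    cost = lambda t: TRUE_W_FP * rates(t)[0] + (1 - TRUE_W_FP) * rates(t)[1]
    return cost(t_a) < cost(t_b)

# Each query compares two nearby operating points, revealing the direction of
# improvement, so binary search needs only O(log 1/eps) queries.
lo, hi, eps = 0.0, 1.0, 0.01
for _ in range(12):
    mid = (lo + hi) / 2
    if oracle_prefers(mid + eps, mid - eps):   # cost still decreasing: go right
        lo = mid
    else:
        hi = mid
print(f"elicited operating threshold ~ {(lo + hi) / 2:.3f} after 12 queries")
```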

Biography

Sanmi (Oluwasanmi) Koyejo is an Assistant Professor in the Department of Computer Science at the University of Illinois at Urbana-Champaign. Koyejo’s research interests are in developing the principles and practice of adaptive and robust machine learning. Additionally, Koyejo focuses on applications to neuroscience and biomedical imaging. Koyejo completed his Ph.D. in Electrical Engineering at the University of Texas at Austin, advised by Joydeep Ghosh, and completed postdoctoral research at Stanford University with a focus on developing machine learning techniques for neuroimaging data. His postdoctoral research was primarily with Russell A. Poldrack and Pradeep Ravikumar. Koyejo has been the recipient of several awards, including a Best Paper Award from the Conference on Uncertainty in Artificial Intelligence (UAI), a Kavli Fellowship, an IJCAI Early Career Spotlight, and a trainee award from the Organization for Human Brain Mapping (OHBM). Koyejo serves on the board of the Black in AI organization.


AI for Adaptive Experiment Design


Speaker: Professor Yisong Yue, California Institute of Technology
Date: December 15, 2020 | 10:00 AM–11:00 AM PT

Abstract

Experiment design is a hallmark of virtually all research disciplines. In many settings, one important challenge is how to automatically design experiments over large action/design spaces. Furthermore, it is also important for such a procedure to be adaptive, i.e., to adapt to the outcomes of previous experiments. In this talk, I will describe recent progress in using data-driven algorithmic techniques for adaptive experiment design, also known as active learning and Bayesian optimization in the machine learning community. Building upon the Gaussian process (GP) framework, I will describe case studies in personalized clinical therapy and nanophotonic structure design. Motivated by these applications, I will show how to incorporate real-world considerations such as safety, preference elicitation, and multi-fidelity experiment design into the GP framework, with new algorithms, theoretical guarantees, and empirical validation. Time permitting, I will also briefly overview a few other case studies.
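
A compact sketch of one such real-world consideration, safety, in the GP framework (in the spirit of SafeOpt-style methods rather than the speaker's exact algorithms; for brevity the objective doubles as the safety signal, and all constants are illustrative):

```python
# Safety-aware Bayesian optimization sketch: GP-UCB acquisition restricted to
# points whose GP lower confidence bound clears a safety threshold.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

f = lambda x: np.sin(3 * x) + 0.5 * x     # unknown objective; also the safety signal
SAFE_MIN = -0.5                           # never evaluate where f is believed < SAFE_MIN

grid = np.linspace(0.0, 3.0, 300).reshape(-1, 1)
X = np.array([[1.5]])                     # one known-safe seed evaluation
y = f(X).ravel()

for _ in range(15):
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), alpha=1e-4).fit(X, y)
    mu, sd = gp.predict(grid, return_std=True)
    ucb, lcb = mu + 2 * sd, mu - 2 * sd
    safe = lcb >= SAFE_MIN                # points the GP believes are safe
    if not safe.any():
        break
    ucb[~safe] = -np.inf                  # acquire only inside the safe set
    x_next = grid[np.argmax(ucb)]
    X = np.vstack([X, [x_next]])
    y = np.append(y, f(x_next))

print(f"best safe value found: {y.max():.3f} at x = {X[np.argmax(y)][0]:.3f}")
```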

Biography

Yisong Yue is a professor of Computing and Mathematical Sciences at the California Institute of Technology. He was previously a research scientist at Disney Research. Before that, he was a postdoctoral researcher in the Machine Learning Department and the iLab at Carnegie Mellon University. He received a Ph.D. from Cornell University and a B.S. from the University of Illinois at Urbana-Champaign. Yisong’s research interests are centered around machine learning, and in particular getting theory to work in practice. To that end, his research agenda spans both fundamental research and in-depth collaborations with machine learning practitioners. In the past, his research has been applied to information retrieval, recommender systems, text classification, learning from rich user interfaces, analyzing implicit human feedback, data-driven animation, behavior analysis, sports analytics, experiment design for science, protein engineering, program synthesis, learning-accelerated optimization, robotics, and adaptive planning & allocation problems.


Automating Dataset Comparison and Manipulation with Optimal Transport


Speaker: Dr. David Alvarez-Melis, Microsoft Research
Date: November 18, 2020 | 10:00 AM–11:00 AM PT

Abstract

Machine learning research has traditionally been model-centric, focusing on architectures, parameter optimization, and model transfer. Much less attention has been given to the datasets on which these models are trained, which are often assumed to be fixed, or subject to extrinsic and inevitable change. However, successful application of ML in practice often requires substantial effort in terms of dataset preprocessing and manipulation, such as augmenting, merging, mixing, or reducing datasets. In this talk, I will present some of our recent work that seeks to formalize and automate these and other flavors of dataset manipulation under a unified approach. First, I will introduce the Optimal Transport Dataset Distance, which provides a fundamental theoretical building block: a formal notion of similarity between labeled datasets. In the second part of the talk, I will discuss how this notion of distance can be used to formulate a general framework of dataset optimization by means of gradient flows in probability space. I will end by presenting various exciting potential applications of this dataset optimization framework.
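
A simplified sketch of the first ingredient follows (our own toy approximation, not the paper's implementation: class means crudely stand in for the label-conditional distributions, and entropic regularization replaces exact optimal transport):

```python
# Simplified Optimal Transport Dataset Distance sketch: the ground cost
# between labeled points combines feature distance with a label-to-label
# cost, and the two datasets are compared by entropic (Sinkhorn) OT.
import numpy as np

def pairwise_sq_dists(A, B):
    return ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)

def sinkhorn_cost(C, reg, iters=200):
    """Entropic OT cost between uniform marginals with cost matrix C."""
    n, m = C.shape
    a, b = np.ones(n) / n, np.ones(m) / m
    K = np.exp(-C / reg)
    v = np.ones(m) / m
    for _ in range(iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]       # transport plan
    return (P * C).sum()

def dataset_distance(Xs, ys, Xt, yt):
    # Crude stand-in for the label-to-label cost: squared distance of class means.
    mus = {c: Xs[ys == c].mean(0) for c in np.unique(ys)}
    mut = {c: Xt[yt == c].mean(0) for c in np.unique(yt)}
    label_cost = np.array([[((mus[ys[i]] - mut[yt[j]]) ** 2).sum()
                            for j in range(len(Xt))] for i in range(len(Xs))])
    C = pairwise_sq_dists(Xs, Xt) + label_cost
    return sinkhorn_cost(C, reg=0.1 * C.mean())

rng = np.random.default_rng(0)
Xs, ys = rng.normal(0.0, 1, (80, 5)), rng.integers(0, 3, 80)
Xt, yt = rng.normal(0.5, 1, (80, 5)), rng.integers(0, 3, 80)
print("simplified OT dataset distance:", round(dataset_distance(Xs, ys, Xt, yt), 3))
```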

Biography

David Alvarez-Melis is a postdoctoral researcher at Microsoft Research, New England, and a research affiliate at MIT. He obtained a PhD in computer science from the MIT Computer Science and Artificial Intelligence Lab in 2019, where his thesis work spanned various topics in machine learning and natural language processing. He also holds BSc and MS degrees in mathematics from ITAM and Courant Institute (NYU), respectively. He has previously spent time at IBM Research, and is a recipient of CONACYT, Hewlett Packard and AI2 awards. David’s current research sits at the intersection of machine learning, optimization and applied mathematics, with a focus on optimal transport and its applications to dataset processing, transfer learning, and generative modeling.


AutoML and Interpretability: Powering the machine learning revolution in healthcare


Speaker: Mihaela van der Schaar, University of Cambridge, UCLA, and The Alan Turing Institute
Date: October 6, 2020 | 10:00 AM–11:00 AM PT

Abstract

AutoML and interpretability are both fundamental to the successful uptake of machine learning by non-expert end users. The former will lower barriers to entry and unlock potent new capabilities that are out of reach when working with ad-hoc models, while the latter will ensure that outputs are transparent, trustworthy, and meaningful. In healthcare, AutoML and interpretability are already beginning to empower the clinical community by enabling the crafting of actionable analytics that can inform and improve decision-making by clinicians, administrators, researchers, policymakers, and beyond. This talk presents state-of-the-art AutoML and interpretability methods for healthcare developed in our lab and how they have been applied in various clinical settings (including cancer, cardiovascular disease, cystic fibrosis, and recently COVID-19), and then explains how these approaches form part of a broader vision for the future of machine learning in healthcare.

Biography

Mihaela van der Schaar is the John Humphrey Plummer Professor of Machine Learning, Artificial Intelligence and Medicine at the University of Cambridge, a Fellow at The Alan Turing Institute in London, and a Chancellor’s Professor at UCLA. Mihaela was elected IEEE Fellow in 2009. She has received numerous awards, including the Oon Prize on Preventative Medicine from the University of Cambridge (2018), a National Science Foundation CAREER Award (2004), 3 IBM Faculty Awards, the IBM Exploratory Stream Analytics Innovation Award, the Philips Make a Difference Award and several best paper awards, including the IEEE Darlington Award. Mihaela’s work has also led to 35 USA patents (many widely cited and adopted in standards) and 45+ contributions to international standards for which she received 3 International ISO (International Organization for Standardization) Awards. In 2019, she was identified by the National Endowment for Science, Technology and the Arts as the most-cited female AI researcher in the UK. She was also elected as a 2019 “Star in Computer Networking and Communications” by N²Women. Her research expertise spans signal and image processing, communication networks, network science, multimedia, game theory, distributed systems, machine learning and AI. Mihaela’s current research focus is on machine learning, AI and operations research for healthcare and medicine.


Neural architecture search: Coming of age


Speaker: Frank Hutter, University of Freiburg
Date: September 9, 2020 | 10:00 AM–11:00 AM PT

Abstract

Neural Architecture Search (NAS) is a very promising but still young field. I will start this talk by discussing various works aiming to build a scientific community around NAS, including benchmarks, best practices, and open source frameworks. Then, I will discuss several exciting directions for the field: (1) a broad range of possible speedup techniques for NAS; (2) joint NAS + hyperparameter optimization in Auto-PyTorch to allow off-the-shelf AutoML; and (3) the extended problem definition of neural ensemble search (NES) that searches for a set of complementary architectures rather than a single one as in NAS.
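
To illustrate direction (3), here is a toy version of the ensemble-selection step in NES (random probabilities stand in for the validation predictions of trained candidate networks; sizes are arbitrary): greedily grow the ensemble that minimizes validation loss, which rewards complementary architectures rather than the individually best ones.

```python
# Greedy ensemble selection sketch in the spirit of neural ensemble search:
# pick, one at a time, the candidate whose addition most lowers the
# validation loss of the averaged ensemble prediction.
import numpy as np

rng = np.random.default_rng(0)
n_val, n_classes, pool_size, ensemble_size = 500, 10, 30, 5

y_val = rng.integers(0, n_classes, n_val)
# Stand-in for each candidate architecture's validation class probabilities.
pool = rng.dirichlet(np.ones(n_classes), size=(pool_size, n_val))

def nll(probs):
    return -np.log(probs[np.arange(n_val), y_val] + 1e-12).mean()

chosen = []
for _ in range(ensemble_size):
    # Try adding each candidate (with replacement) and keep the best addition.
    losses = [nll(np.mean([pool[j] for j in chosen + [i]], axis=0))
              for i in range(pool_size)]
    chosen.append(int(np.argmin(losses)))
print("selected architectures:", chosen)
```

In NES, the candidate pool itself comes from an architecture search (e.g., random sampling or evolution); a selection step like the one above is what makes the objective ensemble-aware.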

Biography

Frank Hutter is a Full Professor for Machine Learning at the Computer Science Department of the University of Freiburg (Germany), as well as Chief Expert AutoML at the Bosch Center for Artificial Intelligence. Frank holds a PhD from the University of British Columbia (2009) and an MSc from TU Darmstadt (2004). He received the 2010 CAIAC doctoral dissertation award for the best thesis in AI in Canada, and with his coauthors, several best paper awards and prizes in international competitions on machine learning, SAT solving, and AI planning. He is the recipient of a 2013 Emmy Noether Fellowship, a 2016 ERC Starting Grant, a 2018 Google Faculty Research Award, and a 2020 ERC PoC Award. He is also a Fellow of ELLIS and Program Chair at ECML 2020. In the field of AutoML, Frank co-founded the ICML workshop series on AutoML in 2014 and has co-organized it every year since, co-authored the prominent AutoML tools Auto-WEKA and Auto-sklearn, won the first two AutoML challenges with his team, co-authored the first book on AutoML, worked extensively on efficient hyperparameter optimization and neural architecture search, and gave a NeurIPS 2018 tutorial with over 3000 attendees.



Speaker: Ameet Talwalkar, Carnegie Mellon University
Date: July 28, 2020 | 10:00 AM–11:00 AM PT

Abstract

Neural architecture search (NAS)—the problem of selecting which neural model to use for your learning problem—is a promising direction for automating and democratizing machine learning. Early NAS methods achieved impressive results on canonical image classification and language modeling problems, yet these methods were algorithmically complex and massively expensive computationally. More recent heuristics relying on weight-sharing and gradient-based optimization are drastically more computationally efficient while also achieving state-of-the-art performance. However, these heuristics are also complex, are poorly understood, and have recently come under scrutiny because of inconsistent results on new benchmarks and poor performance as a surrogate for fully trained models. In this talk, we introduce the NAS problem and then present our work studying recent NAS heuristics from first principles. We first perform an extensive ablation study to identify the necessary components of leading NAS methods. We next introduce our geometry-aware framework called GAEA, which exploits the underlying structure of the weight-sharing NAS optimization problem to quickly find high-performance architectures. This leads to simple yet novel algorithms that enjoy faster convergence guarantees than existing gradient-based methods and achieve state-of-the-art accuracy on a wide range of leading NAS benchmarks. Together, our theory and experiments demonstrate a principled way to co-design optimizers and continuous parameterizations of discrete NAS search spaces.
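
As a rough sketch of the geometry-aware update at the heart of GAEA (our simplified reading, on a single edge with a synthetic gradient, not the authors' code): architecture mixture weights live on a probability simplex, so exponentiated gradient, i.e., mirror descent in the entropic geometry, is the natural update, rather than Euclidean SGD on softmax logits.

```python
# Exponentiated-gradient sketch for simplex-constrained architecture weights.
import numpy as np

rng = np.random.default_rng(0)
n_ops = 5                                  # candidate operations on one edge
theta = np.ones(n_ops) / n_ops             # mixture weights on the simplex

def loss_grad(theta):
    """Stand-in for the weight-sharing validation gradient w.r.t. theta."""
    target = np.array([0.7, 0.1, 0.1, 0.05, 0.05])
    return 2 * (theta - target) + 0.01 * rng.normal(size=n_ops)

lr = 0.5
for step in range(100):
    g = loss_grad(theta)
    theta = theta * np.exp(-lr * g)        # exponentiated-gradient step...
    theta /= theta.sum()                   # ...then renormalize onto the simplex

print("final operation weights:", np.round(theta, 3))
# The most confident operation would be selected for the final architecture;
# the entropic geometry drives theta toward such sparse solutions quickly.
```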

Biography

Ameet Talwalkar is an assistant professor in the machine learning department at Carnegie Mellon University and is also co-founder and chief scientist at Determined AI. His interests are in the field of statistical machine learning. His current work is motivated by the goal of democratizing machine learning and focuses on topics related to scalability, automation, fairness, and interpretability of learning algorithms and systems.