
Microsoft Research & Berkeley AI Research (BAIR)

Phase 1 collaborations

Combining Causal Reasoning and Information Theory to Empower Humans through Human-Agent Collaboration

Emre Kiciman (MSR AI), Professor Anca Dragan (BAIR), Professor Pieter Abbeel (BAIR), Stas Tiomkin (Postdoc), Yuqing Du (PhD student)

We seek to create new collaboration strategies between a human and an artificial agent in which the agent enhances the human's capabilities through actions that increase the causal leverage, or empowerment, of both robot and human to influence the environment. That is, the agent will act to increase its own, as well as the human's, future options. The key motivations behind this strategy are its inherent safety properties (the agent will not limit the human's actions) and its ability to provide goal-agnostic, seamless assistance. We will also explore assistance that transitions between goal-oriented and empowerment-based behaviors, depending on the agent's confidence in the human's goals.
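To make the empowerment objective concrete, here is a minimal Python sketch. It uses the special case of deterministic dynamics, where n-step empowerment reduces to the log of the number of distinct states reachable in n steps; the gridworld, wall layout, and action set are illustrative assumptions, not the project's actual formulation (which must handle stochastic dynamics via the channel capacity between action sequences and future states).

```python
# Minimal sketch: n-step empowerment under deterministic dynamics, where it
# reduces to log2(#states reachable in n steps). Gridworld details are toy
# assumptions for illustration only.
import itertools
import math

ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # four cardinal moves
GRID_W, GRID_H = 5, 5
WALLS = {(2, 1), (2, 2), (2, 3)}  # example obstacles

def step(state, action):
    """Deterministic transition: move unless blocked by a wall or the edge."""
    x, y = state[0] + action[0], state[1] + action[1]
    if (x, y) in WALLS or not (0 <= x < GRID_W and 0 <= y < GRID_H):
        return state  # blocked: stay in place
    return (x, y)

def empowerment(state, n):
    """n-step empowerment: log2 of the number of distinct states the agent
    can reach with some open-loop action sequence of length n."""
    reachable = set()
    for seq in itertools.product(ACTIONS, repeat=n):
        s = state
        for a in seq:
            s = step(s, a)
        reachable.add(s)
    return math.log2(len(reachable))

# A corner state has fewer future options (lower empowerment) than a more
# open cell, so an assistive agent maximizing the human's empowerment would
# avoid herding the human into corners or blocking passages.
print(empowerment((0, 0), 3), empowerment((3, 2), 3))
```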


Learning to Collaborate with Human Players

Katja Hofmann (MSR Cambridge), Sam Devlin (MSR Cambridge), Kamil Ciosek (MSR Cambridge), Professor Anca Dragan (BAIR), Micah Carroll (PhD student)

We study how reinforcement learning approaches can enable agents in video games to learn to collaborate with human players. Current approaches are limited in their ability to generalize to real human play, for example when human players make unexpected game moves. We start by analyzing these shortcomings and then consider several directions for improving on the current state of the art, including better human behavior modeling, incorporating human biases, and improving the generalization of reinforcement learning approaches in multi-agent settings.
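As a concrete illustration of the first direction, here is a hedged sketch of behavior cloning: fit a supervised model of p(action | state) from logged human play, then sample from it as the learning agent's training partner instead of a copy of the agent itself. The feature encoding, data, and model class below are toy stand-ins, not the project's architecture.

```python
# Toy sketch: a behavior-cloned human model used as a training partner.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Logged human data: state features -> discrete action taken (toy stand-in).
human_states = rng.normal(size=(500, 8))      # 500 observed states, 8 features
human_actions = rng.integers(0, 4, size=500)  # one of 4 game actions

# Behavior cloning: supervised learning of p(action | state).
human_model = LogisticRegression(max_iter=1000).fit(human_states, human_actions)

def human_partner_policy(state):
    """Sample an action the way the modeled human plausibly would, rather
    than assuming an optimal or self-play partner -- the mismatch this
    project identifies as a key source of poor generalization."""
    probs = human_model.predict_proba(state.reshape(1, -1))[0]
    return rng.choice(len(probs), p=probs)

# During RL training, the learning agent is paired with human_partner_policy
# instead of a copy of itself, so it experiences human-like (including
# suboptimal or unexpected) behavior.
print(human_partner_policy(rng.normal(size=8)))
```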


Agnostic Reinforcement Learning

Alekh Agarwal (MSR AI), Professor Peter Bartlett (BAIR), Professor Moritz Hardt (BAIR), Juan Perdomo (PhD student)

Our goal is to advance the theoretical understanding of reinforcement learning in domains where the available class of control policies has limited expressivity relative to the complexity of the environment. A cornerstone of statistical learning theory for supervised learning is the existence of distribution-agnostic performance guarantees. We want to understand when an agent can hope to find an approximately best policy within its policy class, irrespective of whether better policies exist outside that class.
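In symbols (notation ours, by analogy with agnostic PAC learning), the guarantee in question has roughly this shape:

```latex
% Sketch of the target guarantee; \Pi is the given policy class and V(\pi)
% the expected return of policy \pi. With probability at least 1 - \delta,
% the learner's output \hat{\pi} should satisfy
\[
  V(\hat{\pi}) \;\ge\; \max_{\pi \in \Pi} V(\pi) \;-\; \epsilon ,
\]
% with no realizability assumption, i.e., no requirement that \Pi contain
% an optimal policy for the environment. This mirrors agnostic PAC
% learning, where $\mathrm{err}(\hat{h}) \le \min_{h \in \mathcal{H}}
% \mathrm{err}(h) + \epsilon$ holds for every data distribution.
```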


Accurate Inference with Adaptively Collected Data

Lester Mackey (MSR New England), Professor Martin Wainwright (BAIR), Koulik Khamaru (PhD student), Yash Deshpande (Postdoc)

Estimators computed from adaptively collected data do not behave like non-adaptive estimators. Rather, the sequential dependence of the collection policy can lead to severe distributional biases that persist even in the infinite data limit. In past work, we developed a general method – W-decorrelation – for transforming the bias of adaptive linear regression estimators into variance. We now aim to expand the scope and impact of this line of work by moving beyond the linear model setting and developing more powerful procedures that exploit additional knowledge of the data collection process.
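For readers unfamiliar with W-decorrelation, the rough shape of the estimator (notation ours, following the general form of the linear-model setting) is:

```latex
% y = X\theta^* + \text{noise}, with the rows of X chosen adaptively
% (e.g., by a bandit algorithm). Starting from the least-squares fit
% \hat{\theta}_{\mathrm{LS}}, the decorrelated estimator takes the form
\[
  \hat{\theta}^{d} \;=\; \hat{\theta}_{\mathrm{LS}}
    \;+\; W \bigl( y - X \hat{\theta}_{\mathrm{LS}} \bigr),
\]
% where the matrix W is constructed from knowledge of the data-collection
% process so that the bias of \hat{\theta}^{d} is provably small at the
% cost of a controlled increase in variance, restoring approximately
% normal limiting behavior and hence valid confidence intervals.
```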


Investigations into the Complexity of Nonconvex Optimization

Sebastien Bubeck (MSR AI), Professor Peter Bartlett (BAIR), Yeshwanth Cherapanamjeri (PhD student)

Over the past decade, the adoption of increasingly complex machine learning models, most notably neural networks, has enabled astonishing progress in a range of applications spanning computer vision, natural language processing, and speech processing. However, the use of these models also presents new algorithmic challenges. In particular, the loss functions that arise with such models tend to be non-convex, while existing algorithms are designed to operate on well-behaved loss surfaces. We will pursue a new angle on non-convex optimization to obtain algorithms better suited to these problems.
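As a toy illustration (ours, not the project's method) of why non-convexity breaks the usual guarantees, the following Python snippet runs plain gradient descent on a one-dimensional quartic with two minima: the run's outcome depends entirely on initialization.

```python
# Gradient descent on the non-convex f(x) = x^4 - 3x^2 + x, which has a
# global minimum near x = -1.30 and a worse local minimum near x = 1.13.
def f(x):
    return x**4 - 3 * x**2 + x

def grad_f(x):
    return 4 * x**3 - 6 * x + 1

def gradient_descent(x0, lr=0.01, steps=2000):
    x = x0
    for _ in range(steps):
        x -= lr * grad_f(x)
    return x

# Two starts, two different limits; on a convex loss both runs would agree.
for x0 in (-2.0, 2.0):
    x = gradient_descent(x0)
    print(f"start {x0:+.1f} -> x* = {x:+.4f}, f(x*) = {f(x):+.4f}")
```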


Robustness for Deep Learning

Jerry Li (MSR AI), Professor Dawn Song (BAIR), Professor Jacob Steinhardt (BAIR), Dan Hendrycks (PhD student)

This collaboration seeks to teach AI systems right from wrong and considers the impacts and dangers of future AI systems. We will develop a suite of benchmarks that measure the ethical knowledge of current ML systems, particularly in language processing. For example, a benchmark will present a contemporary NLP model with a text scenario and ask it to predict whether everyday people would find the scenario morally objectionable or morally permissible; the more accurate the model, the more ethical knowledge it demonstrates. When commonsense moral intuitions are ambivalent, precise theories of normative ethics are necessary. To this end, we test how well systems demonstrate knowledge of moral duties, factors determining wellbeing, and virtues, corresponding to the classical deontological, utilitarian, and Aristotelian theories of morality. We are working toward ethical AI by modeling human values and ethical principles.
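A hedged sketch of the probe described above: feed a scenario to a sequence classifier and read off a permissible/objectionable prediction. The checkpoint below is a generic pretrained encoder, so the fine-tuning on human moral judgments that a real benchmark presupposes is assumed to have happened already, and the label mapping is illustrative.

```python
# Sketch of a moral-judgment probe; model choice and labels are assumptions.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "bert-base-uncased"  # stand-in; a real probe needs a model
                                  # fine-tuned on labeled moral scenarios
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
model.eval()

LABELS = {0: "morally permissible", 1: "morally objectionable"}  # assumed mapping

def judge(scenario: str) -> str:
    """Predict how everyday people would judge the scenario."""
    inputs = tokenizer(scenario, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return LABELS[int(logits.argmax(dim=-1))]

print(judge("I told my friend her haircut looked great, even though I disliked it."))
# Benchmark accuracy = fraction of scenarios where the prediction matches the
# aggregated human judgment; higher accuracy = more ethical knowledge.
```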


Enabling Broader Learning through Simplified and Personalized Summaries

Paul Bennett (MSR AI), Tobias Schnabel (MSR AI), Professor Marti Hearst (BAIR), Philippe Laban (PhD student)

Whether it is coming up to speed on the news, learning about a new topic at university, or reading up on a favorite hobby, everyone has experienced the frustration of discovering a potentially great resource, only to find that the text is written for the wrong audience and is so long that the investment of time may not be worth the payoff. We seek to use AI techniques to summarize documents in a way that: (1) simplifies the text to fit the reader's background; (2) adapts to the reader's previous knowledge; (3) accounts for the reader's overall goals. Each goal targets an increasingly ambitious step forward in AI research that holds the potential to change how we learn from text documents in almost every setting.
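As a toy sketch of goal (1) only, the following extractive summarizer trades off an informativeness score against a crude simplicity proxy (average word length), with the trade-off set per reader; every component here is an illustrative assumption rather than the project's model.

```python
# Toy reader-adaptive extractive summarizer; scoring is an assumption.
import re
from collections import Counter

def summarize(text, n_sentences=2, reader_simplicity=0.5):
    """reader_simplicity in [0, 1]: 0 = expert reader (pure informativeness),
    1 = novice reader (strongly prefer simpler sentences)."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = re.findall(r"[a-z']+", text.lower())
    freq = Counter(words)  # document-level word frequency = informativeness proxy

    def score(sentence):
        toks = re.findall(r"[a-z']+", sentence.lower())
        if not toks:
            return float("-inf")
        informativeness = sum(freq[t] for t in toks) / len(toks)
        complexity = sum(len(t) for t in toks) / len(toks)  # avg word length
        return informativeness - reader_simplicity * complexity

    ranked = sorted(sentences, key=score, reverse=True)[:n_sentences]
    return " ".join(s for s in sentences if s in ranked)  # keep original order

doc = ("Mitochondria generate adenosine triphosphate. "
       "They are the power plants of the cell. "
       "Dysfunctional mitochondria are implicated in numerous metabolic disorders.")
print(summarize(doc, n_sentences=1, reader_simplicity=1.0))  # picks the simple sentence
```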