Deep learning, machine learning advancements highlight Microsoft’s research at NIPS 2015

Posted by George Thomas Jr.
Deep learning is all around us.

As part of the broad discipline that is machine learning, deep learning is increasingly embedded in our daily lives.

Skype Translator is learning how to translate spoken words into more and more languages in near real-time, allowing for face-to-face communication once exclusive to Star Trek’s fictional universal translator.

Cortana, the personal digital assistant that debuted on Windows Phones and is now widely available on Windows 10 and for Android and iPhone users, learns from your interactions and helps you do things like manage your calendar, track packages and even chat with you and tell jokes — all the while customizing the experience for truly personal interaction.

Clutter, a feature of Microsoft Office 2016, learns which emails are most important to you and automatically redirects less important mail to a separate folder, helping keep your inbox clean.

And that’s just the tip of the iceberg.

A new computational era

“We’re just at the very, very beginning of a computational era — computation will touch every aspect of our lives,” says Cynthia Dwork, a cryptographer and distinguished scientist at Microsoft Research.

Dwork is among a bevy of Microsoft researchers and engineers whose work — more than 20 accepted papers — will be presented next week at the 2015 Conference and Workshop on Neural Information Processing Systems (NIPS). It is the premier conference on machine learning, and the pervasive nature of the field has swelled attendance to more than 4,000, up from 2,500 last year.

“People see machine learning as more and more important — and deep learning is increasingly central to business,” says Li Deng, partner research manager in the Redmond, Wash., research lab.

Deng, recently selected for the 2015 IEEE Signal Processing Society Technical Achievement Award for outstanding contributions to deep learning and to automatic speech recognition, pioneered Microsoft’s deep-learning speech recognition research. That work now underpins any number of speech-enabled applications.

Microsoft’s strength, in fact, lies in how it is integrating deep learning methodologies across many of the company’s products, including Office, Bing search and Bing ads.

Deep learning at Microsoft

Li Deng says deep learning is a major area of focus at Microsoft and already accounts for numerous successes.

“In short, there is massive work with success achieved by deep learning within Microsoft,” Deng says.

Deng co-authored one of the accepted papers, End-to-end Learning of LDA by Mirror-Descent Back Propagation over a Deep Architecture, with colleagues Jianshu Chen, Lin Xiao, Xiaodong He, Jianfeng Gao and Ji He. “It’s a new way of thinking about deep learning,” he notes, as the approach grew out of business needs.

Deng’s team previously used standard discriminative deep neural networks to predict, with high accuracy, a set of outcomes of high business value. However, the system’s users also demand interpretability: they want to know why and how that accuracy is achieved, and which raw features account for the success, because those answers are critical to their business decisions.

To obtain both interpretability and high prediction accuracy, Deng and colleagues combined the strengths of deep generative and discriminative models. The novel method described in their NIPS paper exploits the power of discriminative learning (via backpropagation) to train the underlying generative topic model, Latent Dirichlet Allocation (LDA).

This is akin to the earlier popular method of training (shallow) generative speech models using discriminative criteria, before deep neural nets took over the field. In contrast, the model architecture described in the paper is very deep: the data flow through its intermediate layers is designed to follow the natural inference steps of the original generative LDA model.
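
To make this concrete, here is a minimal sketch of the kind of update each unfolded “layer” performs: a mirror-descent (exponentiated-gradient) step on the probability simplex, the geometry natural to topic proportions. This illustrates the general technique only, not the paper’s implementation; the dimensions, gradient, and step size below are invented.

```python
import numpy as np

def mirror_descent_step(theta, grad, step_size):
    """One mirror-descent (exponentiated-gradient) update on the
    probability simplex; theta holds topic proportions summing to 1,
    grad is the gradient of a loss with respect to theta."""
    theta_new = theta * np.exp(-step_size * grad)
    return theta_new / theta_new.sum()  # renormalize onto the simplex

# Unfolding several such steps yields the intermediate "layers" of a
# deep architecture; training then backpropagates through all of them.
theta = np.full(5, 0.2)        # uniform initial proportions (5 topics)
grad = np.random.randn(5)      # stand-in gradient, for illustration only
for _ in range(10):            # 10 unfolded layers
    theta = mirror_descent_step(theta, grad, step_size=0.1)
print(theta)                   # still a valid point on the simplex
```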

A paradigm shift in adaptive data analysis

Cynthia Dwork and her colleagues have created new algorithms that represent a paradigm shift in machine learning and data analytics.

Dwork, too, is presenting a new approach to machine learning. Her paper, Generalization in Adaptive Data Analysis and Holdout Reuse, tackles overfitting in adaptive data analysis, a problem comically referred to as “the bane of data analysts” in the paper’s abstract.

Analysts rarely run just a single machine learning algorithm on data; analysis is more often an “adaptive” process, in which each new step may depend on the outcomes of previous steps. But this can result in “overfitting” — learning things about the data set that do not apply to the population from which the data were drawn — which, Dwork notes, is among the reasons for inaccuracies in some scientific research.

But Dwork and her colleagues have shown that accessing the data through a “differentially private” algorithm — a concept developed at Microsoft Research and the subject of over a thousand scholarly articles — prevents overfitting even in adaptive analysis. Informed by this insight, they provide algorithms that let analysts work with a training set as usual, then check their results, via differential privacy, against a holdout set. Because the holdout set is accessed only in this differentially private manner, it can be reused a great many times, repeatedly playing the role of “fresh” data.
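
The following is a minimal sketch of a reusable holdout in the spirit of that result, simplified for exposition: each query about the holdout is answered through a noisy threshold check, so little is revealed per query. The threshold and noise scales here are invented, not taken from the paper.

```python
import numpy as np

class ReusableHoldout:
    """Simplified sketch: answer a statistical query with the training
    value when training and holdout sets agree, and with a noise-perturbed
    holdout value when they do not."""

    def __init__(self, train, holdout, threshold=0.04, sigma=0.01):
        self.train, self.holdout = train, holdout
        self.threshold, self.sigma = threshold, sigma

    def query(self, phi):
        """phi maps a data point to a value in [0, 1], e.g. a 0/1 loss."""
        t = np.mean([phi(x) for x in self.train])
        h = np.mean([phi(x) for x in self.holdout])
        if abs(t - h) <= self.threshold + np.random.laplace(0, self.sigma):
            return t  # agreement: reveal nothing new about the holdout
        return h + np.random.laplace(0, self.sigma)  # noisy holdout value
```

Because each answer exposes only a noisy, thresholded view of the holdout, an analyst can issue many adaptively chosen queries before the holdout stops behaving like fresh data.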

“Having a reusable holdout set is key,” Dwork says, adding that others are already building on her work for a range of applications.

Computer vision and crowdsourcing

Computer vision also is well represented at NIPS. Pushmeet Kohli from Microsoft’s research lab in Cambridge, U.K., contributed to the paper Deep Convolutional Inverse Graphics Network, which introduces a 3D rendering engine that enables a computer to render an object in 3D that it’s never seen before. For example, you can show it an image of a chair and ask what the chair would look like, rotated.

Another paper coauthored by senior researcher Dengyong Zhou helps solve a key machine learning problem involving crowdsourcing. Double or Nothing: Multiplicative Incentive Mechanisms for Crowdsourcing introduces a method that in initial tests has led to a three-fold decrease in error rates in crowdsourced data labeling.

Data used to train machine learning models must be labeled to be meaningful — like a photo of a car labeled “car.” A model trained on such labeled examples can then classify new, unlabeled data by comparing it with similar data it has already seen labeled.

In the past, data labeling was left to experts, but the scarcity of experts limited the size of data sets. This led to crowdsourcing the labeling task — recruiting non-experts via the Internet — which in turn resulted in some poor-quality labeling.

Zhou and his colleague, Nihar B. Shah of the University of California, Berkeley, introduce a simple and unique incentive that allows the crowdsourced “judges” to skip items and estimate their confidence in each label, thereby significantly increasing the quality of the labeling.
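
The paper’s title hints at the flavor of the payment rule: pay is multiplicative in the number of correct, confident answers on gold-standard questions, drops to nothing on a single wrong confident answer, and is unaffected by skips. Below is a toy sketch of that idea; the base amount, cap, and exact rule are illustrative stand-ins, not the paper’s mechanism.

```python
def payout(answers, gold, base=0.25, cap=5.00):
    """Toy multiplicative "double-or-nothing" payout evaluated on
    gold-standard questions. 'answers' maps a question id to the
    worker's label, or to None when the worker skipped it."""
    pay = base
    for question, truth in gold.items():
        given = answers.get(question)
        if given is None:
            continue        # skipping neither doubles nor zeroes the pay
        if given == truth:
            pay *= 2        # each correct confident answer doubles the pay
        else:
            return 0.0      # one wrong confident answer forfeits everything
    return min(pay, cap)

# Two correct answers and one skip: the base payment is quadrupled.
gold = {"q1": "car", "q2": "cat", "q3": "dog"}
answers = {"q1": "car", "q2": "cat", "q3": None}
print(payout(answers, gold))  # 1.0
```

Under such a rule, guessing on an uncertain item risks the entire payment, so a rational worker skips instead, which is exactly the confidence signal the labels need.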

Reducing bottlenecks in machine learning

But properly labeled data isn’t the only bottleneck affecting the industry.

“Machine learning used to be all about generalization,” says Patrice Simard, a distinguished engineer also based in Microsoft’s Redmond lab. “Then people realized that for some problems, the bottleneck is not the algorithm, it is the teacher.”

Those “some problems,” he says, are ones for which very little data is available. “In these cases, the risk of overfitting is pervasive and the need for human supervision is dire” in the form of labels or features.

For example, one such problem could be to recognize a command given to an oven, a car, or a personal assistant like Cortana. Each command, in each language, for each device, service, or application requires building a custom model that is robust to all the creative ways humans can say the same command (or not). A custom system that correctly interprets “lights on first floor off” or “patio, on” in the right context can be built with a few hundred labels and the right features. With the right tools, this can be done in a couple of hours of teaching time with little machine learning expertise. It does not require deep learning or solving artificial intelligence.
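
As a rough illustration of that scale (and emphatically not Microsoft’s tooling), a simple linear classifier over word features can already map command phrasings to intents from a small labeled set. The commands and intent names below are invented.

```python
# A toy intent classifier: bag-of-words features plus logistic regression.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

commands = [
    "lights on first floor off", "turn off the downstairs lights",
    "patio, on", "switch the patio light on",
    "set oven to 350", "preheat the oven to 350 degrees",
]
intents = ["lights_off", "lights_off",
           "lights_on", "lights_on",
           "oven_set", "oven_set"]

model = make_pipeline(CountVectorizer(ngram_range=(1, 2)),
                      LogisticRegression())
model.fit(commands, intents)

print(model.predict(["please switch the patio lights on"]))
```

In practice the “right features” matter far more than the learner; the point is that a few hundred labels, not millions, can be enough when a human teacher supplies them deliberately.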

Patrice Simard says the future of machine learning is machine teaching.

Simard is at the forefront of a machine teaching project developing tools that would allow anyone to teach a computer how to do machine learning tasks even if that person lacks expertise in data analysis or computer science.

An example of this is the Language Understanding Intelligent Service (LUIS), which offers a free, fast and effective way of adding language understanding to applications. Recently released in beta, LUIS is part of Project Oxford; it provides world-class, pre-built models from Bing and Cortana and guides users through the process of quickly building specialized models.

“Right now most of the machine learning community is interested in algorithms, and I believe it will evolve to care far more about productivity,” Simard says, with machine teaching helping to make machine learning easier for non-experts.

A world of algorithms

In the meantime, algorithms certainly aren’t going away.

“Algorithms are increasingly affecting our world,” says Dwork. “Data sets are growing and becoming of increasing interest to social scientists, politicians — we’re about to witness a huge increase in computational social sciences.”

While computational neuroscience remains an aspect of NIPS, recent proceedings have been dominated by papers on machine learning, artificial intelligence and statistics, according to the NIPS website. And this year, a dedicated tutorial and a dedicated symposium on deep learning hint at the future — one Microsoft is anticipating.

“Right now we have over 100 people across the company dedicated to deep learning,” Deng says. And given that he may not be aware of all related activity, he adds, the number is likely higher.

“In speech recognition, visual object recognition, and a few other areas of AI, if you’re not in deep learning, you’re outside the mainstream today,” he says.

Related:

Other Microsoft research papers at NIPS 2015

Efficient Non-greedy Optimization of Decision Trees and Forests
Microsoft contributors: Matthew Johnson, Pushmeet Kohli

Logarithmic Time Online Multiclass prediction
Microsoft contributor: John Langford

Efficient and Parsimonious Agnostic Active Learning
Microsoft contributors: Tzu-Kuo Huang, Alekh Agarwal, John Langford, Robert Schapire

Private Graphon Estimation for Sparse Graphs
Microsoft contributors: Christian Borgs, Jennifer Chayes

Visalogy: Answering Visual Analogy Questions
Microsoft contributors: Ross Girshick, Larry Zitnick

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
Microsoft contributors: Kaiming He, Ross Girshick, Jian Sun

Robust Regression via Hard Thresholding
Microsoft contributors: Purushottam Kar, Prateek Jain, Kush Bhatia

Locally Non-linear Embeddings for Extreme Multi-label Learning
Microsoft contributors: Purushottam Kar, Prateek Jain, Manik Varma, Kush Bhatia

Convergence Rates of Active Learning for Maximum Likelihood Estimation
Microsoft contributor: Praneeth Netrapalli

Multi-class SVMs: From Tighter Data-Dependent Generalization Bounds to Novel Algorithms
Microsoft contributor: Urun Dogan

Bandit Smooth Convex Optimization: Improving the Bias-Variance Tradeoff
Microsoft contributor: Ofer Dekel

Fast Convergence of Regularized Learning in Games
Microsoft contributors: Vasilis Syrgkanis, Alekh Agarwal, Robert Schapire

No-Regret Learning in Repeated Bayesian Games
Microsoft contributor: Vasilis Syrgkanis

On Elicitation Complexity and Conditional Elicitation
Microsoft contributor: Ian Kash

Streaming Min-max Hypergraph Partitioning
Microsoft contributors: Dan Alistarh, Milan Vojnovic

Alternating Minimization for Regression Problems with Vector-valued Outputs
Microsoft contributor: Prateek Jain

Predtron: A Family of Online Algorithms for General Prediction Problems
Microsoft contributor: Prateek Jain

Stochastic Online Greedy Learning with Semi-bandit Feedbacks
Microsoft contributor: Wei Chen

Fast and Memory Optimal Low-Rank Matrix Approximation
Microsoft contributor: Seyoung Yun

Finite-Time Analysis of Projected Langevin Monte Carlo
Microsoft contributor: Sebastien Bubeck
