Microsoft’s Rick Szeliski previews CVPR 2015

Published June 8, 2015

I read through some of the papers to be presented at CVPR 2015 this week and noticed interesting trends emerging. The opening session addresses two of the most exciting and active areas of research within computer vision, namely deep learning and modeling from depth cameras.

The session on deep learning includes papers that show how deep convolutional networks can be extended to perform per-pixel segmentation, how to improve performance with various warping and architectural enhancements, what invariance (and other properties) deep network layers exhibit, and how to “fool” deep neural nets with blatantly impossible images (or reconstruct plausible inputs). Deep learning papers are scattered throughout the rest of the conference, and deep learning is without doubt the most active area of research in computer vision at the moment.

In 3D modeling from depth camera images, there are papers on modeling and tracking moving deformable objects (as opposed to the usual case of static scenes). Several papers advance the state of the art in recovering 3D models from single (monocular) images, most commonly using prior knowledge about interior or exterior architectural (and furniture) layout.

Another area, closely related to visual object detection and recognition, that has also received wide coverage in advance of the conference is language and vision, which includes automatic image caption generation. This year’s CVPR conference presents several papers from different institutions on this topic. An online evaluation shows the current leaderboard results. 3D models and pose estimation also continue to be used as part of object detection and recognition systems.

Traditional computer vision algorithms continue to improve, both in terms of speed and accuracy. For example, an optic flow algorithm that uses principal component analysis (PCA) or overlapping parametric models, followed by a layered assignment of pixels, produces high-quality results and/or runs dramatically faster. Another algorithm, based on first matching salient edges, outperforms all previously developed algorithms (on the Sintel data set) by a wide margin. Optic flow algorithms are also being developed for depth map videos, a problem otherwise known as scene flow.

In stereo matching, improvements can be obtained by classifying the expected orientation of surfaces or by using an alternative regularization term that is invariant to surface parameterization. 3D reconstruction from light field cameras (consisting of many small lenslets) also continues to advance with better small-baseline correspondence methods, layered models, and photometric shading constraints.

Note: I don’t work in the areas of object detection, tracking, action and pose recognition, face recognition and tracking, and many others, so I didn’t look at any of these papers.

Rick Szeliski leads the Interactive Visual Media Group at Microsoft Research and is an Affiliate Professor at the University of Washington. His research interests include using vision to automatically build 3-D models from images, computational photography, and image-based rendering.

For more computer science research news, visit ResearchNews.com.
