Computer Vision

Video

Hairmony: Fairness-aware hairstyle classification

October 17, 2024 | James Clemoes

We present a method for prediction of a person’s hairstyle from a single image. Despite growing use cases in user digitization and enrollment for virtual experiences, available methods are limited, particularly in the range of…

04:30

Video

Look Ma, no markers: holistic performance capture without the hassle

October 17, 2024 | Charlie Hewitt

We tackle the problem of highly-accurate, holistic performance capture for the face, body and hands simultaneously. Motion-capture technologies used in film and game production typically focus only on face, body or hand capture independently, involve…

03:25

Dataset Source Code

OmniParser

OmniParser is a comprehensive method for parsing user interface screenshots into structured and easy-to-understand elements, which significantly enhances the ability of GPT-4V to generate actions that can be accurately grounded in the corresponding regions of…

Publication

i-Code Studio: A Configurable and Composable Framework for Integrative AI

Yuwei Fang, Mahmoud Khademi, Chenguang Zhu, Ziyi Yang, Reid Pryzant, Yichong Xu, Yao Qian, Takuya Yoshioka, Lu Yuan, Michael Zeng, Xuedong Huang

The 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP24 Demo Track) | October 2024

Group

Interactive Multimodal AI Systems (IMAIS)

The Interactive Multimodal AI Systems group focuses on creating interactive systems and experiences that blend the richness and complexity of people and their real, physical world with advanced technology. We seek to leverage multimodal generative AI…

Publication

Multimodal Large Language Models Make Text-to-Image Generative Models Align Better

Xun Wu, Shaohan Huang, Furu Wei

October 2024

Publication

CaesarNeRF: Calibrated Semantic Representation for Few-shot Generalizable Neural Rendering

Haidong Zhu, Tianyu Ding, Tianyi Chen, Ilya Zharkov, Ram Nevatia, Luming Liang

2024 European Conference on Computer Vision | October 2024

Publication

MedImageInsight: An Open-Source Embedding Model for General Domain Medical Imaging

Noel Codella, Yu Gu, Shrey Jain, Ho Hin Lee, Asma Ben Abacha, Alberto Santamaria-Pang, Will Guyman, Natieek Sangani, Sheng Zhang, Hoifung Poon, Stephanie Hyland, Shruthi Bannur, Javier Alvarez-Valle, Xue Li, John Garett, Alan McMillan, Gaurav Rajguru, Madhu Maddi, Nilesh Vijayrania, Reehan Bhimai, Nick Mecklenburg, Rupal Jain, Daniel Holstein, Naveen Gaur, Vijay Aski, Jenq-Neng Hwang, Thomas Lin, Ivan Tarapov, Matthew P Lungren, Mu Wei

October 2024

Publication

Motion Graph Unleashed: A Novel Approach to Video Prediction

Luming Liang, Ilya Zharkov

October 2024

Microsoft Research Blog

Stress-testing biomedical vision models with RadEdit: A synthetic data approach for robust model deployment

September 30, 2024 | Max Ilse, Daniel Coelho de Castro, Javier Alvarez-Valle

RadEdit stress-tests biomedical vision models by simulating dataset shifts through precise image editing. It uses diffusion models to create realistic, synthetic datasets, helping to identify model weaknesses and evaluate robustness.

Microsoft at CVPR 2024: Innovations in computer vision and AI research

DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models

Exploring how context, culture, and character matter in avatar research

FeatUp: A Model-Agnostic Framework for Features at Any Resolution
