Computer Vision

Publication

Look Ma, no markers: holistic performance capture without the hassle

Charlie Hewitt, Fatemeh Sadat Saleh, Sadegh Aliakbarian, Lohit Petikam, Shideh Rezaeifar, Louis Florentin, Zafiirah Hosenie, Tom Cashman, Julien Valentin, Prof. Darren Cosker, Tadas Baltrusaitis

ACM Transactions on Graphics | December 2024, Vol 43(6): pp. 235:1-235:12

Video

Publication

Hairmony: Fairness-aware hairstyle classification

Givi Meishvili, James Clemoes, Charlie Hewitt, Zafiirah Hosenie, Xiao-Xian, Martin de La Gorce, Tibor Takacs, Tadas Baltrusaitis, Antonio Criminisi, Chyna McRae, Nina Jablonski, Marta Wilczkowiak

SIGGRAPH Asia | December 2024

Video

Career Opportunity

Senior Researcher – Microsoft Azure AI Platform

Posted: October 17, 2024

Location: Redmond, WA, US

Research Area(s): Artificial intelligence, Computer vision, Human language technologies

Our team in Microsoft Azure AI Platform is at the forefront of developing multimodal AI technologies that combine language, vision, and other sensory inputs to power Microsoft AI products. We are seeking a Senior Researcher…

Video

Hairmony: Fairness-aware hairstyle classification

October 17, 2024 | James Clemoes

We present a method for prediction of a person’s hairstyle from a single image. Despite growing use cases in user digitization and enrollment for virtual experiences, available methods are limited, particularly in the range of…

04:30

Video

Look Ma, no markers: holistic performance capture without the hassle

October 17, 2024 | Charlie Hewitt

We tackle the problem of highly accurate, holistic performance capture for the face, body and hands simultaneously. Motion-capture technologies used in film and game production typically focus only on face, body or hand capture independently, involve…

03:25

Dataset Source Code

OmniParser

OmniParser is a comprehensive method for parsing user interface screenshots into structured and easy-to-understand elements, which significantly enhances the ability of GPT-4V to generate actions that can be accurately grounded in the corresponding regions of…

GitHub Publication

Publication

i-Code Studio: A Configurable and Composable Framework for Integrative AI

Yuwei Fang, Mahmoud Khademi, Chenguang Zhu, Ziyi Yang, Reid Pryzant, Yichong Xu, Yao Qian, Takuya Yoshioka, Lu Yuan, Michael Zeng, Xuedong Huang

The 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP24 Demo Track) | October 2024

Career Opportunity

Senior Applied Scientist – Ads

Posted: October 5, 2024

Location: Mountain View, CA, US; Redmond, WA, US

Research Area(s): Artificial intelligence, Computer vision, Human language technologies

Microsoft Audience Network (MSAN), part of Microsoft AI, is seeking a Senior Applied Scientist – Ads. As the Senior Applied Scientist, you will specialize in creating and enhancing machine learning technologies in areas such…

Group

Interactive Multimodal AI Systems (IMAIS)

The Interactive Multimodal AI Systems (IMAIS) group focuses on creating interactive systems and experiences that blend the richness and complexity of people and their real, physical world with advanced technology. We seek to leverage multimodal generative AI…

Publication

Multimodal Large Language Models Make Text-to-Image Generative Models Align Better

Xun Wu, Shaohan Huang, Furu Wei

October 2024

Microsoft at CVPR 2024: Innovations in computer vision and AI research

DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models

Exploring how context, culture, and character matter in avatar research

FeatUp: A Model-Agnostic Framework for Features at Any Resolution
