OmniParser is a comprehensive method for parsing user interface screenshots into structured and easy-to-understand elements, which significantly enhances the ability of GPT-4V to generate actions that can be accurately grounded in the corresponding regions of…
Microsoft Audience Network (MSAN), part of Microsoft AI (Artificial Intelligence), is seeking a Senior Applied Scientist-Ads. As the Senior Applied Scientist, you will specialize in creating and enhancing machine learning technologies in areas such…
The Interactive Multimodal AI Systems group focuses on creating interactive systems and experiences that blend the richness and complexity of people and their real, physical world with advanced technology. We seek to leverage multimodal generative AI…
RadEdit stress-tests biomedical vision models by simulating dataset shifts through precise image editing. It uses diffusion models to create realistic, synthetic datasets, helping to identify model weaknesses and evaluate robustness.
The personalizable object recognizer Find My Things was recently recognized for accessible design. Researcher Daniela Massiceti and software development engineer Martin Grayson talk about the research project’s origins and the tech advances making it possible.