Research Focus: Week of May 27, 2024


Welcome to Research Focus, a series of blog posts that highlights notable publications, events, code/datasets, new hires and other milestones from across the research community at Microsoft.


Register now for Research Forum on June 4

Join us for Research Forum, an event series that explores recent research advances, bold new ideas, and important discussions with the global research community in the era of general AI.

In Episode 3, researchers at Microsoft will emphasize the importance of globally equitable AI, share novel use cases and transformative applications ranging from industry to materials design, and provide updates on AutoGen and MatterGen.

Your registration includes access to our live chat with researchers on the event day. 

Episode 3 will air Tuesday, June 4 at 9:00 AM PT.

Generative AI and the Politics of Visibility

Generative AI tools have a remarkable capacity to produce complicated and lengthy texts with only simple direction from users. AI proponents assert they can help writers by providing creative suggestions, completing half-written sentences or story fragments, and inventing character backstories. But this raises questions about the politics of visibility: what kinds of stories do these tools tend to generate, and what do they generally leave out? Do these tools fully represent diverse or marginalized populations and non-normative communities?

In a recent paper: Generative AI and the Politics of Visibility, a researcher from Microsoft tested three widely available generative AI tools (Bing Chat, ChatGPT, and Google’s Bard, now Gemini) with prompts designed to reveal their normative assumptions, prompting each tool multiple times with the same query to track the diversity of its outputs. His research demonstrates that, at least as currently designed and trained, generative AI tools tend to reproduce normative identities and narratives, rarely representing less common arrangements and perspectives unless specifically prompted to. When they do generate variety, it is often narrow, and deeper normative assumptions persist in what remains absent.
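Tracking diversity across repeated runs of the same query can be made concrete with a simple measure. The sketch below is illustrative only: it assumes each output has already been annotated with a categorical label (for example, the family structure a story depicts), and the labels, the data, and the choice of Shannon entropy are hypothetical, not the paper's stated method.

```python
import math
from collections import Counter


def output_diversity(labels: list[str]) -> tuple[int, float]:
    """Return (number of distinct labels, Shannon entropy in bits) for repeated outputs.

    Entropy is 0 when every run yields the same label and grows as the
    outputs spread more evenly across more labels.
    """
    counts = Counter(labels)
    total = len(labels)
    # p * log2(1/p) is non-negative, so the sum never comes out as -0.0.
    entropy = sum((c / total) * math.log2(total / c) for c in counts.values())
    return len(counts), entropy


# Hypothetical annotations of ten runs of the same story prompt.
runs = ["two-parent household"] * 8 + ["single-parent household"] * 2
distinct, bits = output_diversity(runs)
print(distinct, round(bits, 3))
```

A low entropy across many runs is one way to quantify the narrowness the paper describes, though a label-based count can only see the categories an annotator chose to record, not what remains absent altogether.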



ACM MMSys 2024 Bandwidth Estimation in Real Time Communications Challenge

Videoconferencing has become indispensable for everything from global business operations to accessible education, transforming the way people communicate across physical barriers and geographical divides. The quality of experience (QoE) delivered by videoconferencing systems depends in part on correctly estimating, over time, the capacity of the bottleneck link between sender and receiver. Bandwidth estimation for real-time communications (RTC) remains a significant challenge, primarily because network architectures and technologies are heterogeneous and continuously evolving.

From the first bandwidth estimation challenge, hosted by Microsoft at ACM MMSys 2021, researchers learned that bandwidth estimation models trained with reinforcement learning (RL) in simulation to maximize network-based reward functions may not be optimal, due to the sim-to-real gap and the difficulty of aligning network-based rewards with user-perceived QoE.

In this year’s ACM MMSys 2024 Bandwidth Estimation in Real Time Communications Challenge, researchers from Microsoft aim to align reward maximization with user-perceived QoE optimization using offline RL and a real-world dataset released by Microsoft Teams. The challenge drew enthusiastic participation from both academia and industry. All models submitted to the grand challenge underwent an initial evaluation, and the top models were further evaluated on a geographically distributed testbed. The results show that by leveraging real-world data and using objective audio/video quality scores as rewards, offline RL can support the development of competitive bandwidth estimators for RTC.
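One way to picture the offline RL setup is to relabel logged calls with a quality-based reward before any training happens. This is a minimal sketch under stated assumptions, not the challenge's data format or reward: the `LoggedStep` schema, its field names, the example feature values, the equal weights, and the 1-5 score scale are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LoggedStep:
    """One decision point from a logged call (hypothetical schema)."""
    observation: list[float]  # network statistics the estimator saw (hypothetical features)
    action_bps: float         # bandwidth estimate the deployed estimator emitted
    audio_quality: float      # objective audio quality score (assumed 1-5 scale)
    video_quality: float      # objective video quality score (assumed 1-5 scale)


def qoe_reward(step: LoggedStep, w_audio: float = 0.5, w_video: float = 0.5) -> float:
    """Reward as a weighted sum of objective audio and video quality (assumed weighting)."""
    return w_audio * step.audio_quality + w_video * step.video_quality


Transition = tuple[list[float], float, float, list[float], bool]


def to_transitions(episode: list[LoggedStep]) -> list[Transition]:
    """Convert one logged call into (state, action, reward, next_state, done) tuples.

    An offline RL learner trains on tuples like these directly, without
    interacting with a live network or a simulator.
    """
    transitions: list[Transition] = []
    for i, step in enumerate(episode):
        done = i == len(episode) - 1
        # The final step has no successor, so its own observation is a placeholder.
        next_obs = step.observation if done else episode[i + 1].observation
        transitions.append((step.observation, step.action_bps, qoe_reward(step), next_obs, done))
    return transitions


# Hypothetical two-step call: observation = [loss rate, delay in ms].
call = [
    LoggedStep([0.02, 35.0], 1_200_000.0, 4.0, 3.0),
    LoggedStep([0.05, 60.0], 900_000.0, 3.5, 2.5),
]
for t in to_transitions(call):
    print(t)
```

The point of the design is that the reward comes from quality scores attached to real calls rather than from network metrics computed in a simulator; how those scores are weighted is a modeling choice the sketch leaves as a parameter.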


Player-Driven Emergence in LLM-Driven Game Narrative

Game creation is a labor-intensive process, with limited automation of non-graphic game elements related to dialogue and narrative structure. These elements are typically hand-coded and rigidly deterministic, with few options presented to the player. Large language models (LLMs) are beginning to show potential in the creation of richer and more expansive narrative spaces. 

In a recent paper: Player-Driven Emergence in LLM-Driven Game Narrative, accepted for presentation at the IEEE Conference on Games 2024, researchers from Microsoft, in collaboration with members of the Xbox organization, explore how interaction with LLMs can empower players to participate in the evolution of game narratives. As a testbed, they created a text-adventure game in which players attempt to solve a mystery under a fixed narrative premise but can freely interact with non-player characters generated by GPT-4, a state-of-the-art LLM. They recruited 28 gamers to play the game and used GPT-4 to automatically convert the game logs into node graphs representing the narrative of each player’s gameplay. Through their interactions with the non-deterministic behavior of the LLM, players discovered interesting new emergent nodes that were not part of the original narrative but have the potential to be fun and engaging. The players who created the most emergent nodes tended to be those who enjoy games that facilitate discovery, exploration, and experimentation.
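Conceptually, an emergent node is one that appears in a player's narrative graph but not in the original narrative. Assuming both graphs are already available as adjacency maps with consistently named nodes, counting emergent nodes reduces to a set difference. In the study, GPT-4 produced the graphs from game logs; the exact-string node matching, the graph format, and the node names here are hypothetical simplifications.

```python
NarrativeGraph = dict[str, set[str]]  # node -> successor nodes (hypothetical format)


def all_nodes(graph: NarrativeGraph) -> set[str]:
    """Collect every node, including ones that only appear as successors."""
    nodes = set(graph)
    for successors in graph.values():
        nodes |= successors
    return nodes


def emergent_nodes(player_graph: NarrativeGraph, original: NarrativeGraph) -> set[str]:
    """Nodes in the player's narrative that the original narrative never contained."""
    return all_nodes(player_graph) - all_nodes(original)


def rank_players(logs: dict[str, NarrativeGraph], original: NarrativeGraph) -> list[tuple[str, int]]:
    """Order players by how many emergent nodes their playthrough produced."""
    counts = {player: len(emergent_nodes(g, original)) for player, g in logs.items()}
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)


original = {
    "arrive_at_manor": {"question_butler"},
    "question_butler": {"find_key"},
    "find_key": {"accuse_suspect"},
}
player = {
    "arrive_at_manor": {"question_butler"},
    "question_butler": {"bribe_butler", "find_key"},
    "bribe_butler": {"secret_tunnel"},
}
print(sorted(emergent_nodes(player, original)))  # ['bribe_butler', 'secret_tunnel']
```

In practice, LLM-generated graphs would rarely reuse identical node names, so a real comparison would need some form of semantic matching before the set difference is meaningful.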


Segmentation using large language models: A new typology of American neighborhoods

The U.S. Census Bureau’s American Community Survey (ACS) is the country’s primary source of social and economic data. But much of the data is of low quality, especially at the finest level of geographic detail (Block Groups). Counterintuitively, as one zooms in on a map, the resolution of the social and economic data decreases, whereas zooming in typically reveals more detail, not less. Recent changes in the U.S. statistical system have amplified this geographic-demographic resolution trade-off.

In a recent paper: Segmentation using large language models: A new typology of American neighborhoods, researchers from Microsoft present a solution to this problem in the form of an AI-based, open, and reproducible geodemographic classification system built on small-area estimates from the ACS. They apply a partitioning clustering algorithm to a range of socio-economic, demographic, and built-environment variables, and an open-source software pipeline keeps the system adaptable to future data updates. One key innovation is the integration of GPT-4 to generate intuitive cluster descriptions and names. This represents a novel application of natural language processing in geodemographic research and showcases the potential for human-AI collaboration within the geospatial domain.
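The post does not name the clustering algorithm; k-means is the textbook partitioning method and is used here as an assumed stand-in. This sketch standardizes each variable, clusters a few block groups with k-means, and builds a text prompt from each cluster's average profile that could be sent to a model such as GPT-4 for naming. The variable names, the data, and the prompt wording are hypothetical, and no model is actually called.

```python
import random
import statistics


def standardize(rows: list[list[float]]) -> list[list[float]]:
    """Z-score each column so variables on different scales contribute comparably."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def sq_dist(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows: list[list[float]], k: int, max_iter: int = 100, seed: int = 0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid updates."""
    rng = random.Random(seed)
    centroids = [list(r) for r in rng.sample(rows, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(row, centroids[j])) for row in rows]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        for j in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties out
                centroids[j] = [statistics.fmean(col) for col in zip(*members)]
    return labels, centroids


def naming_prompt(variables: list[str], centroid: list[float]) -> str:
    """Build a (hypothetical) naming prompt from a cluster's standardized profile."""
    profile = ", ".join(f"{v}: {x:+.2f}" for v, x in zip(variables, centroid))
    return ("Suggest a short name and a one-sentence description for a group of U.S. "
            f"neighborhoods with this average profile (z-scores): {profile}")


# Hypothetical block groups: median income ($k), % renters, housing units per km^2.
variables = ["median_income_k", "pct_renters", "housing_units_per_km2"]
block_groups = [[45, 60, 3000], [48, 65, 3200], [120, 20, 400], [115, 15, 350]]
labels, centroids = kmeans(standardize(block_groups), k=2)
for j, c in enumerate(centroids):
    print(j, naming_prompt(variables, c))
```

Standardizing first matters because raw density values would otherwise dominate the distance calculation; describing clusters in z-scores gives the language model relative, comparable signals to name.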


From Local to Global: A Graph RAG Approach to Query-Focused Summarization 

The use of retrieval-augmented generation (RAG) to retrieve relevant information from an external knowledge source enables LLMs to answer questions over private and/or previously unseen document collections. However, RAG fails on global questions directed at an entire text corpus, such as: “What are the main themes in the dataset?”, since this is inherently a query-focused summarization (QFS) task, rather than an explicit retrieval task. Prior QFS methods fail to scale to the quantities of text indexed by typical RAG systems.

In a recent preprint: From Local to Global: A Graph RAG Approach to Query-Focused Summarization, researchers from Microsoft propose combining the strengths of these contrasting methods through a Graph RAG approach to question answering over private text corpora that scales with both the generality of user questions and the quantity of source text to be indexed. This approach uses an LLM to build a graph-based text index in two stages: first to derive an entity knowledge graph from the source documents, then to pre-generate community summaries for all groups of closely related entities. Given a question, each community summary is used to generate a partial response, before all partial responses are again summarized in a final response to the user. For a class of global sensemaking questions over datasets in the 1 million token range, Graph RAG leads to substantial improvements over a naïve RAG baseline for both the comprehensiveness and diversity of generated answers.
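The query-time step described above follows a map-reduce pattern: one partial answer per community summary, then a final summarization of the partials. The sketch assumes the community summaries already exist and takes the model as a plain callable (`llm`, a hypothetical stand-in for any chat-completion client); the prompt wording and the character budget standing in for a token limit are illustrative assumptions, not the paper's implementation.

```python
from typing import Callable

LLM = Callable[[str], str]  # hypothetical: any function mapping a prompt to a completion


def map_step(question: str, summaries: list[str], llm: LLM) -> list[str]:
    """Ask the model to answer the question from each community summary separately."""
    partials = []
    for summary in summaries:
        prompt = ("Answer the question using only this community summary. "
                  "Reply with an empty string if it is not relevant.\n"
                  f"Summary: {summary}\nQuestion: {question}")
        answer = llm(prompt).strip()
        if answer:  # drop communities that contributed nothing
            partials.append(answer)
    return partials


def reduce_step(question: str, partials: list[str], llm: LLM, max_chars: int = 12_000) -> str:
    """Summarize the partial answers into one final response."""
    context, used = [], 0
    for p in partials:
        if used + len(p) > max_chars:
            break  # stay within an assumed context budget
        context.append(p)
        used += len(p)
    prompt = ("Combine these partial answers into a single comprehensive answer.\n"
              + "\n---\n".join(context) + f"\nQuestion: {question}")
    return llm(prompt)


def global_answer(question: str, summaries: list[str], llm: LLM) -> str:
    return reduce_step(question, map_step(question, summaries, llm), llm)


def stub_llm(prompt: str) -> str:
    """Offline stand-in for a real model, for trying the pipeline end to end."""
    return "final answer" if prompt.startswith("Combine") else "a partial answer"


print(global_answer("What are the main themes in the dataset?", ["summary A", "summary B"], stub_llm))
```

Because each map call depends only on one summary and the question, those calls are independent and could run in parallel; only the reduce step needs all partial answers at once.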

Microsoft Research in the news

Microsoft Announces New Foundation Model For Digital Pathology, Diving Deeper Into Clinical Medicine  

Forbes | May 22, 2024

In partnership with Providence health system and the University of Washington, Microsoft has leveraged its significant work with generative AI to launch GigaPath, the first whole-slide foundation model for digital pathology that has been pre-trained with real-world data.

Spanish mini-satellites bring the internet to isolated areas (in Spanish)

La Razón | May 17, 2024

The Spanish company Fossa, with help from Microsoft Research, has successfully tested a small satellite weighing less than a kilogram that improves connectivity in places with little or no coverage, a potential boost for the internet of things (IoT).
