Yifei Shen

Researcher

About

I received a B.S. degree in computer science from ShanghaiTech University and a Ph.D. degree in electronic and computer engineering from the Hong Kong University of Science and Technology. Since 2022, I have been with Microsoft Research Asia. I am currently working on the mechanisms of diffusion models and language models.

Planning in language models: Planning is a fundamental construct of human intelligence, yet current language models, such as GPT-4, exhibit limited planning capabilities. Our research approaches planning from a graph perspective, motivated by several examples: (a) a mathematical proof involves navigating from existing theorems to the desired theorem, (b) the objective of a tool agent is to identify a connected sub-graph within an API graph that fulfills the user's request, and (c) in neuroscience, planning is commonly assessed through maze path-finding. Inspired by statistical mechanics, we have developed abstract, exactly solvable models to investigate planning in language models. We then use these insights to improve the planning abilities of state-of-the-art LLMs.

Training dynamics analysis of planning capability in autoregressive Transformers [Paper Link]
Application to enhancing task planning of LLMs (including Mistral, Llama, GPT-4, etc.) [Paper Link]

Training-free zero-shot diffusion guidance: Diffusion models learn from massive datasets and traditionally rely on extensive additional training for guidance. We challenge this norm by integrating optimization-based approaches that introduce constraints without any extra training. This analysis not only helps unveil the internal representations of diffusion models but also improves their efficiency and adaptability. In particular, the approach enables the use of novel reward functions without prior training, thereby unlocking opportunities for open-ended decision-making.

Optimization analysis of training-free guidance [Paper Link]
Application to molecule diffusion models [Paper Link]
Application to open-ended reinforcement learning [Paper Link]

In the past, I was interested in bridging neural network architectures and classic non-learning algorithms to analyze their out-of-distribution (OOD) generalization abilities. Here are some of my previous works:

GNNs and distributed algorithms [Paper in 2019]
GNNs, matrix factorization, and subspace clustering [Paper in 2021]
Transformers, Sparse MoEs, and classic vision algorithms [Paper in 2022]

My other works in AI (GNNs, diffusion, and OOD generalization) are documented in my OpenReview profile. During my Ph.D. studies, I worked on information theory, signal processing, and wireless communication; these works can be found on IEEE Xplore. As an undergraduate, I proofread the High Dimensional Probability book, which sparked my interest in learning theory.