
Shijie Cao

Senior Researcher

About

I am Shijie Cao (曹士杰), a senior researcher in the Systems Research Group at Microsoft Research Asia (MSRA). I received my Ph.D. in Computer Science from Harbin Institute of Technology (HIT) in 2021 through a joint Ph.D. program with MSRA, under the supervision of Dr. Hsiao-Wuen Hon and Prof. Lanshun Nie. Prior to that, I earned my B.E. in Computer Science from HIT in 2016. From 2015 to 2021, I was a long-term intern in the systems area at MSRA, mentored by Dr. Ningyi Xu and Dr. Lintao Zhang.

My research interests lie at the intersection of computer systems/architecture and deep learning, including domain-specific architectures, software-hardware co-design, and deep learning compression and acceleration. More recently, my research has focused on low-bit large language models (LLMs) and their efficient computation in systems and hardware.

Please feel free to contact me for internships and collaborations at shijiecao@microsoft.com.

 

News

May 2024   BitDistiller [code] was accepted to the ACL 2024 main conference. AFPQ [code] was accepted to ACL 2024 Findings.

Apr 2024   We released BitBLAS and T-MAC, libraries that support mixed-precision matrix multiplication on GPUs and CPUs respectively, designed specifically for low-bit LLM deployment.

Mar 2024   Our paper Ladder: Enabling Efficient Low-Precision Deep Learning Computing through Hardware-aware Tensor Transformation [code] was accepted to OSDI 2024.

Mar 2024   Our paper Pre-gated MoE was accepted to ISCA 2024.

Feb 2024   We released BitDistiller, a framework that combines quantization-aware training (QAT) with self-distillation to enhance ultra-low-bit LLMs.

Nov 2023   We released AFPQ, an asymmetric floating-point quantization method for LLMs.

May 2023   We released a comparative analysis of integer and floating-point formats for low-bit quantization of large language models. [paper]

Feb 2023   Our paper nmSPARSE was accepted to MLSys 2023.