Ruofan Wu 乌若凡
2nd-year Ph.D. student @ UMich
I am a second-year Ph.D. student at the University of Michigan, advised by Prof. Mosharaf Chowdhury. I received my Bachelor's and Master's degrees in computer science from Renmin University of China (RUC) under the supervision of Prof. Feng Zhang. My research interests lie in machine learning compilers and scalable machine learning systems; my recent and upcoming work aims to build energy-efficient execution stacks for large model training, particularly for generative AI workloads.
Email: ruofanw@umich.edu
Education
- 2024 - Present: University of Michigan (UMich), Ph.D. student in Computer Science & Engineering, Advisor: Prof. Mosharaf Chowdhury
- 2021 - 2024: Renmin University of China (RUC), M.E. in Computer Application Technology, Advisor: Prof. Feng Zhang
- 2017 - 2021: Renmin University of China (RUC), B.E. in Data Science and Big Data Technology
Experience
- 2023 - 2024: Microsoft, DeepSpeed/Bing, Research Intern, Mentor: Dr. Zhen Zheng
- 2022 - 2023: Alibaba Cloud, Platform of Artificial Intelligence (PAI), Research Intern, Mentor: Dr. Zhen Zheng
- 2021: Microsoft Research Asia (MSRA), Systems Research Group, Research Intern, Mentors: Dr. Fan Yang and Dr. Jilong Xue
- 2019 - 2020: North Carolina State University (NCSU), PICTure Research Group, Remote Intern, Advisor: Prof. Xipeng Shen
- 2019: DELL EMC China Technology R&D Center, Intern
Selected Publications
Jae-Won Chung, Jeff J. Ma, Ruofan Wu, Jiachen Liu, Oh Jun Kweon, Yuxuan Xia, Zhiyu Wu, Mosharaf Chowdhury.
The ML.ENERGY Benchmark: Toward Automated Inference Energy Measurement and Optimization.
In NeurIPS Datasets and Benchmarks (Spotlight), 2025.
Runyu Lu*, Shiqi He*, Wenxuan Tan, Shenggui Li, Ruofan Wu, Jeff J. Ma, Ang Chen, Mosharaf Chowdhury.
TetriServe: Efficient DiT Serving for Heterogeneous Image Generation.
Preprint, 2025.
Ruofan Wu, Zhen Zheng, Feng Zhang, Chuanjie Liu, Zaifeng Pan, Jidong Zhai, Xiaoyong Du.
PluS: Highly Efficient and Expandable ML Compiler with Pluggable Graph Schedules.
In USENIX ATC, 2025.
Zaifeng Pan, Zhen Zheng, Feng Zhang, Ruofan Wu, Hao Liang, Dalin Wang, Xiafei Qiu, Junjie Bai, Wei Lin, Xiaoyong Du.
RECom: A Compiler Approach to Accelerating Recommendation Model Inference with Massive Embedding Columns.
In ASPLOS, 2024.
Hongyu Zhu, Ruofan Wu, Yijia Diao, Shanbin Ke, Haoyu Li, Chen Zhang, Jilong Xue, Lingxiao Ma, Yuqing Xia, Wei Cui, Fan Yang, Mao Yang, Lidong Zhou, Asaf Cidon, Gennady Pekhimenko.
ROLLER: Fast and Efficient Tensor Compilation for Deep Learning.
In OSDI, 2022.
Ruofan Wu, Feng Zhang, Jiawei Guan, Zhen Zheng, Xiaoyong Du, Xipeng Shen.
DREW: Efficient Winograd CNN Inference with Deep Reuse.
In WWW/TheWebConf, 2022.
Cite The ML.ENERGY Benchmark: Toward Automated Inference Energy Measurement and Optimization
@inproceedings{mlenergy-neuripsdb25,
  title     = {The {ML.ENERGY Benchmark}: Toward Automated Inference Energy Measurement and Optimization},
  author    = {Jae-Won Chung and Jeff J. Ma and Ruofan Wu and Jiachen Liu and Oh Jun Kweon and Yuxuan Xia and Zhiyu Wu and Mosharaf Chowdhury},
  booktitle = {NeurIPS Datasets and Benchmarks},
  year      = {2025},
}
Cite TetriServe: Efficient DiT Serving for Heterogeneous Image Generation
@misc{lu2025tetriserveefficientditserving,
  title         = {TetriServe: Efficient DiT Serving for Heterogeneous Image Generation},
  author        = {Runyu Lu and Shiqi He and Wenxuan Tan and Shenggui Li and Ruofan Wu and Jeff J. Ma and Ang Chen and Mosharaf Chowdhury},
  year          = {2025},
  eprint        = {2510.01565},
  archivePrefix = {arXiv},
  primaryClass  = {cs.LG},
  url           = {https://arxiv.org/abs/2510.01565},
}
Cite PluS: Highly Efficient and Expandable ML Compiler with Pluggable Graph Schedules
@inproceedings{wu2025plus,
  title     = {{PluS}: Highly Efficient and Expandable {ML} Compiler with Pluggable Graph Schedules},
  author    = {Ruofan Wu and Zhen Zheng and Feng Zhang and Chuanjie Liu and Zaifeng Pan and Jidong Zhai and Xiaoyong Du},
  booktitle = {USENIX ATC},
  year      = {2025},
}