Publications
- “RECom: A Compiler Approach to Accelerating Recommendation Model Inference with Massive Embedding Columns”, Zaifeng Pan, Zhen Zheng, Feng Zhang, Ruofan Wu, Hao Liang, Dalin Wang, Xiafei Qiu, Junjie Bai, Wei Lin, Xiaoyong Du, in Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2024).
- “Expanding the Edge: Enabling Efficient Winograd CNN Inference with Deep Reuse on Edge Device”, Feng Zhang, Ruofan Wu, Jiawei Guan, Zhen Zheng, Xiaoguang Guo, Xiao Zhang, Xiaoyong Du, Xipeng Shen, in IEEE Transactions on Knowledge and Data Engineering (TKDE 2023).
- “ROLLER: Fast and Efficient Tensor Compilation for Deep Learning”, Hongyu Zhu, Ruofan Wu, Yijia Diao, Shanbin Ke, Haoyu Li, Chen Zhang, Jilong Xue, Lingxiao Ma, Yuqing Xia, Wei Cui, Fan Yang, Mao Yang, Lidong Zhou, Asaf Cidon, Gennady Pekhimenko, in Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2022).
- “TREC: Transient Redundancy Elimination-based Convolution”, Jiawei Guan, Feng Zhang, Jiesong Liu, Hsin-Hsuan Sung, Ruofan Wu, Xiaoyong Du, Xipeng Shen, in Advances in Neural Information Processing Systems 35 (NeurIPS 2022).
- “DREW: Efficient Winograd CNN Inference with Deep Reuse”, Ruofan Wu, Feng Zhang, Jiawei Guan, Zhen Zheng, Xiaoyong Du, Xipeng Shen, in Proceedings of the ACM Web Conference 2022 (TheWebConf/WWW 2022).
- “YuenyeungSpTRSV: A Thread-Level and Warp-Level Fusion Synchronization-Free Sparse Triangular Solve”, Jiya Su, Feng Zhang, Weifeng Liu, Bingsheng He, Ruofan Wu, Xiaoyong Du, Rujia Wang, in IEEE Transactions on Parallel and Distributed Systems (TPDS 2021).
- “CapelliniSpTRSV: A Thread-Level Synchronization-Free Sparse Triangular Solve on GPUs”, Jiya Su, Feng Zhang, Weifeng Liu, Bingsheng He, Ruofan Wu, Xiaoyong Du, Rujia Wang, in Proceedings of the 49th International Conference on Parallel Processing (ICPP 2020).