About me

I received my Ph.D. from the Computer Network Information Center (CNIC), Chinese Academy of Sciences, under the supervision of Professor Zhonghua Lu, and my Bachelor’s degree in Engineering Mechanics from Hohai University in 2019. My doctoral research focused on high-performance numerical linear algebra, with particular emphasis on heterogeneous computing, parallel and distributed algorithms, and sparse linear solvers for large-scale scientific and engineering problems. I am currently working on inference optimization for large language models. You can find my [Resume] here.

GitHub / WeChat / Zhihu

Selected Publications

  • [ICS’26] Xinxin Zhang, Haoyuan Zhang, Xiazhen Liu, Jialin Li, Runfeng Jin, Jian Zhang, Wu Yuan, Shan Liang, Zhonghua Lu. WindStencil: Unleashing GPU Potential for High-Order Stencil Computation in High-Performance Inviscid CFD Simulations. Paper

  • [EuroSys’26] Junlin Wei, Jinrong Jiang, Wu Wang, Chen Li, Yehong Zhang, Yue Yu, Lian Zhao, Xiang Han, Zhenjia Li, Feng Zhang, Haoyuan Zhang, Yidi Bai, Maoxue Yu, Kai Xu, Hailong Liu, Xuebin Chi. swKokkos: An Athread Backend for Enhanced Kokkos with the Sunway Heterogeneous Architecture. Paper

  • [ICCD’25] Haoyuan Zhang, Yaqian Gao, Xinxin Zhang, Jialin Li, Runfeng Jin, Yidong Chen, Feng Zhang, Wu Yuan, Wenpeng Ma, Shan Liang, Jian Zhang, Zhonghua Lu. FlashMP: Fast Discrete Transform-Based Solver for Preconditioning Maxwell’s Equations on GPUs. Paper

  • [SC’24] Yidong Chen, Chen Zhang, Rongchao Dong, Haoyuan Zhang, Yonghua Zhang, Zhonghua Lu, Jidong Zhai. MIXQ: Taming Dynamic Outliers in Mixed-Precision Quantization by Online Prediction. Paper

  • [ICPP’24] Runfeng Jin, Wenhao Liang, Haoyuan Zhang, Yinxuan Song, Zhen Luo, Haibo Ma, Yingjin Ma, Zhong Jin. PASCI: A Scalable Framework for Heterogeneous Parallel Calculation of Dynamical Electron Correlation. Paper

  • [ICPP’24] Jialin Li, Zhichen Feng, Yaqian Gao, Shaobo Tian, Haoyuan Zhang, Huang Ye, Jian Zhang. High-Performance 3D Convolution on the Latest Generation Sunway Processor. Paper

  • [ICCD’24] Haoyuan Zhang, Yidong Chen, Wenpeng Ma, Wu Yuan, Jian Zhang, Zhonghua Lu. MIST: Efficient Mixed-Precision Preconditioning Through Iterative Sparse-Triangular Solver Design. Paper

  • [CCF THPC’24] Haoyuan Zhang, Wenpeng Ma, Wu Yuan, Jian Zhang, Zhonghua Lu. Mixed-precision block incomplete sparse approximate preconditioner on Tensor core. Paper

  • [Frontiers of Data & Computing’24] Haoyuan Zhang, Wenpeng Ma, Wu Yuan, Jian Zhang, Zhonghua Lu. Implementation of CCFD-KSSolver Component for GPU Architecture. Paper

Awards

  • First Prize in the Finals of the 11th Parallel Application Challenge (PAC), September 2024. Website

  • Third Prize in the Finals of the 7th China Parallel Application Challenge on Domestic CPU (CPC), September 2023. Website

  • Third Prize in the Finals of the ACM-China International Parallel Computing Challenge (IPCC) 2022, October 2022. Website

  • Third Prize in the Finals of the 3rd Priority Research Application (PRA), Quantum Chemistry track, December 2022. Website

Work Experience

  • June 2025 — September 2025   Meituan - Computing and Intelligent Platform Department - Model Compression and Acceleration Team