Subspace Optimization for Large Language Models with Convergence Guarantees
2024-12-26

Title: Subspace Optimization for Large Language Models with Convergence Guarantees

Speaker: Kun Yuan, Assistant Professor, Peking University

Host: Xiangfeng Wang, Professor

Time: 10:30–11:30, Friday, December 27, 2024

Venue: Room B112, Science Building, Putuo Campus, East China Normal University


Abstract:

Subspace optimization algorithms, with GaLore (Zhao et al., 2024) as a representative method, have gained popularity for pre-training or fine-tuning large language models (LLMs) due to their memory efficiency. However, their convergence guarantees remain unclear, particularly in stochastic settings. In this paper, we unexpectedly discover that GaLore does not always converge to the optimal solution and substantiate this finding with an explicit counterexample. We then investigate the conditions under which GaLore can achieve convergence, demonstrating that it does so either in deterministic scenarios or when using a sufficiently large mini-batch size. More significantly, we introduce GoLore (Gradient random Low-rank projection), a novel variant of GaLore that provably converges in stochastic settings, even with standard batch sizes. Our convergence analysis can be readily extended to other sparse subspace optimization algorithms. Finally, we conduct numerical experiments to validate our theoretical results and empirically explore the proposed mechanisms.
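To make the contrast between the two projection choices concrete, the following is a minimal sketch of a subspace gradient step in the spirit of the abstract: GaLore-style projection uses the top-r left singular vectors of the current gradient, while a GoLore-style projection draws a random orthonormal basis independent of the gradient. The function names, shapes, and the plain SGD update are illustrative assumptions, not the authors' actual implementation (which operates on per-layer weight matrices inside an Adam-type optimizer).

```python
import numpy as np

def projector(G, r, method="galore", rng=None):
    """Build an m x r orthonormal projection matrix for gradient G (m x n).

    'galore': top-r left singular vectors of the current gradient (SVD-based),
              as in GaLore-style subspace selection.
    'golore': a random orthonormal basis drawn independently of G,
              as in GoLore-style random low-rank projection.
    (Illustrative sketch; not the authors' implementation.)
    """
    m, _ = G.shape
    if method == "galore":
        U, _, _ = np.linalg.svd(G, full_matrices=False)
        return U[:, :r]
    rng = np.random.default_rng() if rng is None else rng
    A = rng.standard_normal((m, r))
    Q, _ = np.linalg.qr(A)  # orthonormalize the random matrix
    return Q

def subspace_sgd_step(W, grad_fn, P, lr=0.5):
    """One plain SGD step restricted to the subspace spanned by P."""
    G = grad_fn(W)
    G_low = P.T @ G          # r x n low-rank gradient: the memory saving
    return W - lr * (P @ G_low)  # lift the subspace update back to full space
```

The memory saving comes from the optimizer only having to store r x n statistics instead of m x n; the convergence question in the talk is about whether the gradient-dependent (GaLore) choice of P can systematically miss descent directions under stochastic noise, which the gradient-independent (GoLore) choice avoids.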


Speaker Bio:

Dr. Kun Yuan is an Assistant Professor at the Center for Machine Learning Research (CMLR) at Peking University. He received his Ph.D. from UCLA in 2019 and was a staff algorithm engineer at Alibaba (US) Group from 2019 to 2022. His research focuses on the development of fast, scalable, and reliable distributed algorithms, with applications in large-scale optimization, deep neural network training, federated learning, and the Internet of Things. He received the 2017 IEEE Signal Processing Society Young Author Best Paper Award and the 2017 ICCM Distinguished Paper Award.


