Subspace Optimization for Large Language Models with Convergence Guarantees
2024-12-26

Title: Subspace Optimization for Large Language Models with Convergence Guarantees

Speaker: Kun Yuan, Assistant Professor, Peking University

Host: Xiangfeng Wang, Professor

Time: 10:30–11:30, Friday, December 27, 2024

Venue: Room B112, Science Building, Putuo Campus, East China Normal University


Abstract:

Subspace optimization algorithms, with GaLore (Zhao et al., 2024) as a representative method, have gained popularity for pre-training or fine-tuning large language models (LLMs) due to their memory efficiency. However, their convergence guarantees remain unclear, particularly in stochastic settings. In this paper, we unexpectedly discover that GaLore does not always converge to the optimal solution and substantiate this finding with an explicit counterexample. We then investigate the conditions under which GaLore can achieve convergence, demonstrating that it does so either in deterministic scenarios or when using a sufficiently large mini-batch size. More significantly, we introduce GoLore (Gradient random Low-rank projection), a novel variant of GaLore that provably converges in stochastic settings, even with standard batch sizes. Our convergence analysis can be readily extended to other sparse subspace optimization algorithms. Finally, we conduct numerical experiments to validate our theoretical results and empirically explore the proposed mechanisms.
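To make the contrast between the two projection choices concrete, the following is a minimal sketch of a subspace gradient step in the spirit of the abstract: GaLore-style projection uses the top-r left singular vectors of the current gradient, while a GoLore-style projection draws a random orthonormal basis independent of the gradient. The function names, shapes, and the plain SGD update are illustrative assumptions, not the authors' actual implementation (which operates on per-layer weight matrices inside an Adam-type optimizer).

```python
import numpy as np

def projector(G, r, method="galore", rng=None):
    """Build an m x r orthonormal projection matrix for gradient G (m x n).

    'galore': top-r left singular vectors of the current gradient (SVD-based),
              as in GaLore-style subspace selection.
    'golore': a random orthonormal basis drawn independently of G,
              as in GoLore-style random low-rank projection.
    (Illustrative sketch; not the authors' implementation.)
    """
    m, _ = G.shape
    if method == "galore":
        U, _, _ = np.linalg.svd(G, full_matrices=False)
        return U[:, :r]
    rng = np.random.default_rng() if rng is None else rng
    A = rng.standard_normal((m, r))
    Q, _ = np.linalg.qr(A)  # orthonormalize the random matrix
    return Q

def subspace_sgd_step(W, grad_fn, P, lr=0.5):
    """One plain SGD step restricted to the subspace spanned by P."""
    G = grad_fn(W)
    G_low = P.T @ G          # r x n low-rank gradient: the memory saving
    return W - lr * (P @ G_low)  # lift the subspace update back to full space
```

The memory saving comes from the optimizer only having to store r x n statistics instead of m x n; the convergence question in the talk is about whether the gradient-dependent (GaLore) choice of P can systematically miss descent directions under stochastic noise, which the gradient-independent (GoLore) choice avoids.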


Speaker Bio:

Dr. Kun Yuan is an Assistant Professor at the Center for Machine Learning Research (CMLR) at Peking University. He received his Ph.D. from UCLA in 2019 and was a staff algorithm engineer at Alibaba (US) Group from 2019 to 2022. His research focuses on the development of fast, scalable, and reliable distributed algorithms, with applications in large-scale optimization, deep neural network training, federated learning, and the Internet of Things. He received the 2017 IEEE Signal Processing Society Young Author Best Paper Award and the 2017 ICCM Distinguished Paper Award.


