
[Seminar] On Efficient Training for Large-Scale Deep Learning Models

Publish Date: 2023-10-10

Title: On Efficient Training for Large-Scale Deep Learning Models

Speaker: Li Shen

                  Research Scientist

                  JD Explore Academy

Host: Professor Zhouchen Lin

Date & Time: 2023/10/19  10:00 - 12:00

VooV Meeting: 504 137 495


Abstract:   

The field of deep learning has witnessed significant developments in recent years. In particular, large-scale models trained on vast amounts of data hold immense promise for practical applications and for enhancing industrial productivity. However, training such models suffers from instability, stringent computational resource requirements, and underexplored convergence analysis. For example, Adam, one of the most influential adaptive stochastic algorithms for training deep neural networks, has been shown to diverge even in the simple convex setting via a few simple counterexamples.

In this talk, we systematically investigate the convergence theory and application of efficient training algorithms for pretraining large-scale deep learning models from the perspective of optimization. Specifically, (i) we derive the first easy-to-check sufficient condition, depending only on the base learning rate and the combination of historical second-order moments, that guarantees the global convergence of the Adam optimizer in the non-convex stochastic setting; coupled with this sufficient condition, the analysis also gives a much deeper interpretation of the divergence of Adam. (ii) We theoretically show that distributed Adam can be linearly accelerated by using a larger number of nodes. (iii) We propose a communication-efficient variant of distributed Adam, dubbed Efficient-Adam, which adopts bi-directional compression to reduce the communication cost and error-compensation techniques to reduce the compression bias. (iv) We develop FedLADA, a novel momentum-based federated optimizer that combines global gradient descent with a locally amended adaptive optimizer, to tackle the client drift exacerbated by local over-fitting when local adaptive optimizers are used in federated learning.
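For reference, the update rule that the convergence results in (i)-(iii) concern is the standard Adam iteration, sketched below in a minimal NumPy form. The base learning rate `lr` and the second-moment accumulator `v` are the quantities constrained by the sufficient condition discussed in the talk; the condition itself and the distributed/federated variants are not reproduced here, and the function name and signature are illustrative only.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One step of the standard Adam update (illustrative sketch, not the speaker's variant).

    theta : current parameters (ndarray)
    grad  : stochastic gradient evaluated at theta
    m, v  : running first- and second-moment estimates
    t     : step counter (1-indexed), used for bias correction
    """
    m = beta1 * m + (1 - beta1) * grad            # exponential moving average of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2       # per-coordinate second-order moment
    m_hat = m / (1 - beta1 ** t)                  # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)                  # bias-corrected second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```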


Biography:   

Li Shen is currently a research scientist at JD Explore Academy, Beijing, China. Previously, he was a senior researcher at Tencent AI Lab. He received his bachelor's degree and Ph.D. from the School of Mathematics, South China University of Technology. His research interests include theory and algorithms for nonsmooth convex and nonconvex optimization, and their applications in statistical machine learning, deep learning, and reinforcement learning. He has published more than 60 papers in peer-reviewed top-tier journals (JMLR, IEEE TPAMI, IJCV, IEEE TSP, IEEE TIP, etc.) and conferences (ICML, NeurIPS, ICLR, CVPR, ICCV, etc.). He has also served as a senior program committee member for AAAI 2022 and AAAI 2024, and as an area chair for ICPR 2022, ICPR 2024, and ICLR 2024.