Abstract

Stochastic Gradient Descent (SGD) updates a Convolutional Neural Network (CNN) with a noisy gradient computed from a random batch, and each batch updates the network exactly once per epoch. This scheme applies the same training effort to every batch, overlooking the fact that gradient variance, induced by Sampling Bias and Intrinsic Image Difference, produces different training dynamics across batches. In this paper, we develop a new training strategy for SGD, referred to as Inconsistent Stochastic Gradient Descent (ISGD), to address this problem. The core concept of ISGD is inconsistent training, which dynamically adjusts the training effort with respect to the batch loss. ISGD models training as a stochastic process that gradually reduces the mean of the batch loss, and it uses a dynamic upper control limit to identify large-loss batches on the fly. ISGD stays on an identified batch to accelerate training with additional gradient updates, while a constraint penalizes drastic parameter changes. ISGD is straightforward, computationally efficient, and requires no auxiliary memory. A series of empirical evaluations on real-world datasets and networks demonstrates the promising performance of inconsistent training.

  • Publication date: 2017-9
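
To make the control-limit idea in the abstract concrete, below is a minimal PyTorch-style sketch of one training epoch. The function name `isgd_epoch`, the exponentially weighted loss statistics, the 3-sigma limit, and the `extra_steps` count are illustrative assumptions rather than the paper's exact formulation, and the paper's conservative constraint on parameter changes is only noted in a comment.

```python
import torch  # referenced in the usage comment below


def isgd_epoch(model, loader, optimizer, criterion,
               extra_steps=3, limit_sigma=3.0, momentum=0.9):
    """One epoch of SGD with extra gradient updates on large-loss batches."""
    run_mean, run_var, seen = 0.0, 0.0, 0
    for inputs, targets in loader:
        # Ordinary SGD step on the current batch.
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()

        value = loss.item()
        if seen == 0:
            run_mean, run_var = value, 0.0
        else:
            # Dynamic upper control limit from running loss statistics.
            upper_limit = run_mean + limit_sigma * run_var ** 0.5
            if value > upper_limit:
                # Batch flagged as under-trained: stay on it for a few
                # additional updates (the paper additionally constrains how
                # far the weights may move; that penalty is omitted here).
                for _ in range(extra_steps):
                    optimizer.zero_grad()
                    extra_loss = criterion(model(inputs), targets)
                    extra_loss.backward()
                    optimizer.step()
            # Exponentially weighted mean/variance of the batch loss, a
            # stand-in for the paper's stochastic model of the loss.
            run_mean = momentum * run_mean + (1 - momentum) * value
            run_var = momentum * run_var + (1 - momentum) * (value - run_mean) ** 2
        seen += 1


# Example wiring (assumed names):
#   optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
#   for epoch in range(90):
#       isgd_epoch(model, train_loader, optimizer, torch.nn.CrossEntropyLoss())
```

The only state carried across batches in this sketch is a scalar running mean and variance of the loss, consistent with the abstract's claim that the method requires no auxiliary memory.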