A Comparative Study on the Training Effects of Different Optimizers for Deep Learning Models

Peng Yin

doi:10.23977/jeis.2025.100217

Author(s)

Peng Yin ¹

Affiliation(s)

¹ School of Electronic and Information Engineering, University of Science and Technology Liaoning, Anshan, China

Corresponding Author

Peng Yin

ABSTRACT

The training efficiency and generalization performance of deep learning models depend heavily on the choice of optimizer. Differences in gradient update strategy directly affect a model's convergence speed, final accuracy, and training stability. Using house price prediction as the test case, this paper builds a fully connected neural network on the Boston Housing Dataset and systematically compares three classic optimizers: Stochastic Gradient Descent (SGD), Adaptive Moment Estimation (Adam), and Root Mean Square Propagation (RMSprop). With other variables such as model structure, learning rate, and batch size held fixed, the optimizers are compared quantitatively along three dimensions: convergence speed, final prediction accuracy, and training stability. The scenarios suited to each optimizer are then discussed in light of the experimental results. The experiments show that Adam converges fastest and reduces the loss quickly in the early stage of training; SGD converges slowly but achieves the best final prediction accuracy after sufficient training; RMSprop balances convergence speed and stability, making it suitable for scenarios with non-stationary objective functions. The results provide a practical reference for optimizer selection in deep learning regression tasks and can help improve the efficiency and performance of model training.
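The comparison protocol described in the abstract (same model, same learning rate, same batch size, only the optimizer changes) can be illustrated with a minimal pure-Python sketch. This is not the paper's code: it assumes a one-feature linear model in place of the fully connected network, synthetic data in place of the Boston Housing Dataset, and commonly used default hyperparameters (`beta1`, `beta2`, `rho`, `eps`, `lr`). The helper names `make_data`, `step`, and `train` are hypothetical.

```python
import math
import random


def make_data(n=64, seed=0):
    """Synthetic stand-in data (assumption): y = 3x + 2 plus small Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [3.0 * x + 2.0 + rng.gauss(0.0, 0.1) for x in xs]
    return xs, ys


def mse_and_grads(params, xs, ys):
    """Mean squared error of y = w*x + b and its gradients w.r.t. [w, b]."""
    w, b = params
    n = len(xs)
    loss = gw = gb = 0.0
    for x, y in zip(xs, ys):
        err = w * x + b - y
        loss += err * err
        gw += 2.0 * err * x
        gb += 2.0 * err
    return loss / n, [gw / n, gb / n]


def step(name, params, grads, state, lr,
         beta1=0.9, beta2=0.999, rho=0.9, eps=1e-8):
    """Apply one in-place parameter update using the named rule."""
    state["t"] += 1
    t = state["t"]
    for i, g in enumerate(grads):
        if name == "sgd":
            # Plain SGD: step along the negative gradient.
            params[i] -= lr * g
        elif name == "rmsprop":
            # RMSprop: scale by a running average of squared gradients.
            state["v"][i] = rho * state["v"][i] + (1.0 - rho) * g * g
            params[i] -= lr * g / (math.sqrt(state["v"][i]) + eps)
        elif name == "adam":
            # Adam: bias-corrected first and second moment estimates.
            state["m"][i] = beta1 * state["m"][i] + (1.0 - beta1) * g
            state["v"][i] = beta2 * state["v"][i] + (1.0 - beta2) * g * g
            m_hat = state["m"][i] / (1.0 - beta1 ** t)
            v_hat = state["v"][i] / (1.0 - beta2 ** t)
            params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
        else:
            raise ValueError(f"unknown optimizer: {name}")


def train(name, xs, ys, lr=0.01, epochs=200, batch_size=16, seed=0):
    """Train with mini-batches; return the full-data loss after each epoch."""
    rng = random.Random(seed)  # same shuffling order for every optimizer
    params = [0.0, 0.0]        # same initial weights for every optimizer
    state = {"m": [0.0, 0.0], "v": [0.0, 0.0], "t": 0}
    order = list(range(len(xs)))
    history = []
    for _ in range(epochs):
        rng.shuffle(order)
        for start in range(0, len(order), batch_size):
            batch = order[start:start + batch_size]
            bx = [xs[i] for i in batch]
            by = [ys[i] for i in batch]
            _, grads = mse_and_grads(params, bx, by)
            step(name, params, grads, state, lr)
        loss, _ = mse_and_grads(params, xs, ys)
        history.append(loss)
    return history


if __name__ == "__main__":
    xs, ys = make_data()
    for name in ("sgd", "rmsprop", "adam"):
        history = train(name, xs, ys)
        print(f"{name:8s} epoch 10: {history[9]:.4f}  final: {history[-1]:.4f}")
```

Because the data, initial weights, shuffling seed, learning rate, and batch size are shared, any difference between the three loss histories comes from the update rule alone. The loss after a few epochs serves as a rough proxy for convergence speed and the final value for accuracy; the ranking on this toy problem need not match the paper's reported results.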

KEYWORDS

Deep Learning; Optimizer; House Price Prediction; Convergence Speed; Model Accuracy; Training Stability

CITE THIS PAPER

Peng Yin, A Comparative Study on the Training Effects of Different Optimizers for Deep Learning Models. Journal of Electronics and Information Science (2025) Vol. 10: 141-148. DOI: http://dx.doi.org/10.23977/jeis.2025.100217.
