
Progressive Sampling-Based Joint Automatic Model Selection of Machine Learning and Feature Selection


DOI: 10.23977/jaip.2020.040104


Sufen Chen 1, Xueqiang Zeng 2


1 School of Information Engineering, Nanchang Institute of Technology, Nanchang, Jiangxi Province 330099, P.R. China
2 School of Computer & Information Engineering, Jiangxi Normal University, Nanchang, Jiangxi Province 330022, P.R. China

Corresponding Author

Xueqiang Zeng


In most machine learning applications, selecting an appropriate model requires expert knowledge and many labor-intensive manual iterations, so automatic machine learning is essential for lowering the barrier to entry. Feature selection is likewise a crucial data preprocessing step: retaining important features alleviates the curse of dimensionality, and removing irrelevant features reduces the difficulty of the learning task. Existing automatic selection methods cannot jointly select a machine learning model and a feature selection model on large-scale data. To meet the demands of the big-data era, this paper therefore proposes a unified hyperparameter space covering both machine learning and feature selection models, searched by a Bayesian optimization method based on progressive sampling. Extensive experiments show that, compared with state-of-the-art automatic model selection methods, our approach significantly reduces both search time and classification error rates.
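The joint search described above can be illustrated with a minimal, hypothetical sketch: a single configuration space spans both the feature selection method and the classifier with its hyperparameters, and a progressive-sampling loop (here a simple successive-halving scheme, standing in for the paper's Bayesian optimization) evaluates candidates on growing data subsets, discarding the weaker half at each stage. All names below (`SEARCH_SPACE`, `evaluate`, the toy error model) are illustrative assumptions, not the authors' implementation.

```python
# Sketch: joint feature-selection + classifier search with progressive
# sampling. evaluate() is a toy stand-in for "train this pipeline on an
# n_samples-row subset and return its validation error".
import random

# Unified hyperparameter space over feature selection AND the classifier.
SEARCH_SPACE = {
    "selector": ["variance", "chi2", "mutual_info"],
    "n_features": [10, 50, 100],
    "classifier": ["logreg", "svm", "random_forest"],
    "C": [0.01, 0.1, 1.0, 10.0],
}

def sample_config(rng):
    """Draw one joint configuration uniformly from the space."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}

def evaluate(config, n_samples, rng):
    """Toy objective: a base error per classifier plus noise that shrinks
    as the training subset grows (mimicking progressive sampling)."""
    base = {"logreg": 0.20, "svm": 0.15, "random_forest": 0.10}[config["classifier"]]
    noise = rng.random() * 100.0 / n_samples
    return base + noise

def progressive_search(n_candidates=16, start_size=100, rng=None):
    rng = rng or random.Random(0)
    candidates = [sample_config(rng) for _ in range(n_candidates)]
    size = start_size
    while len(candidates) > 1:
        # Score every surviving candidate on the current subset size.
        scored = sorted(candidates, key=lambda c: evaluate(c, size, rng))
        candidates = scored[: max(1, len(candidates) // 2)]  # keep best half
        size *= 2  # survivors earn a larger data subset next round
    return candidates[0]

best = progressive_search()
print(best)
```

The design point mirrors the paper's motivation: cheap, small-sample evaluations prune most of the joint space early, so expensive full-data training is spent only on the few configurations that survive every round.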


Automatic selection, feature selection, Bayesian optimization, progressive sampling


Sufen Chen, Xueqiang Zeng. Progressive Sampling-Based Joint Automatic Model Selection of Machine Learning and Feature Selection. Journal of Artificial Intelligence Practice (2021) Vol. 4: 30-38. DOI: 10.23977/jaip.2020.040104





All published work is licensed under a Creative Commons Attribution 4.0 International License.

Copyright © 2016 - 2031 Clausius Scientific Press Inc. All Rights Reserved.