HPC Analytics
Project Description
The HPC (High Performance Computing) Analytics project focuses on machine learning techniques for HPC resource allocation. The project is based on the log files of Beocat, an HPC cluster at Kansas State University, and applies machine learning techniques such as supervised learning and reinforcement learning to pursue three goals:
- Predicting, at job submission time, whether the resources requested from the HPC system will be sufficient
- Making HPC resource allocation more efficient
- Building decision support for HPC users
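Predicting sufficiency first requires labeling historical jobs. The sketch below is a minimal illustration under stated assumptions. The JobRecord fields are hypothetical and are not the Beocat log schema. A request is treated as insufficient only when the job ended in a Slurm-style TIMEOUT or OUT_OF_MEMORY state. The over_request_ratio helper (requested / used) is likewise hypothetical and gives a simple measure of over-allocation, which relates to the goal of making allocation more efficient.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical job record: field names are illustrative only and are
# not taken from the Beocat log format.
@dataclass
class JobRecord:
    state: str          # final job state, e.g. "COMPLETED" or "TIMEOUT"
    req_mem_gb: float   # memory requested at submission
    max_mem_gb: float   # peak memory actually used
    req_hours: float    # wall-time limit requested at submission
    used_hours: float   # wall time actually consumed


# Assumption: a request counts as insufficient only when the job was
# killed for exceeding its time or memory limit.
INSUFFICIENT_STATES = {"TIMEOUT", "OUT_OF_MEMORY"}


def is_sufficient(job: JobRecord) -> bool:
    """Label used as the target of a sufficiency classifier."""
    return job.state not in INSUFFICIENT_STATES


def over_request_ratio(requested: float, used: float) -> Optional[float]:
    """Return requested / used; values well above 1 suggest wasted allocation."""
    if used <= 0:
        return None  # ratio is undefined when nothing was used
    return requested / used
```

For example, a job that requested 64 GB, peaked at 8 GB and completed is labeled sufficient with an over-request ratio of 8.0, while a job that ended in TIMEOUT is labeled insufficient regardless of its memory usage.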
Keywords
artificial intelligence, machine learning, predictive analytics, user modeling, decision support, HPC, reinforcement learning
Methods
Methods include predictive modeling (e.g., linear regression, ridge regression), classification (e.g., logistic regression, Gaussian Naive Bayes), user modeling through cluster analysis, and reinforcement learning for building decision support systems.
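As a minimal sketch of the ridge regression named among the methods, the function below fits a single-feature model in closed form. Using requested memory as the feature and peak memory used as the target is an assumption for illustration; the features the project actually uses are not listed on this page, and fit_ridge_1d is a hypothetical helper. The intercept is left unpenalized, so the data are centered and the slope is w = Sxy / (Sxx + λ). Setting λ = 0 recovers ordinary least-squares linear regression.

```python
from typing import Sequence, Tuple


def fit_ridge_1d(x: Sequence[float], y: Sequence[float], lam: float) -> Tuple[float, float]:
    """Fit y ~ w*x + b by minimizing sum((y - w*x - b)^2) + lam * w^2.

    The intercept b is not penalized, so centering x and y reduces the
    problem to w = Sxy / (Sxx + lam), followed by b = mean(y) - w * mean(x).
    """
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("x and y must be non-empty and of equal length")
    if lam < 0:
        raise ValueError("lam must be non-negative")
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Centered cross-product and sum of squares.
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx + lam == 0:
        raise ValueError("constant x with lam = 0 has no unique solution")
    w = sxy / (sxx + lam)
    b = y_mean - w * x_mean
    return w, b
```

On the toy values x = [4, 8, 16, 32] and y = [2, 4, 8, 16] (made-up numbers, not Beocat data), λ = 0 gives w = 0.5 and b = 0, while λ = 460 shrinks the slope to w = 0.25 with b = 3.75. The penalty trades a little bias for lower variance, which helps when job features are noisy or correlated.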
Current Team Members
- Huichen Yang
- Adedolapo Ridwan Okanlawon
- Avishek Bose
Affiliates
- Jared Marolf
- Scott Hutchison
- Dr. Mohammed Tanash
- Dr. Daniel Andresen
Alumni
- Luis Enrique Bobadilla
- Richard Carmona
Data Sets
Source Code
References
Background and Related Work
KDD Lab Publications
Bose, A., Yang, H., Hsu, W. H., & Andresen, D. (2021). HPCGCN: A Predictive Framework on High Performance Computing Cluster Log Data Using Graph Convolutional Networks. In Proceedings of the 8th International Workshop on High Performance Big Graph Data Management, Analysis, and Mining (BigGraph 2021), held in conjunction with the IEEE International Conference on Big Data 2021 (IEEE BigData 2021), virtual conference, December 15-18, 2021, to appear.
Tanash, M., Andresen, D., & Hsu, W. (2021). AMPRO-HPCC: A Machine-Learning Tool for Predicting Resources on Slurm HPC Clusters. In Proceedings of the 15th International Conference on Advanced Engineering Computing and Applications in Sciences (ADVCOMP 2021), Barcelona, Spain, October 3-7, 2021.
Tanash, M., Dunn, B., Andresen, D., Hsu, W., Yang, H., & Okanlawon, A. (2019). Improving HPC System Performance by Predicting Job Resources via Supervised Machine Learning. In Proceedings of the 3rd Conference on Practice and Experience in Advanced Research Computing (PEARC 2019), Chicago, IL, USA, July 28 - August 1, 2019.
Andresen, D., Hsu, W., Yang, H., & Okanlawon, A. (2018). Machine Learning for Predictive Analytics of Compute Cluster Jobs. In Proceedings of the 16th International Conference on Scientific Computing (CSC 2018), Las Vegas, Nevada, USA, July 30 - August 2, 2018.
Last updated by pozegov on Jul 6, 2023