
HPC Analytics


Project Description

The HPC (High Performance Computing) Analytics project applies machine learning techniques to HPC resource allocation. The project is based on log data from Beocat, an HPC cluster at Kansas State University, and uses machine learning techniques such as supervised learning and reinforcement learning toward three goals:

  1. Predicting, at job submission time, whether the resources requested for a job in the HPC system are sufficient
  2. Making HPC resource allocation more efficient
  3. Building decision support for HPC users
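The first goal treats sufficiency as a yes/no target. As a minimal sketch, assuming a request counts as sufficient when the job's measured usage stays within both its requested memory and its requested wall-time, the label (and a simple memory-efficiency ratio relevant to the second goal) could be derived from per-job records as below. The `JobRecord` fields, the helper names, and the numbers are hypothetical illustrations, not the Beocat log schema. On clusters that enforce limits, an over-limit job is usually terminated, so a real pipeline might instead derive the label from the job's final state.

```python
from dataclasses import dataclass


@dataclass
class JobRecord:
    # Hypothetical per-job fields; real Beocat/Slurm logs would need parsing first.
    req_mem_gb: float   # memory requested at submission time
    used_mem_gb: float  # peak memory the job actually needed
    req_time_h: float   # wall-time requested at submission time
    used_time_h: float  # wall-time the job actually consumed


def is_sufficient(job: JobRecord) -> bool:
    """Assumed definition: usage stayed within both requested limits."""
    return job.used_mem_gb <= job.req_mem_gb and job.used_time_h <= job.req_time_h


def mem_efficiency(job: JobRecord) -> float:
    """Fraction of requested memory actually used (1.0 means no waste)."""
    if job.req_mem_gb <= 0:
        raise ValueError("requested memory must be positive")
    return job.used_mem_gb / job.req_mem_gb


# Made-up toy records, not real cluster data.
jobs = [
    JobRecord(req_mem_gb=16, used_mem_gb=4, req_time_h=10, used_time_h=2),
    JobRecord(req_mem_gb=8, used_mem_gb=9.5, req_time_h=4, used_time_h=1),
]
for job in jobs:
    print(is_sufficient(job), mem_efficiency(job))  # True 0.25, then False 1.1875
```

The first toy job is labeled sufficient even though it uses only a quarter of the memory it requested, which is why sufficiency (the first goal) and efficiency (the second goal) are tracked separately.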

Keywords

artificial intelligence, machine learning, predictive analytics, user modeling, decision support, HPC, reinforcement learning

Methods

Methods include predictive modeling (e.g., linear regression, ridge regression), classification (e.g., logistic regression, Gaussian Naive Bayes), cluster analysis for user modeling, and reinforcement learning for building decision support systems.
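Gaussian Naive Bayes, one of the classifiers listed above, can be sketched in plain Python without an ML library. This is a hypothetical illustration, not the project's published implementation: the features (requested memory in GB, requested wall-time in hours, core count), the 0/1 sufficiency labels, and all numbers are made-up toy values rather than Beocat data.

```python
import math
from collections import defaultdict


class GaussianNB:
    """Minimal Gaussian Naive Bayes: class priors plus per-class, per-feature normal densities."""

    def __init__(self, var_smoothing=1e-6):
        # Small constant added to every variance so a constant feature cannot divide by zero.
        self.var_smoothing = var_smoothing
        self.params = {}

    def fit(self, X, y):
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        for label, rows in groups.items():
            cols = list(zip(*rows))  # transpose: one tuple of values per feature
            means = [sum(c) / len(c) for c in cols]
            variances = [
                sum((v - m) ** 2 for v in c) / len(c) + self.var_smoothing
                for c, m in zip(cols, means)
            ]
            self.params[label] = (math.log(len(rows) / len(y)), means, variances)
        return self

    def _log_joint(self, row, label):
        # log P(label) + sum_j log N(x_j; mean_j, var_j); features assumed independent given the label
        log_prior, means, variances = self.params[label]
        total = log_prior
        for x, m, v in zip(row, means, variances):
            total += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        return total

    def predict(self, X):
        return [max(self.params, key=lambda label: self._log_joint(row, label)) for row in X]


# Hypothetical submission-time features: [requested memory GB, requested wall-time h, cores].
# Labels: 1 = request was sufficient, 0 = insufficient. Made-up toy values, not Beocat data.
X_train = [
    [16, 10, 4], [32, 24, 8], [8, 12, 2], [64, 48, 16],
    [2, 1, 1], [4, 0.5, 1], [1, 2, 1], [2, 0.5, 2],
]
y_train = [1, 1, 1, 1, 0, 0, 0, 0]

model = GaussianNB().fit(X_train, y_train)
print(model.predict([[24, 20, 8], [1.5, 1, 1]]))  # [1, 0]
```

Scoring in log space avoids numerical underflow when many small densities are multiplied together; scikit-learn's `GaussianNB` provides a production-grade version of the same model.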

Current Team Members

  • Huichen Yang
  • Adedolapo Ridwan Okanlawon
  • Avishek Bose

Affiliates

  • Jared Marolf
  • Scott Hutchison
  • Dr. Mohammed Tanash
  • Dr. Daniel Andresen

Alumni

  • Luis Enrique Bobadilla
  • Richard Carmona

Data Sets

Source Code

References

KDD Lab Publications

  • Bose, A., Yang, H., Hsu, W. H., & Andresen, D. (2021). HPCGCN: A Predictive Framework on High Performance Computing Cluster Log Data Using Graph Convolutional Networks. In Proceedings of the 8th International Workshop on High Performance Big Graph Data Management, Analysis, and Mining (BigGraph 2021), held in conjunction with the IEEE International Conference on Big Data 2021 (IEEE BigData 2021), virtual conference, December 15-18, 2021, to appear.

  • Tanash, M., Andresen, D., & Hsu, W. (2021). AMPRO-HPCC: A Machine-Learning Tool for Predicting Resources on Slurm HPC Clusters. In Proceedings of the 15th International Conference on Advanced Engineering Computing and Applications in Sciences (ADVCOMP 2021), Barcelona, Spain, October 3-7, 2021.

  • Tanash, M., Dunn, B., Andresen, D., Hsu, W., Yang, H., & Okanlawon, A. (2019). Improving HPC System Performance by Predicting Job Resources via Supervised Machine Learning. In Proceedings of the 3rd Conference on Practice and Experience in Advanced Research Computing (PEARC 2019), Chicago, IL, USA, July 28 - August 1, 2019.

  • Andresen, D., Hsu, W., Yang, H., & Okanlawon, A. (2018). Machine Learning for Predictive Analytics of Compute Cluster Jobs. In Proceedings of the 16th International Conference on Scientific Computing (CSC'18), Las Vegas, Nevada, USA, July 30 - August 2, 2018.

Last updated by pozegov on Jul 6, 2023