Towards systems level prognostics in the Cloud

  • Mohak Shah
  • Scott Evans
  • Manoj Mehta
  • Anthony Gargulak
  • Tom Lasky

IEEE Conference on Prognostics and Health Management (PHM)

Many application systems are transitioning from device-centric architectures to cloud-based systems that leverage shared compute resources to reduce cost and maximize reach. These systems require new paradigms to assure availability and quality of service. In this paper, we discuss the challenges of assuring availability and quality of service in a cloud-based application system. We propose machine learning techniques that monitor system logs to assess the health of the system. A web services data set is employed to show that a variety of services can be clustered into different service classes using a k-means clustering scheme. Reliability, Availability, and Serviceability (RAS) logs and job logs from a high-performance computing system are employed to show that impending fatal errors in the system can be predicted from the logs using an SVM classifier. These approaches illustrate the feasibility of monitoring the health and performance of compute resources, and hence can be used to manage these systems for high availability and quality of service in critical tasks such as health care monitoring in the cloud.
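
As a rough illustration of the clustering idea mentioned in the abstract, the sketch below runs a plain k-means (Lloyd's algorithm) over a handful of made-up service measurements. The feature choice (response time and throughput), the sample values, and the function name `kmeans` are assumptions for illustration only; the abstract does not state which features or implementation the authors used.

```python
import random


def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's k-means over a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        # (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: centroids did not move
        centroids = new_centroids
    return labels, centroids


# Hypothetical (response time in seconds, throughput in kbps) per service.
services = [(0.10, 9.0), (0.12, 8.5), (0.11, 9.2),
            (2.0, 1.0), (2.2, 0.8), (1.9, 1.1)]
labels, centroids = kmeans(services, k=2)
print(labels)
```

With these toy values, the three fast, high-throughput services fall into one class and the three slow, low-throughput services into the other. Real quality-of-service features usually sit on very different scales, so normalizing them before clustering would likely matter in practice.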