Context-Aware Time Series Anomaly Detection for Complex Systems

Proc. of the SDM Workshop on Data Mining for Service and Maintenance

Publication

Systems in which several components interact to accomplish challenging tasks are ubiquitous; examples include large server clusters providing “cloud computing”, manufacturing plants, and automobiles. Our relentless efforts to improve the capabilities of these systems inevitably increase their complexity as we add more components or introduce more dependencies between existing ones. To tackle this surge in distributed system complexity, system operators collect continuous monitoring data from various sources, including hardware and software-based sensors. A large amount of this data is either in the form of time series or contained in logs, e.g., operators’ activity logs, system event logs, and error logs. In this paper, we propose a framework for mining system operational intelligence from massive amounts of monitoring data that combines the time-series data with information extracted from text-based logs. Our work is aimed at systems where logs capture the context of a system’s operations and the time-series data record the state of different components. This category covers a wide variety of systems, including IT systems (compute clouds, web services’ infrastructure, enterprise computing infrastructure, etc.) and complex physical systems such as manufacturing plants. We instantiate our framework for Hadoop. Our preliminary results on both real and synthetic datasets show that the proposed context-aware approach is more effective at detecting anomalies than a time-series-only approach.
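To make the distinction between a context-aware and a time-series-only detector concrete, the following is a minimal illustrative sketch, not the framework proposed in the paper. It assumes that each time-series sample has already been tagged with a context label derived from logs (the labels "idle" and "map" are hypothetical), and it substitutes a simple z-score test for whatever detector the framework actually uses. The function names, the threshold, and the data are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev


def global_anomalies(values, threshold=3.0):
    """Time-series-only baseline: z-score against the whole series."""
    mu, sigma = mean(values), pstdev(values)
    return [i for i, v in enumerate(values)
            if sigma > 0 and abs(v - mu) / sigma > threshold]


def context_aware_anomalies(samples, threshold=3.0):
    """Z-score each value against the other values sharing its context.

    `samples` is a list of (context_label, value) pairs, where the
    context label is assumed to have been extracted from logs.
    """
    by_context = defaultdict(list)
    for context, value in samples:
        by_context[context].append(value)
    # Per-context mean and population standard deviation.
    stats = {c: (mean(v), pstdev(v)) for c, v in by_context.items()}
    flagged = []
    for i, (context, value) in enumerate(samples):
        mu, sigma = stats[context]
        if sigma > 0 and abs(value - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Hypothetical CPU utilisation (%): low while idle, high during a map
# phase, with a single reading of 60 recorded while the node was idle.
samples = (
    [("idle", v) for v in [4.0, 6.0] * 10]
    + [("idle", 60.0)]
    + [("map", v) for v in [88.0, 92.0] * 10]
)
values = [v for _, v in samples]

context_flags = context_aware_anomalies(samples)
global_flags = global_anomalies(values)
```

In this toy data the reading of 60 lies between the idle and map levels, so the global baseline flags nothing, while the per-context test flags index 20, because 60 is far from every other idle reading.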