Tutorial: Outlier Detection for Temporal Data
- Manish Gupta
- Jing Gao
- Charu Aggarwal
- Jiawei Han
Proc. of the ACM Intl. Conf. on Information and Knowledge Management (CIKM)
Outlier (or anomaly) detection is a broad field that has been studied in many research areas, including statistics, data mining, sensor networks, environmental science, distributed systems, and spatio-temporal mining. The earliest articles on outlier detection, from the statistics literature, focused on outliers in time series. Since then, outlier detection has been studied on a wide variety of data types, including high-dimensional data, uncertain data, stream data, network data, time series data, spatial data, and spatio-temporal data. While many tutorials and surveys cover general outlier detection, this tutorial focuses on outlier detection for temporal data.
Many applications generate temporal datasets. For example, in everyday life, records of many kinds (credit, personnel, financial, judicial, medical, and so on) are all temporal. This underscores the need for an organized and detailed study of outliers in such temporal data. In the past decade, there has been substantial research on various forms of temporal data, including consecutive data snapshots, series of data snapshots, and data streams. Beyond the initial work on time series, researchers have focused on richer forms of data, including multiple data streams, spatio-temporal data, network data, and community distribution data. Techniques for temporal outlier detection also differ considerably from those for general outlier detection; examples include AR models, Markov models, and evolutionary clustering.
In this tutorial, we present an organized picture of recent research in temporal outlier detection. We begin by motivating the importance of temporal outlier detection and outlining the challenges it poses beyond those of general outlier detection. We then present a taxonomy of proposed techniques for temporal outlier detection. These techniques broadly include statistical techniques (such as AR models, Markov models, histograms, and neural networks), distance- and density-based approaches, grouping-based approaches (clustering, community detection), network-based approaches, and spatio-temporal outlier detection approaches. We conclude by presenting a collection of applications in which temporal outlier detection techniques have been used to discover interesting outliers.
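To make one of the statistical techniques named above concrete, the following is a minimal sketch of AR-model-based outlier detection for a single time series. It is an illustration only and not a method taken from the tutorial. The function name `ar_residual_outliers`, the least-squares fit, the default order of 2, and the MAD-based threshold of 4 are all assumptions chosen for the example.

```python
import numpy as np

def ar_residual_outliers(series, order=2, threshold=4.0):
    """Flag time indices whose AR(order) one-step prediction error is unusually large.

    Hypothetical helper for illustration; order and threshold are assumed defaults.
    """
    x = np.asarray(series, dtype=float)
    n = len(x)
    if n <= 2 * order:
        raise ValueError("series too short for the requested AR order")

    # Row for time t holds [1, x[t-1], ..., x[t-order]]; the target is x[t].
    lags = np.column_stack([x[order - k : n - k] for k in range(1, order + 1)])
    design = np.column_stack([np.ones(n - order), lags])
    target = x[order:]

    # Fit the AR coefficients by ordinary least squares.
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    residuals = target - design @ coef

    # Robust residual scale from the median absolute deviation (MAD).
    med = np.median(residuals)
    mad = np.median(np.abs(residuals - med))
    scale = 1.4826 * mad if mad > 0 else np.std(residuals)
    if scale == 0:
        return []  # perfectly predictable series: nothing to flag

    scores = np.abs(residuals - med) / scale
    # Shift positions back to indices in the original series.
    return [int(i) + order for i in np.flatnonzero(scores > threshold)]

# Usage: a noise-free sine wave with one injected spike at index 120.
t = np.arange(200)
series = np.sin(0.1 * t)
series[120] += 5.0
flagged = ar_residual_outliers(series)
print(flagged)
```

Because the spike also enters the lagged inputs of the next two predictions, indices 121 and 122 may be flagged along with 120. A robust fitting procedure in place of ordinary least squares would likely reduce this spillover.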