Document Type
Article
Publication Date
2002
Abstract
Real-time production systems and other dynamic environments often generate a tremendous (potentially infinite) amount of stream data; the volume is too large to be stored on disk or scanned multiple times. Can we perform online, multi-dimensional analysis and data mining of such data to alert people to dramatic changes in a situation and to initiate timely, high-quality responses? This is a challenging task.
In this paper, we investigate methods for online, multi-dimensional regression analysis of time-series stream data, with the following contributions: (1) our analysis shows that only a small number of compressed regression measures, rather than the complete data stream, need to be registered for multi-dimensional linear regression analysis; (2) to facilitate online stream data analysis, a partially materialized data cube model, with regression as the measure and a tilt time frame as its time dimension, is proposed to minimize the amount of data to be retained in memory or stored on disk; and (3) an exception-guided drilling approach is developed for online, multi-dimensional, exception-based regression analysis. Based on this design, algorithms are proposed for efficient analysis of time-series data streams. Our performance study compares the proposed algorithms and identifies the most memory- and time-efficient one for multi-dimensional stream data analysis.
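As a rough illustration of contribution (1), the Python sketch below shows one reason a compact regression summary can stand in for raw data. An ordinary least-squares line fit is linear in the observed values, so when several series share the same time ticks, the fit of their sum equals the sum of their individual fits. The function names `fit_line` and `merge_cells` and the sample values are hypothetical, and the sketch is not claimed to match the exact measures or algorithms defined in the paper.

```python
def fit_line(ticks, values):
    """Ordinary least-squares fit values ~ base + slope * t; returns (base, slope)."""
    n = len(ticks)
    t_mean = sum(ticks) / n
    v_mean = sum(values) / n
    # Centered sums of squares and cross-products.
    s_tt = sum((t - t_mean) ** 2 for t in ticks)
    s_tv = sum((t - t_mean) * (v - v_mean) for t, v in zip(ticks, values))
    slope = s_tv / s_tt
    base = v_mean - slope * t_mean
    return base, slope


def merge_cells(fits):
    """Combine per-cell (base, slope) pairs into the fit of the summed series.

    Assumes every cell was fitted over the same time ticks (hypothetical setup).
    """
    return sum(b for b, _ in fits), sum(s for _, s in fits)


# Illustrative data: two cells observed at the same five time ticks.
ticks = [0, 1, 2, 3, 4]
cell_a = [2.0, 2.5, 3.1, 3.4, 4.2]
cell_b = [10.0, 9.0, 8.5, 7.0, 6.1]

# Fit each cell separately, keep only (base, slope), then merge the summaries.
merged = merge_cells([fit_line(ticks, cell_a), fit_line(ticks, cell_b)])

# Fit the summed raw series directly, for comparison with the merged summary.
direct = fit_line(ticks, [a + b for a, b in zip(cell_a, cell_b)])
```

Under these assumptions, `merged` and `direct` agree up to floating-point rounding, so a higher-level cell can be derived from two numbers per lower-level cell without revisiting the raw stream.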
Repository Citation
Chen, Y., Dong, G., Han, J., Wah, B. W., & Wang, J. (2002). Multi-Dimensional Regression Analysis of Time-Series Data Streams. Very Large Data Bases 2002: Proceedings of the Twenty-Eighth International Conference on Very Large Data Bases, Hong Kong SAR, China, 20-23 August 2002.
https://corescholar.libraries.wright.edu/knoesis/334
Included in
Bioinformatics Commons, Communication Technology and New Media Commons, Databases and Information Systems Commons, OS and Networks Commons, Science and Technology Studies Commons
Comments
Presented at the 28th International Conference on Very Large Data Bases, Hong Kong, China, August 20-23, 2002.