왜곡 제거 시계열 서브시퀀스 매칭에서 빠른 인덱스 구성법

서지반출

기타언어초록

본 논문에서는 왜곡 제거 시계열 서브시퀀스 매칭에서 인덱스를 효율적으로 구성하는 방법을 다룬다. 기존의 왜곡 제거 시계열 서브시퀀스 매칭에서는 인덱스 구축에 매우 많은 시간이 걸리는데, 이는 왜곡 제거의 모든 가능한 경우를 고려하기 위해 너무 많은 윈도우가 생성되기 때문이다. 실제로 길이 30 만의 시계열인 경우에도 인덱스 구축을 위해 약 100분의 많은 시간이 걸려, 대용량 시계열 데이터에 대해서는 인덱스 구축이 매우 어려운 단점이 있다. 본 논문에서는 기존 인덱스 구축 과정을 단계별로 정형적으로 분석한 후, 각 단계별 성능 극대화 방법을 제안한다. 특히, 동적 프로그래밍 기법을 이용하여 PAA-버킷 및 DF-버킷(distortion-free bucket)의 개념을 제안하는데, 이는 반복되는 계산 결과를 저장-후-재사용(store-and-reuse)하는 기법으로, 이를 사용하여 기존 방법에 비해 효율적인 인덱스 구축이 가능하다. 본 논문에서는 복잡도 분석 및 실험 평가를 통해 제안한 방법의 우수성을 입증한다.

기타언어초록

In this paper we present an efficient approach of constructing a multidimensional index in distortion-free time-series subsequence matching. Index construction of previous distortion-free subsequence matching algorithms is a very time-consuming process since it generates a huge number of windows to consider all possible positions and all possible query lengths. According to the real experiment, the index construction time reaches approximately 100 minutes for a time-series of length 300K, and this means that the index construction itself is very difficult for very large time-series databases. To solve this problem, in this paper we first thoroughly analyze the index construction steps, then discuss how to improve the performance of each step, and finally propose two advanced algorithms of efficiently constructing an index. In particular, by exploiting dynamic programming techniques, we present the concepts of PAA-bucket and DF(distortion-free)-bucket, which store-and-reuse the intermediate results repeatedly computed. Through the store-and-reuse technique, the proposed algorithms construct a multidimensional index very faster than the previous algorithm. Through analytical and empirical evaluations, we showcased the superiority of the proposed algorithms.

키워드

시계열 데이터베이스 데이터 마이닝 왜곡 제거 시계열 인덱스 구성법 유사 시퀀스 매칭 Time-series databases data mining distortion-free time-series index construction similar sequence matching

다운URL