Design and Implementation of a Deduplication System Using an Anchor-Based File Similarity Prediction Scheme

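ㆍ Authors: 정호민 (Jung, Ho Min), 고영웅 (Ko, Young Woong)
ㆍ Journal: 정보과학회논문지 (Journal of KIISE): Systems and Theory
ㆍ Volume/Issue: 2012 | Vol. 39, No. 5 | pp. 298-305 (8 pages)
ㆍ Publisher: 한국정보과학회 (Korean Institute of Information Scientists and Engineers)
ㆍ File information: Periodical | PDF text
ㆍ Subject area: Other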

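The full text of this paper is provided free of charge through an article-linking arrangement with the Korea Institute of Science and Technology Information (한국과학기술정보연구원).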

Abstract

Conventional data deduplication systems use a file similarity prediction scheme to find similar files that already exist on the deduplication server, which minimizes the transfer of duplicate data blocks to the server. Reducing the time spent computing file similarity is therefore important for improving the overall performance of a deduplication system. In this paper, we propose an anchor-based hashing scheme that predicts file similarity with low overhead. The key idea is to apply a sampling approach when computing the representative hash list of a file. Experimental results show that the accuracy of anchor-based file similarity prediction is comparable to that of the conventional approach, while the anchor-based scheme runs about 4 times faster.
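
The abstract does not give the details of the anchor condition, the hash functions, or the size of the representative hash list, so the following Python sketch is only an illustration of the general idea under stated assumptions, not the authors' implementation. It assumes that an anchor is a position where a rolling hash over a small window matches a fixed bit pattern, that only the block following each anchor is hashed (the sampling step), and that similarity is estimated as the overlap between two representative hash lists. All names and constants (`WINDOW`, `ANCHOR_MASK`, `BLOCK`, `TOP_K`, `predicted_similarity`) are hypothetical.

```python
import hashlib

# All constants below are illustrative assumptions, not values from the paper.
WINDOW = 16          # bytes covered by the rolling hash
ANCHOR_MASK = 0xFF   # a position is an anchor when the low 8 hash bits are zero
BLOCK = 64           # bytes hashed after each anchor
TOP_K = 32           # size of the representative hash list
BASE = 257
MOD = (1 << 61) - 1


def anchor_positions(data: bytes):
    """Yield offsets just past each WINDOW-byte span whose rolling hash matches the anchor pattern."""
    if len(data) < WINDOW:
        return
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    h = 0
    for b in data[:WINDOW]:
        h = (h * BASE + b) % MOD
    for i in range(WINDOW, len(data) + 1):
        if h & ANCHOR_MASK == 0:
            yield i
        if i < len(data):
            # Slide the window by one byte: drop data[i - WINDOW], add data[i].
            h = ((h - data[i - WINDOW] * top) * BASE + data[i]) % MOD


def representative_hashes(data: bytes, k: int = TOP_K) -> set:
    """Hash only the blocks that follow anchors and keep the k smallest digests."""
    digests = set()
    for pos in anchor_positions(data):
        digests.add(hashlib.sha1(data[pos:pos + BLOCK]).hexdigest())
    return set(sorted(digests)[:k])


def predicted_similarity(a: bytes, b: bytes) -> float:
    """Estimate similarity as the Jaccard overlap of the two representative hash lists."""
    ha, hb = representative_hashes(a), representative_hashes(b)
    if not ha or not hb:
        return 0.0
    return len(ha & hb) / len(ha | hb)
```

Because anchors depend only on local content, inserting a few bytes into a file shifts later anchors without changing the blocks that follow them, so most representative hashes of the edited file still match those of the original. In this sketch, a client would call `predicted_similarity` on a candidate pair and fall back to full block-level deduplication only when the estimate is high.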

Keywords

deduplication, anchor, chunking, file similarity
