Pre-arrangement Based Task Scheduling Scheme for Reducing MapReduce Job Processing Time

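ㆍ Authors: Park Jung Hyo (박정효), Kim Jun Sang (김준상), Kim Chang Hyeon (김창현), Lee Won Joo (이원주), Jeon Chang Ho (전창호)
ㆍ Journal: Journal of the Korea Society of Computer and Information (韓國컴퓨터情報學會論文誌)
ㆍ Volume/Issue: 2013, Vol. 18, No. 11, pp. 23-30 (8 pages)
ㆍ Publisher: Korea Society of Computer and Information (한국컴퓨터정보학회)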
ㆍ File info: Periodical | PDF text
ㆍ Subject area: Other

The full text of this paper is provided free of charge through a link with the Korea Institute of Science and Technology Information (KISTI).

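Abstract (translated from the Korean)

This paper proposes a pre-arrangement based task scheduling scheme that can reduce MapReduce job processing time. When a task and the data it processes do not reside on the same node, the task must receive the data from another node before processing it. The resulting transfer time increases the MapReduce job processing time. To solve this problem, the paper schedules tasks in two steps. In the first step, tasks are sorted into the node lists in descending order of data locality. In the second step, the location information of the data is used to exchange tasks so that their data locality increases. To evaluate the proposed scheme, a small Hadoop cluster was built and used for experiments. With the proposed scheme applied, the job processing time decreased by about 18%, and the number of tasks not assigned to a node storing their data decreased by about 25%.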

Abstract (English)

In this paper, we propose a pre-arrangement based task scheduling scheme to reduce MapReduce job processing time. If a task and the data it is to process are not located on the same node, the data must be transmitted to the node to which the task is allocated. In that case, the job processing time increases owing to the data transmission time. To avoid this, we schedule tasks in two steps. In the first step, tasks are sorted in descending order of data locality. In the second step, tasks are exchanged to improve their data locality based on the location information of the data. For the performance evaluation, we compare Hadoop using the proposed method with default Hadoop on a small Hadoop cluster, in terms of the job processing time and the number of tasks allocated to nodes that do not hold the data they process. The results show that the proposed method lowers the job processing time by about 18%. We also confirm that the number of tasks allocated to nodes without the data they process decreases by about 25%.
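
The two scheduling steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the exact sorting key or exchange rule, so the sketch assumes that each task reads one data block with a known set of replica nodes, that "sorting" means placing data-local tasks first in each node's list, and that two tasks are exchanged only when the swap strictly increases the number of data-local tasks. All names (`is_local`, `sort_by_locality`, `find_swap`, `exchange_for_locality`, `count_non_local`) and the sample data are hypothetical.

```python
def is_local(task, node, locations):
    """True if `node` stores the data block that `task` reads."""
    return node in locations[task]


def count_non_local(node_lists, locations):
    """Number of tasks placed on a node that does not hold their data."""
    return sum(not is_local(t, n, locations)
               for n, tasks in node_lists.items() for t in tasks)


def sort_by_locality(node_lists, locations):
    """Step 1 (assumed reading): data-local tasks first in each node list."""
    return {n: sorted(tasks, key=lambda t: not is_local(t, n, locations))
            for n, tasks in node_lists.items()}


def find_swap(task, node, node_lists, locations):
    """Find (other_node, index) whose task can be exchanged with the
    non-local `task` so that the total of local tasks strictly increases."""
    for other in sorted(locations[task]):      # nodes that hold task's data
        if other == node or other not in node_lists:
            continue
        for j, candidate in enumerate(node_lists[other]):
            before = int(is_local(candidate, other, locations))
            after = 1 + int(is_local(candidate, node, locations))
            if after > before:
                return other, j
    return None


def exchange_for_locality(node_lists, locations):
    """Step 2 (assumed rule): repeat beneficial pairwise exchanges.
    Each swap raises the local-task count, so the loop terminates."""
    lists = {n: list(tasks) for n, tasks in node_lists.items()}
    improved = True
    while improved:
        improved = False
        for node, tasks in lists.items():
            for i, task in enumerate(tasks):
                if is_local(task, node, locations):
                    continue
                found = find_swap(task, node, lists, locations)
                if found is not None:
                    other, j = found
                    tasks[i], lists[other][j] = lists[other][j], tasks[i]
                    improved = True
    return lists


# Hypothetical input: replica locations per task and an arbitrary start.
locations = {"t1": {"n2"}, "t2": {"n1"}, "t3": {"n3"},
             "t4": {"n2"}, "t5": {"n1", "n3"}, "t6": {"n3"}}
initial = {"n1": ["t1", "t2"], "n2": ["t3", "t4"], "n3": ["t5", "t6"]}

final = exchange_for_locality(sort_by_locality(initial, locations), locations)
print(count_non_local(initial, locations), count_non_local(final, locations))
```

In this toy input, two of the six tasks start on nodes without their data, and after the two steps none do. The exchange keeps the number of tasks per node unchanged, which is one simple way to respect per-node slot limits; a real Hadoop scheduler would also have to handle rack locality, task arrival over time, and heartbeat-driven assignment, none of which is modeled here.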

Keywords

Data Locality, Hadoop, MapReduce