효율적인 브로드캐스트 통신을 지원하는 MPI 하드웨어 유닛 설계

효율적인 브로드캐스트 통신을 지원하는 MPI 하드웨어 유닛 설계

ㆍ 저자명: 윤희준,정원영,이용석,Yun. Hee-Jun,Chung. Won-Young,Lee. Yong-Surk
ㆍ 간행물명: 한국통신학회논문지. The Journal of Korea Information and Communications Society. 네트워크 및 서비스
ㆍ 권/호정보: 2011년|36권 |pp.1329-1338 (10 pages)
ㆍ 발행정보: 한국통신학회
ㆍ 파일정보: 정기간행물|
PDF텍스트
ㆍ 주제분야: 기타

이 논문은 한국과학기술정보연구원과 논문 연계를 통해 무료로 제공되는 원문입니다.

서지반출

기타언어초록

본 논문에서는 분산 메모리 아키텍처를 사용하는 멀티프로세서에서 가장 병목 현상이 심한 집합통신 중 브로드캐스트를 위한 알고리즘 및 하드웨어 구조를 제안한다. 기존 시스템에서 집합통신은 프로세싱 노드의 통신포트 상태가 busy 혹은 free 인지를 고려하지 않고 MPI libray cell 에 의해서 점대점 통신으로 변환되어 진다. 만약 브로드캐스트 통신을 하는 동안에 간섭하는 점대점 통신이 있다면, 브로드캐스트 통신의 전송 속도는 저하된다. 따라서 본 논문에서는 각각의 프로세싱 노드의 상태를 고려하여 통신 순서를 결정하는 브로드캐스트 통신 알고리즘을 제안하였다. 제안하는 구조의 알고리즘은 각 프로세싱 노드의 상태에 따라, free 상태의 통신 포트를 가진 프로세싱 노드의 통신 포트에게 우선적으로 메시지를 송신하여 전체적인 집합통신 시간을 단축하였다. 본 연구에서 제안하는 브로드캐스트 통신을 위한 MPI 유닛은 SystemC로 모델링하여 평가하였다. 또한 본 구조는 16노드에서 브로드캐스트 통신의 성능을 최대 78% 향상시켰고, 이는 MPSoC(Multi-Processor System-on-Chip)의 전체적인 성능을 높이는데 유용하다.

기타언어초록

This paper proposes an algorithm and hardware architecture for a broadcast communication which has the worst bottleneck among multiprocessor using distributed memory architectures. In conventional systems, collective communication is converted into point-to-point communications by MPI library cell without considering the state of communication port of each processing node which represents the processing node is in busy state or free state. If conflicting point-to-point communication occurs during broadcast communication, the transmitting speed for broadcast communication is decreased. Thus, this paper proposed an algorithm which determines the order of point-to-point communications for broadcast communication according to the state of each processing node. According to the state of each processing node, the proposed algorithm decreases total broadcast communication time by transmitting message preferentially to the processing node with communication port in free state. The proposed MPI unit for broadcast communication is evaluated by modeling it with systemC. In addition, it achieved a highly improved performance for broadcast communication up to 78% with 16 nodes. This result shows the proposed algorithm is useful to improving total performance of MPSoC.

키워드

MPSoC Message passing Distributed memory Broadcast Collective operation

다운URL