Initial Seed Selection Based on Maximum Mean Distance for Improving K-Means Clustering Performance

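ㆍ Authors: 이신원 (Lee, Shin-Won), 이원휘 (Lee, Won-Hee)
ㆍ Journal: 인터넷정보학회논문지 (Journal of the Korean Society for Internet Information)
ㆍ Volume/Issue: 2011 | Vol. 12, No. 2 | pp. 103-111 (9 pages)
ㆍ Publisher: 한국인터넷정보학회 (Korean Society for Internet Information)
ㆍ File info: Periodical | PDF text
ㆍ Subject area: Other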

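The full text of this paper is provided free of charge through a paper-linking arrangement with the Korea Institute of Science and Technology Information (KISTI).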

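Abstract (translated from Korean)

Clustering techniques, which group large-scale data into a number of clusters according to the data's characteristics, include hierarchical clustering, partitioning clustering, and various other approaches. Among them, the K-Means algorithm is easy to implement, but the time consumed by its repeated assignment and recomputation steps increases. In this paper, the initial cluster centers are chosen so that the distances between them are as large as possible, which spreads the initial centers evenly over the data. The aim is to reduce the number of assignment-recomputation iterations and thereby the total clustering time.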

Abstract (English)

Clustering methods are divided into hierarchical clustering, partitioning clustering, and others. When the number of documents is huge, hierarchical clustering takes too much time. In this paper we deal with the K-Means algorithm, a partitioning clustering method that is suited to clustering large numbers of documents rapidly and easily. We propose a new method of selecting initial seeds for the K-Means algorithm, in which the initial seeds are chosen to be positioned as far away from each other as possible.
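
The abstract does not spell out the selection procedure, so the following Python sketch shows only one plausible reading of the idea: seeds are picked greedily, each new seed being the data point with the largest mean distance to the seeds already chosen, and a plain K-Means loop then counts its assignment-recomputation iterations. The function names `max_mean_distance_seeds` and `kmeans`, the rule for the first seed (the point farthest from the data centroid), the Euclidean metric, and the toy data are all assumptions made for illustration; the paper's actual method may differ.

```python
import numpy as np


def max_mean_distance_seeds(X, k):
    """Hypothetical helper: greedily pick k rows of X that lie far apart."""
    X = np.asarray(X, dtype=float)
    # Assumption: start from the point farthest from the overall centroid.
    first = int(np.argmax(np.linalg.norm(X - X.mean(axis=0), axis=1)))
    chosen = [first]
    while len(chosen) < k:
        # Distances from every point to every chosen seed: shape (n, len(chosen)).
        d = np.linalg.norm(X[:, None, :] - X[chosen][None, :, :], axis=2)
        mean_d = d.mean(axis=1)
        mean_d[chosen] = -np.inf  # never pick the same point twice
        chosen.append(int(np.argmax(mean_d)))
    return X[chosen]


def kmeans(X, seeds, max_iter=100):
    """Plain K-Means; returns centers, labels, and the iteration count."""
    X = np.asarray(X, dtype=float)
    centers = np.array(seeds, dtype=float)
    for n_iter in range(1, max_iter + 1):
        # Assignment step: nearest center for each point.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recomputation step: mean of each cluster (keep old center if empty).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(len(centers))
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels, n_iter


# Toy data: three small, well-separated groups of 2-D points.
X = np.array([
    [0, 0], [0, 1], [1, 0],
    [10, 10], [10, 11], [11, 10],
    [20, 0], [20, 1], [21, 0],
], dtype=float)

seeds = max_mean_distance_seeds(X, k=3)
centers, labels, n_iter = kmeans(X, seeds)
print("seeds:", seeds.tolist())
print("labels:", labels.tolist())
print("iterations:", n_iter)
```

On this toy data each seed falls in a different group, so the loop settles almost immediately. A purely greedy mean-distance rule can also pick two seeds that lie close together when both are far from a third, which is one reason to treat this sketch as an illustration rather than a reproduction of the paper's method.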

Keywords

clustering, initial value, K-Means, initial seed
