- 자연언어처리용 전자사전을 위한 한국어 기본어휘 선정
- Selection of Korean General Vocabulary for Machine Readable Dictionaries
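- ㆍ Authors
- 배희숙, 이주호, 시정곤, 최기선
- ㆍ Journal
- 언어와 정보 (Language and Information)
- ㆍ Volume/Issue
- 2003 | Vol. 7, No. 1 | pp. 41-54 (14 pages)
- ㆍ Publisher
- 한국언어정보학회 (Korean Society for Language and Information)
- ㆍ File information
- Periodical | PDF text
- ㆍ Subject area
- Other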
According to Jeong Ho-seong (1999), Koreans use, on average, only 20% of the 508,771 entries in the Korean standard unabridged dictionary. To build a machine-readable dictionary (MRD) for natural language processing, it is therefore necessary to select the Korean lexical units that are used frequently and can be regarded as basic words. In this study, the selection is carried out semi-automatically using the KAIST large corpus. From about 220,000 morphemes extracted from the corpus of 40,000,000 eojeols, 50,637 morphemes (54,797 senses) are selected. The coverage of these morphemes is then examined on two sub-corpora of different styles. The total coverage is 91.21% in the formal style and 93.24% in the informal style. The coverage of the 6,130 first-degree morphemes is 73.64% and 81.45%, respectively.
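The abstract does not spell out how coverage is computed. As a minimal sketch, assuming coverage means the share of running morpheme tokens in a text that belong to the selected vocabulary, it could be measured as below. The function name `token_coverage` and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Iterable, Set


def token_coverage(tokens: Iterable[str], vocabulary: Set[str]) -> float:
    """Return the percentage of running tokens found in the vocabulary.

    Assumption: coverage is counted over tokens (occurrences), not over
    distinct morpheme types. The paper may define it differently.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # an empty text has no tokens to cover
    covered = sum(n for morpheme, n in counts.items() if morpheme in vocabulary)
    return 100.0 * covered / total


# Toy, already-segmented morpheme sequence (illustrative only).
sample_tokens = ["학교", "에", "가", "다", "학교", "는", "크", "다"]
# Hypothetical basic vocabulary standing in for the selected morphemes.
basic_vocabulary = {"학교", "에", "가", "다"}

coverage = token_coverage(sample_tokens, basic_vocabulary)
print(f"{coverage:.2f}%")
```

In this toy case, 6 of the 8 tokens are in the vocabulary. Applying the same count to a formal and an informal sub-corpus would give one figure per style, comparable in form to the percentages reported above.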