This study examines how prediction performance and the utilization of sentence-level information differ across methods for aggregating sentence-level embeddings into document-level representations in Automated Essay Scoring (AES) for Korean essays. Non-trainable aggregation methods (Mean, Max, and Min Pooling) were compared with attention-based methods that learn sentence-level importance (Gated MIL Attention and TransMIL Attention). The results show that the simple pooling methods achieved a reasonable level of performance but did not demonstrate consistent advantages over the attention-based approaches across evaluation metrics. In contrast, Gated MIL Attention and TransMIL Attention performed relatively better on several metrics, indicating that sentence-level information is utilized more effectively when sentences are differentially weighted or modeled within inter-sentence relational contexts. Analysis of the attention weights further revealed that sentences assigned higher importance tended to perform core argumentative functions, such as presenting claims, providing evidence, and emphasizing key points. These findings suggest that attention-based sentence aggregation is not only beneficial for score prediction but also informative and explainable for interpreting essay structure. Despite limitations related to the exploratory nature of the analysis and the absence of rater identification data, this study empirically demonstrates that attention-based aggregation constitutes a structurally meaningful approach to AES.
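
To make the aggregation step concrete, the following is a minimal sketch of gated attention pooling in the style of Gated MIL Attention, which scores each sentence embedding through a tanh branch gated by a sigmoid branch and combines the sentences with softmax-normalized weights. All names, dimensions, and parameter matrices (`H`, `V`, `U`, `w`) are illustrative assumptions, not the study's actual implementation.

```python
import numpy as np

def gated_mil_attention(H, V, U, w):
    """Gated attention pooling over sentence embeddings (illustrative sketch).

    H: (n_sentences, d) sentence embeddings for one essay.
    V, U: (d, k) projection matrices for the tanh and sigmoid branches.
    w: (k,) scoring vector.
    Returns the pooled (d,) document vector and the (n_sentences,)
    attention weights, which sum to 1 and indicate sentence importance.
    """
    # Gated transform: tanh branch modulated elementwise by a sigmoid gate.
    gate = np.tanh(H @ V) * (1.0 / (1.0 + np.exp(-(H @ U))))  # (n, k)
    scores = gate @ w                                          # (n,)
    # Softmax over sentences (shifted by the max for numerical stability).
    a = np.exp(scores - scores.max())
    a /= a.sum()
    # Document representation = attention-weighted sum of sentence embeddings.
    return a @ H, a

# Toy usage with random parameters (hypothetical sizes).
rng = np.random.default_rng(0)
n_sentences, d, k = 5, 8, 4
H = rng.normal(size=(n_sentences, d))
V = rng.normal(size=(d, k))
U = rng.normal(size=(d, k))
w = rng.normal(size=(k,))
pooled, weights = gated_mil_attention(H, V, U, w)
```

In contrast to mean pooling, which fixes every sentence weight at 1/n, the learned weights here can concentrate on claim- or evidence-bearing sentences, which is also what makes the weights usable for interpreting essay structure.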