A Comparison of the Accuracy of IRT Ability Parameter Estimation Using Two Expected A Posteriori Methods under Polytomous Item Response Theory

강태훈; 심혜진

Korean Abstract

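This study used simulation to examine how the expected a posteriori method based on the test summed score (EAPss) performs relative to the expected a posteriori method based on the item response pattern (EAPrp) when estimating examinee ability from a test composed of polytomous items. The latter uses more of the information in the test data and therefore yields more accurate ability estimates, whereas the former corresponds one-to-one with raw scores and is therefore easier for the general public to understand. If, as prior research on dichotomous test data has found, the practical difference between the two methods is also negligible for polytomous tests, EAPss was expected to be a viable alternative for ability estimation when analyzing data from tests that mix multiple-choice and constructed-response items and from Likert-scale instruments. The simulation varied the 'model', 'test length', and 'number of response categories' and examined how the accuracy of ability estimation differed between the two EAP methods across ability levels. It comprised two substudies that treated examinee ability marginally and conditionally, respectively. In the marginal study, EAPrp yielded slightly smaller MSE, SB, and VAR values than EAPss across all examinees, but the differences were negligible, appearing only at the second decimal place. In the conditional study, with a standard normal prior both EAP methods showed SB and VAR values close to 0 over the ability range [-2, 2], and with a uniform prior they estimated examinees at the two extremes of the ability scale more accurately.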
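The two estimators compared above can be illustrated with a minimal sketch. It assumes a generalized partial credit model, a standard normal prior on an 81-point quadrature grid, and three hypothetical four-category items; none of these specifics come from the study, and the function names are illustrative only. The summed-score likelihoods are built with a Lord-Wingersky-type recursion extended to polytomous items.

```python
import numpy as np

def gpcm_probs(theta, a, b):
    """Category probabilities under a generalized partial credit model (assumed model).

    theta: (Q,) quadrature points; a: slope; b: (K-1,) step parameters.
    Returns an array of shape (Q, K).
    """
    steps = a * (theta[:, None] - np.asarray(b, dtype=float)[None, :])
    z = np.concatenate([np.zeros((theta.size, 1)), np.cumsum(steps, axis=1)], axis=1)
    z -= z.max(axis=1, keepdims=True)  # subtract row maximum for numerical stability
    p = np.exp(z)
    return p / p.sum(axis=1, keepdims=True)

def eap_response_pattern(pattern, item_probs, theta, prior):
    """EAPrp: posterior mean of theta given the full item response pattern."""
    like = np.ones_like(theta)
    for x, p in zip(pattern, item_probs):
        like *= p[:, x]
    post = like * prior
    return np.sum(theta * post) / np.sum(post)

def summed_score_likelihoods(item_probs):
    """Recursion giving L[q, s] = P(summed score = s | theta_q)."""
    L = item_probs[0].copy()
    for p in item_probs[1:]:
        n_quad, n_cat = p.shape
        new = np.zeros((n_quad, L.shape[1] + n_cat - 1))
        for k in range(n_cat):
            # Scoring k on this item shifts every earlier score by k.
            new[:, k:k + L.shape[1]] += L * p[:, [k]]
        L = new
    return L

def eap_summed_score(item_probs, theta, prior):
    """EAPss: one posterior mean per summed score (a score-to-ability table)."""
    post = summed_score_likelihoods(item_probs) * prior[:, None]
    return (theta[:, None] * post).sum(axis=0) / post.sum(axis=0)

# Hypothetical item parameters: (slope, step parameters) for three 4-category items.
items = [(1.0, [-1.0, 0.0, 1.0]), (1.5, [-0.5, 0.5, 1.2]), (0.8, [-1.2, 0.3, 0.9])]
theta = np.linspace(-4.0, 4.0, 81)
prior = np.exp(-0.5 * theta ** 2)  # unnormalized standard normal prior
item_probs = [gpcm_probs(theta, a, b) for a, b in items]

table = eap_summed_score(item_probs, theta, prior)
pattern = [3, 0, 1]
print("EAPrp:", eap_response_pattern(pattern, item_probs, theta, prior))
print("EAPss:", table[sum(pattern)])
```

Every response pattern with the same summed score receives the same EAPss value from `table`, which is the one-to-one property described above, while EAPrp can differ between such patterns whenever item slopes differ.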

English Abstract

This study investigates the accuracy of IRT ability parameter estimates from the summed-score EAP method under polytomous IRT models across several simulation conditions, in comparison with the traditional EAP method based on the item response pattern. The former has the advantage of producing scale scores that are convincing to the public, because its ability estimates correspond one-to-one with the summed scores. Its disadvantage, however, is that it ignores the information contained in the item response patterns. Despite this loss of information, the study expected its estimation accuracy to be similar to that of the latter. To test this, the study compares how accurately the two methods recover the true ability parameters under the simulation conditions and reports the results as MSE, SB, and VAR values. The results show that the latter produces slightly smaller MSE, SB, and VAR values than the former, but the differences generally appear only at the second decimal place. Moreover, both EAP methods yield SB and VAR values close to 0 on the ability scale between -2 and 2. The results therefore imply that EAP based on the summed score can be a promising alternative ability estimation method in operational testing programs.
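The abstract does not define the three recovery indices. The sketch below assumes that SB denotes squared bias and VAR the variance of the estimates across replications, so that MSE = SB + VAR at a fixed true ability. The helper name `recovery_indices` and the simulated estimates are hypothetical and do not reproduce the study's data.

```python
import numpy as np

def recovery_indices(true_theta, estimates):
    """Return (MSE, SB, VAR) for replicated estimates of one true ability value.

    Assumption: SB is squared bias and VAR is the population variance
    (ddof=0) of the estimates, which makes MSE = SB + VAR exact.
    """
    estimates = np.asarray(estimates, dtype=float)
    sb = (estimates.mean() - true_theta) ** 2
    var = estimates.var()
    mse = np.mean((estimates - true_theta) ** 2)
    return mse, sb, var

rng = np.random.default_rng(0)
true_theta = 1.0
# Hypothetical estimates: shrunk toward the prior mean of 0, plus random error.
estimates = 0.9 * true_theta + rng.normal(0.0, 0.3, size=1000)

mse, sb, var = recovery_indices(true_theta, estimates)
print(f"MSE={mse:.4f}  SB={sb:.4f}  VAR={var:.4f}")
```

Computing the indices separately at each true ability value corresponds to a conditional analysis, and averaging them over a simulated ability distribution corresponds to a marginal one.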

Keywords

Polytomous item response theory; EAP method based on the summed score; EAP method based on the item-response pattern

