Abstract

This study examined the relative effectiveness of the multidimensional bi-factor model and the multidimensional testlet response theory (TRT) model in accommodating local dependence in a testlet-based reading assessment with both dichotomously and polytomously scored items. The data were 14,089 test-takers' item-level responses to the testlet-based reading comprehension section of the Graduate School Entrance English Exam (GSEEE) administered in China in 2011. The results showed that the bi-factor model fit best, followed by the TRT model and then the unidimensional 2-parameter logistic/graded response (2PL/GR) model. Nevertheless, the bi-factor model produced essentially the same results as the TRT model in terms of item parameter, person ability, and standard error estimates. Applying the unidimensional 2PL/GR model had a greater impact on the item slope parameter estimates, person ability estimates, and standard errors of estimates than on the intercept parameter estimates. It is hoped that this study will help test developers and users choose the measurement model that best satisfies their needs based on available resources.