Year 2019, Volume 6, Issue 2, Pages 259–278, 2019-06-30

Explanatory Item Response Models for Polytomous Item Responses

Luke Stanke [1], Okan Bulut [2]


Item response theory is a widely used framework for the design, scoring, and scaling of measurement instruments. Item response models are typically used for dichotomously scored questions that have only two score points (e.g., multiple-choice items). However, given the increasing use of instruments that include questions with multiple response categories, such as surveys, questionnaires, and psychological scales, polytomous item response models are increasingly used in education and psychology. This study aims to demonstrate the application of explanatory item response models to polytomous item responses in order to explain common variability in item clusters, person groups, and interactions between item clusters and person groups. Explanatory forms of several polytomous item response models, such as the Partial Credit Model and the Rating Scale Model, are demonstrated, and the estimation procedures for these models are explained. The findings of this study suggest that explanatory item response models can be more robust and parsimonious than traditional item response models for polytomous data where items and persons share common characteristics. Explanatory polytomous item response models can provide more information about response patterns while estimating fewer item parameters.
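To make the two models named in the abstract concrete, the following is a minimal sketch of how category probabilities can be computed under the standard Partial Credit Model and Rating Scale Model formulations. It is not taken from the article: the function names, the `explanatory_location` helper, and the feature weights are hypothetical, and the article's own parameterization and estimation approach may differ.

```python
import math


def pcm_probabilities(theta, step_difficulties):
    """Category probabilities for one item under the Partial Credit Model.

    theta: person ability; step_difficulties: one value per step (m steps
    give m + 1 response categories, scored 0..m).
    """
    # Cumulative logits: category 0 is fixed at 0, and reaching category k
    # adds (theta - delta_k) to the logit of category k - 1.
    cumulative = [0.0]
    for delta in step_difficulties:
        cumulative.append(cumulative[-1] + (theta - delta))
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(cumulative)
    weights = [math.exp(c - peak) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


def rsm_probabilities(theta, item_location, thresholds):
    """Rating Scale Model: a PCM whose step difficulties are the item
    location plus thresholds shared by all items."""
    steps = [item_location + tau for tau in thresholds]
    return pcm_probabilities(theta, steps)


def explanatory_location(item_features, feature_weights):
    """Hypothetical explanatory part: the item location is a weighted sum of
    item properties, so one weight per property is estimated instead of one
    location per item."""
    return sum(x * w for x, w in zip(item_features, feature_weights))


# Illustrative values only (assumed, not from the article).
location = explanatory_location([1, 0], [0.3, -0.2])
probs = rsm_probabilities(0.5, location, [-1.0, 0.0, 1.0])
print([round(p, 3) for p in probs])
```

Under these assumptions, the explanatory variant replaces per-item locations with a small set of property weights, which is one way to read the abstract's point about estimating fewer item parameters.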

Polytomous IRT, explanatory item response modeling, assessment, partial credit model
Primary Language: English
Subjects: Education, Scientific Disciplines
Published Date: June 30, 2019
Journal Section: Articles

Orcid: 0000-0002-4340-6954
Author: Luke Stanke
Institution: Tessellation
Country: United States

Orcid: 0000-0001-5853-1267
Author: Okan Bulut (Primary Author)
Institution: University of Alberta
Country: Canada

Bibtex @article{ijate515085, journal = {International Journal of Assessment Tools in Education}, eissn = {2148-7456}, address = {İzzet KARA}, year = {2019}, volume = {6}, pages = {259 - 278}, doi = {10.21449/ijate.515085}, title = {Explanatory Item Response Models for Polytomous Item Responses}, author = {Stanke, Luke and Bulut, Okan} }
APA Stanke, L., & Bulut, O. (2019). Explanatory Item Response Models for Polytomous Item Responses. International Journal of Assessment Tools in Education, 6(2), 259–278. DOI: 10.21449/ijate.515085