Using Andrich Rating Scale Model In Psychometric Analysis Of General Chemistry I Essay Test

  • Rizki Nor Amelia, State University of Semarang
  • Anggi Ristiyana Puspita Sari, University of Palangka Raya
  • Sri Rejeki Dwi Astuti, State University of Yogyakarta
  • Dian Normalitasari Purnama, State University of Yogyakarta
Keywords: Andrich Rating Scale Model, essay test, psychometric analysis, General Chemistry I


General Chemistry I is a compulsory course for preservice science teachers, and students' conceptual mastery of the subject can be assessed with an essay test. The purpose of this study was to describe the psychometric characteristics of the General Chemistry I essay test using the Andrich Rating Scale Model, as implemented in the Winsteps program. The research was conducted during the odd semester of the 2022/2023 academic year and involved 46 students selected by cluster random sampling. Although the psychometric analysis showed that the rating scale did not function as intended, the General Chemistry I essay test showed good construct validity (unidimensionality), and all items fell in the moderate difficulty category. The study also shows that constructing and defining score categories for an essay test is not easy. The Andrich Rating Scale Model provides useful information for describing and improving the psychometric quality of essay tests that measure preservice science teachers' General Chemistry I ability.
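The abstract names the Andrich Rating Scale Model but does not state it. In its standard form, the probability that person n obtains score x (0..m) on item i is exp(Σ_{k=0..x}(θ_n − δ_i − τ_k)) divided by the sum of that same expression over all scores j = 0..m, where θ_n is person ability, δ_i is item difficulty, τ_1..τ_m are thresholds shared by every item, and the k = 0 term is taken as zero. The Python sketch below computes these category probabilities and applies one common rating-scale diagnostic: whether the thresholds advance with the category. It is a minimal illustration under those assumptions, not the authors' Winsteps analysis. The function names and the threshold values are hypothetical, and the abstract does not say which criterion the scale failed.

```python
import math
from typing import List, Sequence


def rsm_category_probs(theta: float, delta: float, taus: Sequence[float]) -> List[float]:
    """Category probabilities for scores 0..m under the Andrich Rating Scale Model.

    theta: person ability in logits
    delta: item difficulty in logits
    taus:  thresholds tau_1..tau_m, shared by all items (hypothetical values here)
    """
    # Running sums of (theta - delta - tau_k); the empty sum for score 0 is 0.
    exponents = [0.0]
    for tau in taus:
        exponents.append(exponents[-1] + (theta - delta - tau))
    # Subtract the largest exponent before exp() to avoid overflow.
    top = max(exponents)
    weights = [math.exp(e - top) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


def thresholds_ordered(taus: Sequence[float]) -> bool:
    """True if each threshold is strictly higher than the one before it."""
    return all(a < b for a, b in zip(taus, taus[1:]))


if __name__ == "__main__":
    # Hypothetical 4-category item (scores 0-3); not estimates from the study.
    probs = rsm_category_probs(theta=0.5, delta=0.0, taus=[-1.0, 0.0, 1.0])
    print([round(p, 3) for p in probs])
    print(thresholds_ordered([-1.0, 0.0, 1.0]))   # True
    print(thresholds_ordered([-0.2, -0.8, 1.0]))  # False: disordered thresholds
```

With a single threshold the function reduces to the dichotomous Rasch model, which is a quick sanity check on the implementation. Disordered thresholds mean that some category is never the most probable response at any ability level, which is one way a rating scale can fail to function as intended.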



How to Cite
Amelia, R.N., Sari, A.R.P., Astuti, S.R.D. and Purnama, D.N. 2023. Using Andrich Rating Scale Model In Psychometric Analysis Of General Chemistry I Essay Test. Jurnal Ilmiah Kanderang Tingang. 14, 1 (Feb. 2023), 92-101. DOI: