In this work, we developed the mathematical foundations of an adaptive education system that covers all six cognitive stages of Bloom's taxonomy: remembering, understanding, applying, analyzing, evaluating, and creating. The system operates through the Q-learning algorithm, formulated as a Markov Decision Process. The motivation is that in the current education system students often stop at the first two stages of Bloom's taxonomy, remembering and understanding, whereas higher-order skills such as analysis, critical evaluation, and innovation are the main requirements of the modern labor market. The PISA 2022 results underscore this gap: our students scored 403 points in mathematics (OECD average 472) and 389 points in reading (OECD average 476), with markedly lower performance on questions requiring complex problem solving.

In the proposed system, the student's state is a 9-dimensional vector: a knowledge indicator for each Bloom level ($k_1$ to $k_6$), an error coefficient, task completion time, and a motivation level. The reward function uses weight coefficients arranged according to the Bloom pyramid, so that growth at the higher cognitive stages is rewarded more strongly. The Q-learning algorithm selects the optimal sequence of tasks for each Bloom level and gradually increases their complexity following the spiral curriculum method.

The simulation model covers 120 students over a 15-week course. Monte Carlo simulation (10,000 repetitions) showed that the adaptive RL system yields improvement at every Bloom level compared with the traditional baseline. Most importantly, substantial growth is observed at the high cognitive levels: analysis +68%, evaluation +72%, and creation +81%. The simulated Learning Gain indicator rose from 0.38 in the control group to 0.67 in the adaptive system.
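The state representation, Bloom-weighted reward, and Q-learning update described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: all numeric values (the pyramid weights, learning rate, discount factor, and exploration rate) are assumed for the example, and the state discretization is one simple choice among many.

```python
import random

# Six Bloom levels; the 9-dimensional state also tracks error coefficient,
# task completion time, and motivation level.
BLOOM_LEVELS = ["remember", "understand", "apply", "analyze", "evaluate", "create"]

# Illustrative pyramid weights (assumed): higher cognitive levels rewarded more.
PYRAMID_WEIGHTS = [1.0, 1.2, 1.5, 2.0, 2.5, 3.0]

def bloom_reward(prev_k, new_k, weights=PYRAMID_WEIGHTS):
    """Weighted sum of knowledge gains k_1..k_6, one term per Bloom level."""
    return sum(w * (n - p) for w, p, n in zip(weights, prev_k, new_k))

class QLearner:
    """Tabular Q-learning over a discretized 9-dimensional student state."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.n_actions = n_actions                       # candidate tasks per step
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = {}                                      # (state, action) -> Q value

    def _key(self, state):
        # Bucket continuous features so the tabular Q-function stays finite.
        return tuple(round(x, 1) for x in state)

    def choose(self, state):
        """Epsilon-greedy task selection."""
        s = self._key(state)
        if random.random() < self.epsilon:
            return random.randrange(self.n_actions)
        return max(range(self.n_actions), key=lambda a: self.q.get((s, a), 0.0))

    def update(self, state, action, reward, next_state):
        """Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
        s, s2 = self._key(state), self._key(next_state)
        best_next = max(self.q.get((s2, a), 0.0) for a in range(self.n_actions))
        old = self.q.get((s, action), 0.0)
        self.q[(s, action)] = old + self.alpha * (reward + self.gamma * best_next - old)
```

Under the assumed weights, an identical knowledge gain at the creating level earns three times the reward of the same gain at the remembering level (weight 3.0 versus 1.0), which is what steers the learned policy toward the higher cognitive stages.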
Following the Bloom spiral, the system leads students step by step toward higher cognitive levels. This Reinforcement Learning based adaptive system, which fully integrates Bloom's taxonomy, not only increases the volume of knowledge but also raises the quality of cognitive processes, helping to transform students from passive listeners into active problem solvers. This matters because precisely these skills will be demanded by the future labor market.
[1]. Anderson, L. W., Krathwohl, D. R., Airasian, P. W., Cruikshank, K. A., Mayer, R. E., Pintrich, P. R., ... & Wittrock, M. C. (2001). A taxonomy for learning, teaching, and assessing: A revision of Bloom's taxonomy of educational objectives. Longman.
[2]. Bloom, B. S. (1984). The 2 sigma problem: The search for methods of group instruction as effective as one-to-one tutoring. Educational Researcher, 13(6), 4-16. https://doi.org/10.3102/0013189X013006004
[3]. Chi, M., VanLehn, K., Litman, D., & Jordan, P. (2011). Empirically evaluating the application of reinforcement learning to the induction of effective and adaptive pedagogical strategies. User Modeling and User-Adapted Interaction, 21(1-2), 137-180. https://doi.org/10.1007/s11257-010-9093-1
[4]. Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Lawrence Erlbaum Associates.
[5]. Conejo, R., Guzmán, E., Millán, E., Trella, M., Pérez-De-La-Cruz, J. L., & Ríos, A. (2004). SIETTE: A web-based tool for adaptive testing. International Journal of Artificial Intelligence in Education, 14(1), 29-61.
[6]. Krathwohl, D. R. (2002). A revision of Bloom's taxonomy: An overview. Theory into Practice, 41(4), 212-218. https://doi.org/10.1207/s15430421tip4104_2
[7]. Liu, Z., Yang, Y., Huang, T., & Zeng, D. (2019). Deep reinforcement learning for personalized educational path recommendation. In Proceedings of the IEEE International Conference on Data Mining (pp. 1192-1197). IEEE. https://doi.org/10.1109/ICDM.2019.00135
[8]. Lord, F. M. (1980). Applications of item response theory to practical testing problems. Lawrence Erlbaum Associates.
[9]. Mandel, T., Liu, Y. E., Levine, S., Brunskill, E., & Popovic, Z. (2014). Offline policy evaluation across representations with applications to educational games. In Proceedings of the 2014 International Conference on Autonomous Agents and Multi-Agent Systems (pp. 1077-1084). IFAAMAS.
[10]. Marzano, R. J., & Kendall, J. S. (2007). The new taxonomy of educational objectives (2nd ed.). Corwin Press.
[11]. Marzano, R. J. (2001). Designing a new taxonomy of educational objectives. Corwin Press.
[12]. OECD. (2023). PISA 2022 results: What students know and can do. OECD Publishing. https://doi.org/10.1787/53f23881-en
[13]. Piech, C., Bassen, J., Huang, J., Ganguli, S., Sahami, M., Guibas, L. J., & Sohl-Dickstein, J. (2015). Deep knowledge tracing. In Advances in Neural Information Processing Systems (Vol. 28, pp. 505-513). Curran Associates.
[14]. Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An introduction (2nd ed.). MIT Press.
[15]. Watkins, C. J., & Dayan, P. (1992). Q-learning. Machine Learning, 8(3-4), 279-292. https://doi.org/10.1007/BF00992698
[16]. World Economic Forum. (2023). Future of jobs report 2023. https://www.weforum.org/reports/the-future-of-jobs-report-2023
[17]. Zualkernan, I. A. (2006). A framework and a methodology for developing authentic constructivist e-learning environments. Educational Technology & Society, 9(2), 198-212.