Report Highlights RL Model Limitations
A technical report on Weights & Biases demonstrates that baseline reinforcement learning algorithms struggle with non-stationary behaviors in synthetic environments. The findings highlight the need for more robust and interpretable models when tracking evolving states, such as a student's knowledge. This presents a challenge for validating adaptive models against the noisy data of real-world learners.
- The "catastrophic forgetting" problem in online RL is a significant hurdle in non-stationary environments; agents tend to forget previous knowledge when trained on new experiences. A proposed solution is Locally Constrained Policy Optimization (LCPO), which anchors the policy on older experiences while optimizing for newer ones.
- Deep Knowledge Tracing (DKT) utilizes Recurrent Neural Networks (RNNs) to model the changing knowledge state of a student over time, outperforming older methods like Bayesian Knowledge Tracing (BKT). Unlike BKT, DKT does not require the explicit encoding of human domain knowledge and can learn more complex representations of student understanding.
- Bayesian Knowledge Tracing (BKT) models a student's knowledge as a set of binary variables, one per skill, each either mastered or not. The original BKT model assumed that a skill, once learned, is never forgotten.
- A key challenge in applying RL to educational settings is defining the student's state representation; research has shown that more complex state spaces do not necessarily lead to better performance in terms of expected future rewards.
- Interpreting the "black box" nature of deep learning models for knowledge tracing is a major obstacle to their practical application. Post-hoc explanation methods, such as Layer-wise Relevance Propagation (LRP), are being explored to make these models more transparent.
- RL-based approaches to instructional sequencing have been researched since the 1960s, with studies showing that RL-induced policies significantly outperform baseline methods in over half of the cases. Success is often highest when RL is constrained with principles from cognitive psychology and learning sciences.
- To combat non-stationarity, some RL techniques use adaptive learning algorithms that adjust their update rules or exploration rates in real-time. For instance, Q-learning can use decay factors to give more weight to recent experiences.
- Newer knowledge tracing models are being developed to be consistent with the learning process, and some models leverage student-to-student information for more effective tracking.
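
To make the BKT bullet above concrete, here is a minimal sketch of the standard per-skill BKT update in Python. It is illustrative only and not taken from the report: the names `BKTParams`, `bkt_update`, and `trace`, and all parameter values, are assumptions. The optional `p_forget` parameter is a common extension; leaving it at 0.0 corresponds to the original assumption that a learned skill is never forgotten.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BKTParams:
    """Hypothetical container for the four classic BKT parameters of one skill."""
    p_init: float          # P(skill already mastered before the first attempt)
    p_learn: float         # P(unmastered -> mastered after one practice opportunity)
    p_guess: float         # P(correct answer | skill not mastered)
    p_slip: float          # P(incorrect answer | skill mastered)
    p_forget: float = 0.0  # 0.0 matches the original no-forgetting assumption


def bkt_update(p_mastered: float, correct: bool, params: BKTParams) -> float:
    """Return P(mastered) after observing one answer and one learning step."""
    # Step 1: Bayes' rule on the observed answer.
    if correct:
        known = p_mastered * (1.0 - params.p_slip)
        unknown = (1.0 - p_mastered) * params.p_guess
    else:
        known = p_mastered * params.p_slip
        unknown = (1.0 - p_mastered) * (1.0 - params.p_guess)
    # Assumes the observation has non-zero probability (e.g. p_guess > 0).
    posterior = known / (known + unknown)

    # Step 2: transition between attempts (learning, and optionally forgetting).
    return posterior * (1.0 - params.p_forget) + (1.0 - posterior) * params.p_learn


def trace(responses, params: BKTParams):
    """Run the update over a sequence of True/False answers for one skill."""
    p = params.p_init
    history = []
    for correct in responses:
        p = bkt_update(p, correct, params)
        history.append(p)
    return history


if __name__ == "__main__":
    # Illustrative parameter values, not fitted to any data set.
    params = BKTParams(p_init=0.2, p_learn=0.1, p_guess=0.2, p_slip=0.1)
    for step, p in enumerate(trace([True, True, False], params), start=1):
        print(f"after attempt {step}: P(mastered) = {p:.3f}")
```

With these example values the mastery estimate rises after each correct answer and drops after the incorrect one. Setting `p_forget` above zero pulls every estimate downward, which is one simple way to relax the original model's no-forgetting assumption.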