KURT DRIESSENS and SASO DZEROSKI (2004) "Integrating Guidance into Relational Reinforcement Learning" Machine Learning, 57, pp. 271-304.
==================================================================================================
Reinforcement learning, and Q-learning in particular, encounters two major problems when dealing with large state spaces. First, learning the Q-function in tabular form may be infeasible because of the excessive amount of memory needed to store the table, and because the table only converges after each state-action pair has been visited sufficiently often. Second, rewards in the state space may be so sparse that with random exploration they will only be discovered extremely slowly.
...
--------------------------------------------------------------------------------------------------
"In reinforcement learning, an agent tries to learn a policy, i.e., how to select an action in a given state of the environment, so that it maximizes the total amount of reward it receives when interacting with the environment."
---
"Q-learning (Watkins, 1989) is a form of reinforcement learning where the optimal policy is learned implicitly in the form of a Q-function, which takes a state-action pair as input and outputs the quality of the action in that state. The optimal action in a given state is then the action with the largest Q-value."
---
One of the main limitations of standard Q-learning is related to the number of different state-action pairs that may exist. The Q-function can in principle be represented as a table with one entry for each state-action pair.
---
"Using random exploration through the search space, rewards may simply never be encountered."
---
"Thus a mix between the classical unsupervised Q-learning and (supervised) behavioral cloning is obtained."
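--------------------------------------------------------------------------------------------------
My own sketch, not from the paper: a minimal tabular Q-learner of the kind the excerpts describe, with epsilon-greedy exploration, plus a crude stand-in for guidance in which a few hand-scripted "expert" episodes are replayed through the same Q-update before exploration takes over. The corridor environment, the guide policy, and all constants are hypothetical, chosen only to illustrate the two problems from the abstract (one table entry per state-action pair; a single sparse reward that random exploration finds slowly) and how guided traces seed the table. The paper itself works in a relational setting and learns the Q-function with relational regression rather than a table.

import random
from collections import defaultdict

N_STATES = 20          # corridor of states 0..19; reward only in state 19
ACTIONS = [-1, +1]     # step left or right
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.2

Q = defaultdict(float)  # tabular Q-function: one entry per (state, action) pair

def step(state, action):
    """Hypothetical environment dynamics: move along the corridor."""
    next_state = max(0, min(N_STATES - 1, state + action))
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    done = next_state == N_STATES - 1
    return next_state, reward, done

def greedy(state):
    """The optimal action estimate: the action with the largest Q-value."""
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state):
    """Standard Q-learning update (Watkins, 1989)."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

def run_episode(policy):
    state, done = 0, False
    while not done:
        action = policy(state)
        next_state, reward, done = step(state, action)
        update(state, action, reward, next_state)
        state = next_state

def explore(state):
    """Epsilon-greedy exploration: mostly greedy, sometimes random."""
    return random.choice(ACTIONS) if random.random() < EPS else greedy(state)

def guide(state):
    """Hand-written 'expert' that always walks toward the reward; replaying it
    through the same update is a crude stand-in for guidance/behavioral cloning."""
    return +1

for _ in range(20):      # a few guided episodes seed the Q-table...
    run_episode(guide)
for _ in range(200):     # ...then ordinary exploratory Q-learning continues
    run_episode(explore)

print(Q[(0, +1)], Q[(0, -1)])  # stepping toward the reward now looks better from the start

Without the guided episodes, the epsilon-greedy walk would have to stumble onto state 19 by chance before any Q-value became non-zero, which is the sparse-reward problem the abstract points at; with them, the reward has already been propagated back through the table when exploration begins.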