At each time step, the network received a reward r(t) = 1 if the mouth moved closer to the food source, or r(t) = -1 if the mouth moved farther away.
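The reward rule can be sketched as a comparison of the mouth-to-food distance before and after a time step. This is a minimal illustration, not the original implementation: the function name `reward`, the 2-D coordinate tuples, and the Euclidean distance are assumptions, and the value for moving farther is passed in as `r_farther` rather than fixed. The text does not say what happens when the distance is unchanged, so the sketch treats that case as "not closer".

```python
import math

def reward(prev_mouth, new_mouth, food, r_farther, r_closer=1.0):
    """Return the reward for one time step.

    prev_mouth, new_mouth, food: (x, y) positions (hypothetical representation).
    r_closer: reward when the mouth moved closer to the food (1 in the text).
    r_farther: reward when the mouth moved farther from the food.
    """
    d_prev = math.dist(prev_mouth, food)  # distance before the step
    d_new = math.dist(new_mouth, food)    # distance after the step
    # Assumption: an unchanged distance counts as "not closer".
    return r_closer if d_new < d_prev else r_farther
```

For example, `reward((0, 0), (1, 0), (3, 0), r_farther=-1.0)` returns the closer-case reward, because the step reduces the distance to the food from 3 to 2.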