Dynamic soaring (DS) is an aerobatic maneuver whereby a gliding aircraft harnesses energy from horizontal wind that varies in strength and/or direction to sustain flight. Typical approaches to dynamic soaring in autonomous unmanned aerial vehicles (UAVs) use nonlinear optimizers to generate energy-gaining trajectories, which are then followed using traditional controllers. The effectiveness of such a strategy is limited both by the local optimality of the generated trajectory and by controller tracking errors. In this paper, we investigate a reinforcement learning (RL) approach operating in continuous space to control a DS aircraft flying in shear wind conditions. The RL controller operates in two stages: in the first stage, it observes a traditional sample-based controller flying a locally optimal DS trajectory generated a priori; in the second stage, the sample-based controller is removed and authority is passed to the RL algorithm. We show that by deviating from the originally planned trajectory, the RL controller achieves better performance than its baseline teacher controller.