Personalized medicine is a rapidly expanding area of health research in which patient-level information is used to inform treatment. Dynamic treatment regimens (DTRs) formalize the sequence of treatment decisions that characterize personalized management plans. Identifying the DTR that optimizes the expected patient outcome is of obvious interest, and numerous methods have been proposed for this purpose. We present a new approach that builds on two established methods, Q‐learning and G‐estimation, offering the doubly robust property of the latter with an ease of implementation much closer to that of the former. We outline the underlying theory, provide simulation studies that demonstrate the double‐robustness and efficiency properties of our approach, and illustrate its use on data from the Promotion of Breastfeeding Intervention Trial.