We consider the sparse polynomial approximation of a multivariate function on a tensor-product domain from samples of both the function and its gradient. When only function samples are prescribed, weighted $\ell^1$ minimization has recently been shown to be an effective procedure for computing such approximations. We extend this work to the gradient-augmented case. Our main results show that, for the same asymptotic sample complexity, gradient-augmented measurements achieve an approximation error bound in a stronger Sobolev norm, as opposed to the $L^2$-norm of the unaugmented case. For Chebyshev and Legendre polynomial approximations, this sample complexity estimate is algebraic in the sparsity $s$ and at most logarithmic in the dimension $d$, thus mitigating the curse of dimensionality to a substantial extent. We also present several numerical experiments illustrating the benefits of gradient information over an equivalent number of function samples alone.
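To make the setup concrete, the following is a minimal one-dimensional sketch of gradient-augmented weighted $\ell^1$ recovery, not the algorithm analyzed in the paper: it assumes a Chebyshev basis, sample points drawn from the Chebyshev (arcsine) measure, illustrative uniform weights, and an equality-constrained basis-pursuit problem recast as a linear program solved with scipy.optimize.linprog.

```python
# Minimal sketch (illustrative only): recover a sparse Chebyshev expansion
# from function and gradient samples via weighted l1 minimization.
import numpy as np
from numpy.polynomial import chebyshev as C
from scipy.optimize import linprog

rng = np.random.default_rng(0)

N = 40  # basis functions T_0, ..., T_{N-1}
m = 10  # sample points; each contributes a function AND a gradient datum

# Target: a sparse Chebyshev expansion with s = 3 nonzero coefficients.
c_true = np.zeros(N)
c_true[[2, 7, 19]] = [1.0, -0.5, 0.25]

# Sample points drawn from the Chebyshev (arcsine) measure on [-1, 1].
x = np.cos(np.pi * rng.random(m))

# Measurement matrix: function rows T_j(x_i) stacked on gradient rows T_j'(x_i).
A_fun = C.chebvander(x, N - 1)
A_grad = np.column_stack(
    [C.chebval(x, C.chebder(np.eye(N)[j])) for j in range(N)]
)
A = np.vstack([A_fun, A_grad])
b = np.concatenate([C.chebval(x, c_true), C.chebval(x, C.chebder(c_true))])

# Weighted l1 minimization  min ||W c||_1  s.t.  A c = b,  as an LP with the
# split c = c_plus - c_minus, both nonnegative.
w = np.ones(N)  # illustrative uniform weights; the paper uses specific weights
res = linprog(
    c=np.concatenate([w, w]),
    A_eq=np.hstack([A, -A]),
    b_eq=b,
    bounds=[(0, None)] * (2 * N),
    method="highs",
)
c_rec = res.x[:N] - res.x[N:]
print("max coefficient error:", np.abs(c_rec - c_true).max())
```

Stacking the derivative rows $T_j'(x_i)$ beneath the function rows $T_j(x_i)$ is what gradient augmentation amounts to here: each sample point supplies two linear measurements of the coefficient vector instead of one.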