Augmented reality (AR) enhanced collaboration systems often let users draw 2D gesture annotations onto video feeds to help collaborators complete physical tasks. This works well for static cameras, but with movable cameras, perspective effects cause problems when 2D annotations must be rendered in 3D from a new viewpoint. In this paper, we present a new approach to this problem based on gesture-enhanced annotations. We show that by first classifying which type of gesture the user drew, it is possible to render annotations in 3D in a way that conveys the user's original intention more faithfully than traditional methods do. To begin, we determined a generic vocabulary of important 2D gestures for remote collaboration through an Amazon Mechanical Turk study with 88 participants. Next, we designed a novel system that automatically handles the top two 2D gesture annotations in this vocabulary: arrows and circles. Arrows are handled by identifying their anchor points and using surface normals to render them with better perspective. For circles, we designed a novel energy function that infers the object of interest from both 2D image cues and 3D geometric cues. Results indicate that our approach conveys the meaning of the original drawing from different viewpoints better than previous methods.