Virtual reality (VR) is a potential assessment format for constructs dependent on certain perceptual characteristics (e.g., realistic environment and immersive experience). The purpose of this series of studies was to explore methods of evaluating reliability and validity evidence for virtual reality assessments (VRAs) when compared with traditional assessments. We intended to provide the basis of a framework on how to evaluate VR assessments given that there are important fundamental differences to VR assessments compared with traditional assessment formats. Two commercial off‐the‐shelf (COTS) games (i.e., Project M and Richie's Plank Experience) were used in Studies 1 and 2, while a game‐based assessment (GBA; Balloon Pop, designed for assessment) was used in Study 3. Studies 1 and 2 provided limited evidence for the reliability and validity of the VRAs. However, no meaningful constructs were measured by the VRA in Study 3. Findings demonstrate limited evidence for these VRAs as viable assessment options through the validity and reliability methods utilized in the present studies, which in turn emphasize the importance of aligning the assessment purpose to the unique advantages of a VR environment.