Traditional multiview video coding schemes compress all captured video frames by exploiting all possible inter-view and temporal correlations for coding gain, creating complex inter-frame dependencies in the process. In contrast, interactive multiview video streaming (IMVS) demands navigation flexibility in the frame structure design, so that the server can send only the single view that the client periodically selects for decoding and display, saving transmission bandwidth. In this paper, we generalize previous IMVS frame structure optimization to allow a client to request an arbitrary virtual view; i.e., the server sends two adjacent coded views, from which the client synthesizes the desired virtual view. Since existing IMVS schemes transmit only one view at a time, they employ only cross-time prediction; i.e., the frame at the previous time instant, from which the client switches, is used as the predictor for the requested view. In our new scenario, two coded views are transmitted, so within-time prediction can also be used: the coded frame of one transmitted view is used to predict the frame of the other view at the same time instant. Using I-frames, P-frames and Merge (M-) frames as building blocks, we formulate a Lagrangian problem to find the optimal frame structure for a desired storage/streaming rate tradeoff, with the right mixture of cross-time and within-time prediction types. Experiments show that, for the same storage cost, the expected streaming rate of the proposed structure can be 40% lower than that of the I-frame-only structure, and 9% lower than that of the structure using M-frames with cross-time prediction only.
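To make the storage/streaming rate tradeoff concrete, the following minimal sketch shows how a Lagrangian cost can rank candidate frame structures. It is an illustration only, not the optimization procedure of this paper: the cost form (expected streaming rate plus a multiplier times storage), the names `Candidate`, `lagrangian_cost` and `best_structure`, and all numbers are assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A hypothetical candidate frame structure (toy values, not measured data)."""
    name: str
    storage: float        # storage cost at the server, arbitrary units
    expected_rate: float  # expected streaming rate, arbitrary units


def lagrangian_cost(c: Candidate, lam: float) -> float:
    # Assumed cost form: expected streaming rate + lam * storage.
    return c.expected_rate + lam * c.storage


def best_structure(candidates, lam: float) -> Candidate:
    # Pick the candidate with the smallest Lagrangian cost for this multiplier.
    return min(candidates, key=lambda c: lagrangian_cost(c, lam))


# Toy candidates: more storage buys a lower expected streaming rate.
CANDIDATES = [
    Candidate("structure_A", storage=1.0, expected_rate=10.0),
    Candidate("structure_B", storage=3.0, expected_rate=6.0),
    Candidate("structure_C", storage=9.0, expected_rate=4.0),
]

if __name__ == "__main__":
    for lam in (0.0, 1.0, 10.0):
        print(lam, best_structure(CANDIDATES, lam).name)
```

Sweeping the multiplier from small to large moves the selection from storage-heavy, low-rate structures toward compact, high-rate ones, which is how a single multiplier can expose different operating points of the tradeoff.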