The Gram–Schmidt process uses orthogonal projection to construct the $A = QR$ factorization of a matrix. When $Q$ has linearly independent columns, the operator $P = I - Q (Q^T Q)^{-1} Q^T$ defines an orthogonal projection onto $Q^\perp$. In finite precision, $Q$ loses orthogonality as the factorization progresses. A family of approximate projections is derived with the form $P = I - Q T Q^T$, with correction matrix $T$. When $T = (Q^T Q)^{-1}$, and $T$ is triangular, it is postulated that the best achievable orthogonality is . We present new variants of modified (MGS) and classical Gram–Schmidt algorithms that require one global reduction step. An interesting form of the projector leads to a compact WY representation for MGS. In particular, the inverse compact WY MGS algorithm is equivalent to a lower triangular solve. Our main contribution is to introduce a backward normalization lag into the compact WY representation, resulting in a stable Generalized Minimal Residual Method (GMRES) algorithm that requires only one global reduce per iteration. Further improvements in performance are achieved by accelerating GMRES on GPUs.
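
The two projector forms mentioned above can be illustrated with a small NumPy sketch. It is not the authors' implementation. It assumes the correction matrix for MGS takes the form T = (I + L)^{-1}, where L is the strictly lower triangular part of Q^T Q, which is one way to read the statement that inverse compact WY MGS is equivalent to a lower triangular solve. The function names (`mgs_step`, `inverse_compact_wy_step`, `complement_projector`) are hypothetical, and `np.linalg.solve` stands in for a dedicated triangular solver.

```python
import numpy as np


def mgs_step(Q, a):
    """Orthogonalize a against the columns of Q one at a time (modified Gram-Schmidt)."""
    w = a.copy()
    r = np.zeros(Q.shape[1])
    for j in range(Q.shape[1]):
        r[j] = Q[:, j] @ w        # one inner product (one reduction) per column
        w -= r[j] * Q[:, j]
    return w, r


def inverse_compact_wy_step(Q, a):
    """Same update written as w = (I - Q T Q^T) a with T = (I + L)^{-1} (assumed form)."""
    k = Q.shape[1]
    L = np.tril(Q.T @ Q, k=-1)    # strictly lower triangular part of Q^T Q
    # Lower triangular system (I + L) r = Q^T a; a general solver is used here
    # for brevity, a triangular solver would be the natural choice in practice.
    r = np.linalg.solve(np.eye(k) + L, Q.T @ a)
    return a - Q @ r, r


def complement_projector(Q):
    """P = I - Q (Q^T Q)^{-1} Q^T for Q with linearly independent columns."""
    m = Q.shape[0]
    return np.eye(m) - Q @ np.linalg.solve(Q.T @ Q, Q.T)


rng = np.random.default_rng(0)
Q = rng.standard_normal((50, 6))
Q /= np.linalg.norm(Q, axis=0)    # unit-norm columns, deliberately not orthogonal
a = rng.standard_normal(50)

w_mgs, r_mgs = mgs_step(Q, a)
w_wy, r_wy = inverse_compact_wy_step(Q, a)
P = complement_projector(Q)

print("MGS vs. triangular-solve form:", np.linalg.norm(w_mgs - w_wy))
print("||P Q||:", np.linalg.norm(P @ Q))
```

In this sketch the column-by-column loop needs one inner product per column of Q, whereas the triangular-solve form gathers all inner products into the products Q^T Q and Q^T a, which is the kind of batching that allows the number of global reductions to be lowered.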