In this work, we present an optimization strategy for implementing the BiCGStab iterative solver on graphic processing units (GPU) for computing incompressible viscous flows governed by the unsteady Navier–Stokes (N–S) equations on a CUDA platform. A recently developed $$\psi $$ ψ –v formulation is used to discretize the biharmonic form of the N–S equation. Special emphasis was given on optimizing the matrix-vector multiplication and the dot product kernels in GPU as the bulk of the computational efforts was seen to be occupied by these two kernels. The parallel code was implemented on the benchmark problem of lid-driven square cavity flow and remarkable speed-up up to 20 times was achieved on finer grids. The GPU implementation enabled us to compute the flow on extremely fine grids and very small scales were resolved with remarkable accuracy.