In this paper, we first characterized the candidates of the Competition for Authenticated Encryption: Security, Applicability, and Robustness (CAESAR) in terms of their suitability for parallel processing of multiple blocks of associated data, message, and ciphertext. We then selected seven candidates from the Round 2 and Round 3 submissions: SCREAM, AES-COPA, Minalpher, OCB, AES-OTR, COLM, and Deoxys. For each of them, we obtained initial estimates of the maximum clock frequency, throughput, area, and critical path of the high-speed Basic Iterative Architecture. Next, we implemented two-stage inner-round pipelining for all seven algorithms, which raises the clock frequency and throughput by shortening the critical path and by processing multiple blocks of data simultaneously. We targeted the largest FPGA available in the student version of Xilinx ISE, namely the Xilinx Virtex-6 XC6VLX75T-3FF784. Our results demonstrate an improvement in clock frequency and throughput by a factor ranging from 1.28 for OCB to 1.84 for SCREAM, and a change in the throughput-to-area ratio (with area expressed in LUTs) by a factor ranging from 0.93 for Minalpher to 1.72 for SCREAM. A factor below 1, as for Minalpher, means that the area grew by a larger factor than the throughput.
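The relation between the two reported factors can be illustrated with a short calculation. The sketch below is only an illustration: the function names are hypothetical, and it assumes that both SCREAM factors refer to the same pair of designs (basic iterative versus two-stage pipelined) and that area is measured purely in LUTs. Under these assumptions, the change in the throughput-to-area ratio equals the throughput gain divided by the area growth, so the area growth can be recovered from the two published factors.

```python
def ratio_change(throughput_gain: float, area_growth: float) -> float:
    """Factor by which throughput/area changes when throughput scales by
    throughput_gain and the LUT count scales by area_growth."""
    return throughput_gain / area_growth


def implied_area_growth(throughput_gain: float, ratio_gain: float) -> float:
    """Invert ratio_change: LUT growth implied by a throughput gain and a
    throughput/area gain (assumes both describe the same two designs)."""
    return throughput_gain / ratio_gain


# SCREAM factors quoted in the text: 1.84 (throughput), 1.72 (throughput/area).
scream_area_growth = implied_area_growth(1.84, 1.72)
print(f"Implied LUT growth for SCREAM: {scream_area_growth:.2f}")
```

For SCREAM this gives an implied LUT growth of about 1.07, i.e. the pipeline registers and control logic would cost roughly 7% more LUTs under the stated assumptions. The same calculation cannot be done for OCB or Minalpher from this paragraph alone, because only one of the two factors is quoted for each.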