It is well known that graphics processing units (GPUs) are able to accelerate highly parallelizable algorithms with a high speedup. However, for less-parallelizable algorithms such as the finite element method, novel schemes are needed to achieve a high speedup. In this paper, the dual-field domain decomposition (DFDD) method based on element-level decomposition (DFDD–ELD) is accelerated on a large GPU cluster. By using element-level subdomains, the DFDD–ELD computation can be easily mapped onto GPU's granular processors and is thus highly parallelizable. Various electromagnetic problems are simulated to demonstrate the speedup and scalability of DFDD–ELD on a GPU cluster. With a careful GPU memory arrangement and thread allocation, we are able to achieve a significant speedup by utilizing GPUs in a message-passing interface (MPI)-based cluster environment. The same acceleration strategy can be applied to the acceleration of the discontinuous Galerkin time-domain (DGTD) algorithms.