Globally asynchronous locally synchronous (GALS) is a paradigm for complexity management and re-use of large system-on-chip (SoC) architectures. GALS is most often based on specific ASIC design components or special FPGA platforms with custom development tools. In this paper we present a multiprocessor GALS implementation on a standard commercial FPGA with standard development tools. The key building block is a novel, reliable RTL mixed clock FIFO. A complete MPEG-4 video encoder with four processors is implemented for proofing the concept. The area overhead compared to a fully synchronous design is shown to be only 2% and the performance overhead is 3%. This is negligible compared to the benefits that are much better flexibility, ASIC or FPGA vendor independency, and reduced design time. Furthermore, the mixed-clock interfaces allow easy re-usability, since the RTL-level blocks do not need to be re-verified in design iterations