Barrier performance for synchronizing threads on current multi-core systems can be critical for scientific applications that traverse a large number of relatively small parallel regions, that is, applications with an unfavorable ratio of computation to synchronization. Using one synthetic and one real-world benchmark, we assess four alternative barrier implementations on seven current multi-core systems with 2 to 32 cores. We find that barrier performance is application- and data-specific with respect to cache utilization, but that a rather naïve lock-free barrier implementation yields good results across all applications and multi-core systems tested. We also assess distinct implementations of the reduction operations that are computed in conjunction with the barriers. The synthetic and real-world benchmarks are available as open-source code for further testing.
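The C11 sketch below makes the setting concrete. It shows one common form of lock-free barrier, a centralized sense-reversing barrier built on a single atomic counter. The barrier is combined with a sum reduction that runs between two barrier episodes. This is an illustrative assumption, not necessarily one of the implementations evaluated here. All names (`sr_barrier`, `sr_barrier_wait`, `run_reduction`, `partial`) are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#define MAX_THREADS 64

/* Hypothetical centralized sense-reversing barrier (illustrative sketch only). */
typedef struct {
    atomic_int count;  /* arrivals still missing in the current episode */
    atomic_int sense;  /* global sense flag, flipped by the last thread to arrive */
    int nthreads;
} sr_barrier;

static void sr_barrier_init(sr_barrier *b, int n) {
    atomic_init(&b->count, n);
    atomic_init(&b->sense, 0);
    b->nthreads = n;
}

/* Each thread owns local_sense and flips it on every barrier episode. */
static void sr_barrier_wait(sr_barrier *b, int *local_sense) {
    *local_sense = !*local_sense;
    if (atomic_fetch_sub(&b->count, 1) == 1) {
        /* Last arriver: reset the counter before releasing the others. */
        atomic_store(&b->count, b->nthreads);
        atomic_store(&b->sense, *local_sense);
    } else {
        /* Spin on the shared flag; yield so oversubscribed runs still progress. */
        while (atomic_load(&b->sense) != *local_sense)
            sched_yield();
    }
}

/* Shared state for a barrier-separated sum reduction (hypothetical layout). */
typedef struct {
    sr_barrier bar;
    long partial[MAX_THREADS];  /* one slot per thread, contiguous in memory */
    long total;
    int nthreads;
    int rounds;
} shared_t;

typedef struct { shared_t *s; int tid; } arg_t;

static void *worker(void *p) {
    arg_t *a = p;
    shared_t *s = a->s;
    int local_sense = 0;
    for (int r = 0; r < s->rounds; r++) {
        s->partial[a->tid] = a->tid + r;         /* tiny "parallel region" */
        sr_barrier_wait(&s->bar, &local_sense);  /* all partials written */
        if (a->tid == 0)                         /* serial reduction by thread 0 */
            for (int t = 0; t < s->nthreads; t++)
                s->total += s->partial[t];
        sr_barrier_wait(&s->bar, &local_sense);  /* partials read before reuse */
    }
    return NULL;
}

/* Runs `rounds` reduction episodes on `nthreads` threads and returns the total. */
static long run_reduction(int nthreads, int rounds) {
    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    shared_t s;
    s.total = 0;
    s.nthreads = nthreads;
    s.rounds = rounds;
    sr_barrier_init(&s.bar, nthreads);

    pthread_t th[MAX_THREADS];
    arg_t args[MAX_THREADS];
    for (int t = 0; t < nthreads; t++) {
        args[t] = (arg_t){ &s, t };
        pthread_create(&th[t], NULL, worker, &args[t]);
    }
    for (int t = 0; t < nthreads; t++)
        pthread_join(th[t], NULL);
    return s.total;
}
```

With four threads and 100 rounds, `run_reduction(4, 100)` sums `tid + r` over all threads and rounds, which gives 20400. The second barrier in each round keeps threads from overwriting `partial[]` before thread 0 has read it. Because the `partial[]` slots are packed contiguously, they are prone to false sharing. Padding each slot to a cache line, or using a tree-shaped reduction, are variations whose effect depends on cache behavior and would need to be measured.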