The Message Passing Interface (MPI) has become the de facto standard communication library for distributed-memory computing systems. Since the release of new versions of the MPI specification, several MPI implementations have been made publicly available, each employing a different approach. Selecting an appropriate MPI implementation is critical for message-passing applications, because communication performance is crucial to them. Our study is intended to provide a guideline on how to submit and execute tasks economically and effectively on workstation clusters for high-performance computing. We investigate several aspects of MPI that affect communication performance, including the choice of implementation, the supporting hardware environment, and the use of derived datatypes. Finally, our results highlight the strengths and weaknesses of the different implementations on our experimental system.