Asynchronous message passing paradigm is commonly used in high performance computing (HPC). Message non-determinism makes the error detection in message passing programs very difficult. The prior work uses an over-approximation of the precise match pair records (each is a pair of a send and a receive that may potentially match in the runtime) to capture all possible message communication in a concurrent trace program (CTP). Symbolic model checking with such a set of match pairs is able to witness program properties including deadlock, message race, and zero-buffer compatibility. However, the approach is inefficient because of the exponential ways of match pair resolution, where most of them are non-feasible for property witnessing. This paper presents an effective under-approximation algorithm that is able to shrink the generated match pair set, thus prune most non-feasible match pairs. The algorithm first sections each process in a CTP such that each potential sender distributes roughly a bounded number of sends to match the same number of receives in the process, and then approximating the match pairs for the sends and receives in each section independently by a few simple rules with ranking. Novelty in the work is that the algorithm has the flexibility to generate the match pair set with various size based on the user input. This paper further presents that the precise match pairs for any CTP can be generated with a bounded input. The experiments over a set of benchmarks show that the symbolic model checking with the algorithm in this paper outperforms the state-of-the-art tools such that the runtime performance of property witnessing is drastically reduced as all the properties are witnessed with a small set of match pairs generated by the new algorithm. The results also show that the algorithm is able to scale to a program that employs a high degree of message non-determinism and/or a high degree of deep communication.