The self-organizing map (SOM) is an important statistical method for cluster analysis. The conventional single instruction multiple data (SIMD) solution effectively exploits the massive intrinsic parallelism of artificial neural networks. In this paper, we introduce a parallel-elementary-stream (PES) architecture for the nearest-neighbor-search (NNS) based SOM model as an alternative to SIMD. The PES architecture defines p parallel units, each of which can process one element per clock cycle. Meanwhile, every d-dimensional neuron vector (NV), as well as the input vector, is distributed across p dual-port memory blocks. In particular, each memory block stores ⌈d/p⌉ vector components, each of which is defined as an element. The distances between n NVs and one input vector for NNS can be sequentially calculated in n×⌈d/p⌉+α clock cycles, where α is the pipeline delay. Furthermore, the processing unit for NNS is reconfigured to update the winner neurons. The experimental results show that the NNS performance is 3 times higher than that of the SIMD solution at the same working frequency and process technology, while the core area is 4.3 times smaller than that of the SIMD solution.
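To make the timing model concrete, the following is a minimal Python sketch (not part of the paper's implementation) of the element partitioning and cycle count described above: each d-dimensional vector is split across p streams of ⌈d/p⌉ elements, the p units each consume one element per clock, and the total NNS latency follows n×⌈d/p⌉+α. Function names and the squared-Euclidean distance metric are illustrative assumptions.

```python
import math

def pes_cycles(n, d, p, alpha):
    """Clock cycles to compute n NV-to-input distances on the PES model:
    each NV needs ceil(d/p) clocks, plus a pipeline delay of alpha."""
    return n * math.ceil(d / p) + alpha

def pes_distances(neurons, x, p):
    """Model the p parallel units: per clock step, unit u consumes one
    element from its memory block and accumulates a partial distance."""
    d = len(x)
    steps = math.ceil(d / p)  # clocks per neuron vector
    dists = []
    for nv in neurons:
        acc = [0.0] * p  # one partial accumulator per parallel unit
        for step in range(steps):
            for u in range(p):  # p units work in the same clock
                idx = step * p + u
                if idx < d:  # last step may be partially filled
                    acc[u] += (nv[idx] - x[idx]) ** 2
        dists.append(sum(acc))  # reduce the p partial sums
    return dists
```

For example, with n = 4 neurons, d = 8, p = 2 units, and a pipeline delay of α = 3, the model gives 4×⌈8/2⌉+3 = 19 clock cycles, and the computed distances match a direct squared-Euclidean calculation.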