This paper presents an optimized low-complexity, high-throughput MIMO signal detector core for detecting spatially multiplexed data streams. The core architecture supports configurations of up to 4 layers, with configurable modulation constellations up to 256-QAM on each layer, while achieving near-optimal performance. The core can operate as a soft-input soft-output log-likelihood ratio (LLR) MIMO detector, which makes it suitable for iterative detection and decoding. High area efficiency is achieved via algorithmic and architectural optimizations performed at two levels. First, the distance computations and slicing operations of an optimal 2-layer maximum a posteriori MIMO detector are optimized to eliminate the use of multipliers and to reduce the overhead of slicing in the presence of soft-input LLRs. We show that distances can be computed using elementary addition operations, while optimal slicing is done via efficient comparisons with soft decision boundaries, resulting in a simple feed-forward pipelined architecture. Second, to support more layers, an efficient channel decomposition scheme is presented that reduces the detection of multiple layers to multiple 2-layer detection subproblems, which map onto the 2-layer core with a slight modification: the addition of a distance accumulation stage and a post-LLR processing stage. Various architectures are accordingly developed to achieve a desired detection throughput and run-time reconfigurability by time-multiplexing one or more component cores. The proposed core is also applied to the design of an optimal multiuser MIMO detector for LTE. The core occupies an area of 1.58 MGE and achieves a throughput of 733 Mbps for 256-QAM when synthesized in 90-nm CMOS.
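To illustrate the general idea behind 2-layer detection with slicing, the following minimal sketch enumerates the candidate symbols of one layer and obtains the best symbol of the other layer by slicing, so that only one layer is searched exhaustively. It is a floating-point, hard-decision illustration under simplifying assumptions (no soft-input LLRs, no multiplier-free distance computation, no pipelining) and is not the architecture proposed in the paper. All function names and the channel representation (columns `h1` and `h2` as lists of complex gains) are hypothetical.

```python
def qam_points(levels):
    """All points of a square QAM constellation with the given per-axis levels."""
    return [complex(i, q) for i in levels for q in levels]


def slice_qam(z, levels):
    """Slice z to the nearest constellation point, one axis at a time."""
    def nearest(x):
        return min(levels, key=lambda a: abs(x - a))
    return complex(nearest(z.real), nearest(z.imag))


def distance(y, h1, h2, s1, s2):
    """Squared Euclidean distance ||y - h1*s1 - h2*s2||^2."""
    return sum(abs(yk - a * s1 - b * s2) ** 2 for yk, a, b in zip(y, h1, h2))


def detect_two_layer(y, h1, h2, levels):
    """Enumerate layer 2; for each candidate, obtain layer 1 by slicing.

    For a fixed s2, the s1 minimizing the distance is the constellation
    point nearest to the projection of the residual onto h1.
    Returns (distance, s1, s2) for the best pair found.
    """
    h1_energy = sum(abs(a) ** 2 for a in h1)
    best = None
    for s2 in qam_points(levels):
        # Cancel the contribution of the layer-2 candidate.
        residual = [yk - b * s2 for yk, b in zip(y, h2)]
        # Matched-filter the residual onto h1 and normalize.
        proj = sum(a.conjugate() * r for a, r in zip(h1, residual)) / h1_energy
        s1 = slice_qam(proj, levels)
        d = distance(y, h1, h2, s1, s2)
        if best is None or d < best[0]:
            best = (d, s1, s2)
    return best


def detect_exhaustive(y, h1, h2, levels):
    """Reference detector: joint search over both layers."""
    points = qam_points(levels)
    candidates = [(distance(y, h1, h2, s1, s2), s1, s2)
                  for s1 in points for s2 in points]
    return min(candidates, key=lambda t: t[0])
```

For a constellation of M points per layer, `detect_two_layer` evaluates M candidate pairs, whereas `detect_exhaustive` evaluates M squared; both return the same minimum distance because slicing yields the optimal layer-1 symbol for each layer-2 candidate.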