Optimum soft-output (SO) multiple-input multiple-output (MIMO) tree-search detection algorithms pose significant implementation challenges due to their nondeterministic processing throughput and high computational complexity. In this two-part work, we present extensive algorithmic and architectural optimizations of the sphere-decoding algorithm, targeted at achieving practical tradeoffs between desired link performance and affordable computational complexity. The algorithmic optimizations in this part span the tree-search traversal scheme, the leaf-processing step, internal node pruning and skipping, state-machine-based child enumeration, adaptive radius scaling for log-likelihood ratio (LLR) clipping, QR decomposition based on minimum cumulative residuals, and multitree configurations. The optimizations demonstrate that a 64-QAM SO MIMO detector for LTE can attain near-maximum-likelihood (ML) performance, with an SNR loss of only 0.85 dB at 1% block error rate (BLER), while visiting at most 200 tree nodes.