Summary form only given. Recently, streaming architectures such as Imagine, Merrimac, and Cell have been shown to achieve significantly higher performance and efficiency than traditional architectures by introducing explicitly managed on-chip storage into the memory hierarchy. This software-managed memory serves as a staging area for bulk data, keeping all functional-unit references short and predictable while data is transferred asynchronously from external memory. Decoupling computation from memory access allows software to statically optimize the execution pipeline, shifting the onus of latency tolerance from hardware to software.
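
The staging idea described above can be illustrated with a minimal double-buffering sketch. It is not taken from Imagine, Merrimac, or Cell: the names `fetch_block`, `kernel`, `stream_sum_of_squares`, and the block size `BLOCK` are hypothetical, and a worker thread merely stands in for an asynchronous bulk transfer engine. The point is only that the compute stage reads a small local block while the next block is already being fetched.

```python
from concurrent.futures import ThreadPoolExecutor

BLOCK = 4  # words per staged block (assumed size, for illustration only)


def fetch_block(external, start, size=BLOCK):
    """Stand-in for an asynchronous bulk transfer from external memory."""
    return list(external[start:start + size])


def kernel(block):
    """Compute stage: reads only the staged local copy, never external memory."""
    return sum(x * x for x in block)


def stream_sum_of_squares(external, size=BLOCK):
    """Overlap the fetch of block i+1 with the computation on block i."""
    total = 0
    with ThreadPoolExecutor(max_workers=1) as transfer_engine:
        # Software issues the first transfer before any computation starts.
        pending = transfer_engine.submit(fetch_block, external, 0, size)
        for start in range(0, len(external), size):
            local = pending.result()  # block until the current block is staged
            nxt = start + size
            if nxt < len(external):
                # Start the next transfer before computing on the current block.
                pending = transfer_engine.submit(fetch_block, external, nxt, size)
            total += kernel(local)
    return total


if __name__ == "__main__":
    print(stream_sum_of_squares(list(range(10))))
```

In this sketch the transfer schedule is fixed by the program itself rather than discovered by a hardware cache at run time, which is the sense in which latency tolerance becomes a software responsibility.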