There is growing interest in designing high-speed network devices that perform packet processing at the stream layer. However, TCP processing for 10G backbone traffic must not only meet performance requirements but also cope with abnormal conditions. Some characteristics of real traffic, especially the lack of a finish tag for many streams and the complexity of packet reordering, can cause memory exhaustion in a hardware-based TCP subsystem, which is less flexible in handling exceptions. In this paper, we present a hardware design for backbone traffic that is capable of processing 10G traffic with TCP reassembly while tracking the states of millions of concurrent TCP streams. The solution has several features: (1) an effective stream replacement algorithm for the massive stream table that is easy to implement in hardware; (2) fast one-round access to the global stream table, which enables 10 MPPS processing; (3) an active release policy for managing out-of-order data buffers; and (4) a linkless data structure that bounds the worst-case processing time. Simulation results show that the system can process over 99% of the 10G backbone traffic using reasonable storage resources. An FPGA-based prototype is also implemented for evaluation.
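
As a rough illustration of what the 10 MPPS figure implies, the following sketch computes the per-packet time budget at that rate and, for comparison, the theoretical packet rate of a 10 Gbit/s Ethernet link carrying only minimum-size frames. The 64-byte frame and the 20 bytes of preamble plus inter-frame gap are standard Ethernet values assumed here, not figures taken from this paper, and the function names are purely illustrative.

```python
def per_packet_budget_ns(packets_per_second: float) -> float:
    """Time available to process one packet, in nanoseconds."""
    return 1e9 / packets_per_second


def max_packet_rate(link_bps: float,
                    frame_bytes: int = 64,      # assumed: minimum Ethernet frame
                    overhead_bytes: int = 20):  # assumed: preamble + inter-frame gap
    """Upper bound on packets per second for a link filled with equal-size frames."""
    bits_on_wire = (frame_bytes + overhead_bytes) * 8
    return link_bps / bits_on_wire


budget = per_packet_budget_ns(10e6)   # budget at 10 MPPS
worst_case = max_packet_rate(10e9)    # 10G link, minimum-size frames only

print(f"budget at 10 MPPS: {budget:.0f} ns per packet")
print(f"10G minimum-frame rate: {worst_case / 1e6:.2f} Mpps")
```

At 10 MPPS each packet has about 100 ns of processing time, which suggests why a single access to the stream table per packet matters. Real backbone traffic has a larger average packet size than the minimum-frame bound assumes, so its packet rate is lower than that bound.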