In this paper, we target FPGA performance optimization using a novel BDD (binary decision graph)-based synthesis approach. Most of previous works have focused on BDD size reduction during logic synthesis. In this work, we concentrate on delay reduction and conclude that there is a large optimization margin through BDD synthesis for FPGA performance optimization. Our contributions are threefold: (1) we propose a gain-based clustering and partial collapsing algorithm to prepare the initial design for BDD synthesis for better delay; (2) we use a technique named linear expansion for BDD decomposition, which in turn enables a dynamic programming algorithm to efficiently search through the optimization space for the BDD of each node in the clustered circuit; (3) we consider special decomposition scenarios coupled with linear expansion for further improvement on quality of results. Experimental results show that we can achieve a 95% gain in terms of network depths, and a 20% gain in terms of routed delay, with a 22% area overhead on average compared to a previous state-of-art BDD-based FPGA synthesis tool, BDS-pga.