In this paper, we target field-programmable gate array (FPGA) performance optimization using a novel binary decision diagram (BDD)-based synthesis paradigm. Most previous works have focused on BDD size reduction during logic synthesis. In this paper, we concentrate on delay reduction and conclude that there is a large optimization margin through BDD synthesis for FPGA performance optimization. Our contributions are threefold: 1) we propose a gain-based clustering and partial collapsing algorithm to prepare the initial design for BDD synthesis for better delay; 2) we use a technique called linear expansion for BDD decomposition, which, in turn, enables a dynamic programming algorithm to efficiently search through the optimization space for the BDD of each node in the clustered circuit; and 3) we consider special decomposition scenarios coupled with linear expansion for further improvement on the quality of results. Experimental results show that we can achieve a 30% performance gain with a 22% area overhead on the average compared to a previous state-of-the-art BDD-based FPGA synthesis tool, namely, BDS-pga. Compared to DAOmap, we can achieve a 33% performance gain with only an 8% area overhead. Compared to the ABC mapper, we can achieve a 20% performance gain with only an 8% area overhead.