In an electronic-warfare scenario, an optimal jamming strategy is vitally important for a jammer with limited power, and how to derive such strategies quickly and accurately has become a pressing question. In this paper, we develop a cognitive jammer that learns optimal jamming strategies with a proposed algorithm, Greedy Bandits (GB). By continually interacting with transmitter-receiver pairs, which is a key advantage of reinforcement learning, the jammer learns the optimal physical-layer parameters, such as the signaling scheme, power level, and on-off/pulsing behavior. After constructing the jamming model, we first prove that the proposed GB algorithm satisfies the jamming requirements, and we then present two new reward criteria: the change in power and the endurance time. Numerical results show that GB converges more quickly than other reinforcement learning algorithms such as Jamming Bandits (JB). More importantly, GB with the two proposed reward criteria achieves acceptable learning performance and is applicable to a wider range of settings than learning based on the symbol error rate (SER), although it requires more interactions.
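The abstract does not give GB's update rule, so the following is only a rough illustration of the bandit setting it describes, not the authors' algorithm. It is a generic epsilon-greedy multi-armed bandit in which each arm is one combination of signaling scheme, power level, and on-off duty cycle. The parameter values, the `toy_reward` function (a jamming effect minus a power cost, plus Gaussian noise), and all names here are hypothetical assumptions chosen for illustration. In the paper's setting, the reward would instead come from the jammer's interaction with the transmitter-receiver pair, for example the SER or the proposed power-change and endurance-time criteria.

```python
import itertools
import random

# Hypothetical discrete action space: each arm is one combination of
# physical-layer jamming parameters (values are illustrative assumptions).
SIGNALING = ["BPSK", "QPSK", "AWGN"]
POWER_LEVELS = [0.25, 0.5, 1.0]  # normalized jamming power (assumed)
DUTY_CYCLES = [0.2, 0.5, 1.0]    # on-off/pulsing fraction (assumed)
ARMS = list(itertools.product(SIGNALING, POWER_LEVELS, DUTY_CYCLES))


def toy_reward(arm, rng):
    """Hypothetical noisy reward: jamming effect minus a power cost."""
    scheme, power, duty = ARMS[arm]
    weight = {"BPSK": 0.6, "QPSK": 0.9, "AWGN": 0.5}[scheme]
    effect = weight * min(1.0, 2 * power * duty)
    cost = 0.3 * power
    return effect - cost + rng.gauss(0.0, 0.05)


def epsilon_greedy(reward_fn, n_arms, steps=5000, epsilon=0.1, seed=0):
    """Generic epsilon-greedy bandit; returns per-arm mean rewards and pull counts."""
    rng = random.Random(seed)
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: pick a random arm
        else:
            arm = max(range(n_arms), key=lambda a: means[a])  # exploit best estimate
        r = reward_fn(arm, rng)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # incremental mean update
    return means, counts


means, counts = epsilon_greedy(toy_reward, len(ARMS))
best = max(range(len(ARMS)), key=lambda a: means[a])
print("learned jamming parameters:", ARMS[best])
```

Epsilon-greedy is used here only because it is the simplest explore/exploit rule. Comparing convergence against a scheme such as JB would require the actual algorithms and reward signals from the paper.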