Unpredictable topology changes, energy constraints, and link unreliability make information transmission a challenging problem in wireless sensor networks (WSNs). Drawing on ideas from machine learning, we propose a novel geographic routing algorithm for WSNs, named Q-probabilistic routing (Q-PR), that makes intelligent routing decisions from the delayed reward of previous actions and the local interaction among neighbor nodes, using reinforcement learning and a Bayesian decision model. Moreover, by considering the message importance embedded in the message itself, routing decisions can be adapted to traffic importance. Experimental results show that Q-PR yields a routing policy that, as a function of the message importance, achieves a trade-off among the expected number of retransmissions (ETX), the successful delivery rate, and the network lifetime.
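To make the idea of learning from delayed reward and neighbor feedback concrete, the following is a minimal sketch of a generic Q-learning-style routing update. It is not the Q-PR algorithm itself: it leaves out the Bayesian decision model and the message importance. The `Node` class, the `update` and `choose_next_hop` functions, the learning rate `alpha`, and the per-link delivery probabilities are all hypothetical names and values chosen for illustration. The only assumption about the link is that attempts succeed independently with probability p, so that the expected number of transmissions over it is 1/p.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical node state: one cost estimate per neighbor."""
    name: str
    is_sink: bool = False
    # q[neighbor_name] = estimated transmissions to reach the sink via that neighbor
    q: dict = field(default_factory=dict)

    def best_estimate(self) -> float:
        """Lowest estimated cost to the sink (0 at the sink itself)."""
        if self.is_sink:
            return 0.0
        return min(self.q.values(), default=float("inf"))


def update(node: Node, neighbor: Node, link_delivery_prob: float,
           alpha: float = 0.5) -> float:
    """Blend the old estimate with the feedback reported by the chosen neighbor."""
    # Cost of this hop (1/p under independent attempts) plus the
    # neighbor's own best estimate of the remaining cost.
    target = 1.0 / link_delivery_prob + neighbor.best_estimate()
    old = node.q[neighbor.name]
    node.q[neighbor.name] = (1 - alpha) * old + alpha * target
    return node.q[neighbor.name]


def choose_next_hop(node: Node) -> str:
    """Greedy choice: the neighbor with the lowest estimated cost."""
    return min(node.q, key=node.q.get)


# Assumed toy topology: node "a" can reach the sink directly or through "b".
sink = Node("sink", is_sink=True)
b = Node("b", q={"sink": 10.0})
a = Node("a", q={"sink": 10.0, "b": 10.0})

update(a, sink, link_delivery_prob=0.5)  # direct but lossy link
update(a, b, link_delivery_prob=0.8)     # better link, but b's estimate is still high
next_hop = choose_next_hop(a)
```

In this sketch each estimate only improves after a packet has actually been forwarded and the neighbor has reported back, which is the sense in which the reward is delayed. A full scheme would also need exploration and, as in the approach described above, a way to weigh message importance against energy cost.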