This paper proposes a Nash Q-learning (NashQ) algorithm for a packet forwarding game in overlay non-cooperative multi-agent wireless sensor networks (WSNs). The objective is to reach the mutual best response between two agents. The results show that NashQ obtains the mutual best response by learning online, whereas an existing non-cooperative game-theoretic approach relies on an offline exhaustive search. NashQ is therefore more adaptive to topological changes and less computationally demanding in the long run. NashQ also appears to be more robust to the non-uniqueness of the Nash equilibrium: compared with the existing approach, its results show a consistent trend toward cooperative behavior.
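
To make the online learning step concrete, the following is a minimal sketch of a two-agent Nash Q-learning update for a single-state forward/drop game. It is an illustration under stated assumptions, not the paper's implementation: the action encoding, the payoff matrices `R1` and `R2`, the learning rate, the discount factor, the restriction to pure-strategy equilibria, the rule of taking the first equilibrium found, and the maximin fallback are all hypothetical choices made here.

```python
import itertools
import random

ACTIONS = (0, 1)  # hypothetical encoding: 0 = forward, 1 = drop

# Hypothetical stage-game payoffs, indexed [a1][a2] (not taken from the paper).
R1 = [[0.6, -0.4], [1.0, 0.0]]   # payoff to agent 1
R2 = [[0.6, 1.0], [-0.4, 0.0]]   # payoff to agent 2


def pure_nash(q1, q2):
    """Return all pure-strategy Nash equilibria of the bimatrix game (q1, q2)."""
    eqs = []
    for a1, a2 in itertools.product(ACTIONS, ACTIONS):
        # a1 must be a best response to a2, and a2 a best response to a1.
        best1 = all(q1[a1][a2] >= q1[b][a2] for b in ACTIONS)
        best2 = all(q2[a1][a2] >= q2[a1][b] for b in ACTIONS)
        if best1 and best2:
            eqs.append((a1, a2))
    return eqs


def nash_q_update(q1, q2, a1, a2, r1, r2, alpha=0.1, gamma=0.9):
    """Apply one Nash-Q update to both agents' Q-tables, in place."""
    eqs = pure_nash(q1, q2)
    if eqs:
        # Assumption: pick the first equilibrium when several exist.
        e1, e2 = eqs[0]
        v1, v2 = q1[e1][e2], q2[e1][e2]
    else:
        # Assumption: fall back to maximin values if no pure equilibrium exists.
        v1 = max(min(q1[a][b] for b in ACTIONS) for a in ACTIONS)
        v2 = max(min(q2[a][b] for a in ACTIONS) for b in ACTIONS)
    q1[a1][a2] = (1 - alpha) * q1[a1][a2] + alpha * (r1 + gamma * v1)
    q2[a1][a2] = (1 - alpha) * q2[a1][a2] + alpha * (r2 + gamma * v2)


if __name__ == "__main__":
    rng = random.Random(0)
    q1 = [[0.0, 0.0], [0.0, 0.0]]
    q2 = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(2000):
        # Uniform random exploration over joint actions.
        a1, a2 = rng.choice(ACTIONS), rng.choice(ACTIONS)
        nash_q_update(q1, q2, a1, a2, R1[a1][a2], R2[a1][a2])
    print("Pure equilibria of the learned Q-game:", pure_nash(q1, q2))
```

Each agent here observes both actions and both rewards, which is what lets it maintain an estimate of the other agent's Q-table and compute the equilibrium value online, without enumerating the game offline. How ties between multiple equilibria are broken is exactly where the non-uniqueness issue shows up, so the first-found rule above should be read as a placeholder.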