In reinforcement learning, two spaces can be searched: value-function space and policy space. Consequently, there are two fitness functions, each with its own trade-offs, yet the problem is still usually treated as a single-objective one. Here, a multi-objective reinforcement learning algorithm is proposed in which a structured novelty map population evolves feedforward neural models. It outperforms a state-of-the-art gradient-based algorithm with continuous inputs and outputs on two problems. Unlike the gradient-based algorithm, the proposed one solves both problems with the same parameters and with smaller variance in its results. Moreover, its results are comparable even to those of discrete-action algorithms from the literature and of neuroevolution methods such as NEAT. The proposed method also introduces the novelty map population concept, i.e., a novelty map-based population that is less sensitive to the input distribution and therefore better suited to constructing the state space. In fact, the novelty map framework is shown to be less dynamic and more resource-efficient than variants of the self-organizing map.