In this work, we address profit maximization for a wireless network carrier and payment minimization for users with unknown demand profiles. In particular, we address the exploration-exploitation tradeoff facing a carrier that is unaware of its users' demand profiles and seeks to maximize its expected profit by applying smart pricing to incentivize users to cache proactively. We formulate the problem as a reinforcement learning problem in which the carrier minimizes the regret, defined as the difference between the profit obtained by a genie that knows the demand profiles of all users and the profit obtained by the given policy over a finite horizon D. Users, on the other hand, exploit the predictability of their demand by proactively caching peak-time demand to minimize their expected payments. We show that the carrier needs to know only the distribution of the users' demand profiles in order to converge to the profit it would obtain with access to the demand profiles themselves, hence preserving users' privacy. An iterative gradient-based policy is proposed to minimize the regret. Numerical results are provided in which the proposed policy is compared with the UCB1 algorithm.
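
To make the regret notion and the UCB1 baseline concrete, the following is a minimal sketch, not the policy proposed in this work. It treats a small set of hypothetical price levels as bandit arms, assumes each arm yields a Bernoulli profit in {0, 1} with a fixed mean, and accumulates the expected regret against a genie that always plays the best arm. The function name `ucb1_regret`, the argument `mean_profits`, and the profit model are illustrative assumptions.

```python
import math
import random


def ucb1_regret(mean_profits, horizon, seed=0):
    """Run standard UCB1 over hypothetical price levels (arms).

    Returns the cumulative expected regret against a genie that always
    picks the arm with the highest mean profit, plus the pull counts.
    """
    rng = random.Random(seed)
    n_arms = len(mean_profits)
    counts = [0] * n_arms      # number of times each arm was played
    totals = [0.0] * n_arms    # summed observed profit per arm
    best = max(mean_profits)   # per-round expected profit of the genie
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1        # initialization: play every arm once
        else:
            # UCB1 index: empirical mean plus exploration bonus
            arm = max(
                range(n_arms),
                key=lambda a: totals[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        # Assumed profit model: Bernoulli draw with the arm's mean.
        reward = 1.0 if rng.random() < mean_profits[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
        regret += best - mean_profits[arm]
    return regret, counts


if __name__ == "__main__":
    regret, counts = ucb1_regret([0.2, 0.5, 0.8], horizon=5000)
    print("cumulative expected regret:", regret)
    print("pulls per price level:", counts)
```

Under these assumptions, the horizon argument plays the role of D, and the returned regret is the quantity that a learning policy would aim to keep small relative to the horizon.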