An online learning problem with side information is considered. The problem is formulated as a graph-structured stochastic multi-armed bandit (MAB). Each node in the graph represents an arm of the bandit, and an edge between two arms indicates that their mean rewards are close. It is shown that such side information induces a unit interval graph and that several graph properties can be leveraged to...
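The abstract does not state the exact closeness rule. The following is a minimal Python sketch that assumes an edge whenever two arm means differ by at most a hypothetical threshold `eps`. It builds such a graph and checks a property that unit interval graphs are known to have: with the vertices sorted by mean, every closed neighborhood forms a contiguous block. The arm means are made up for illustration. In the bandit setting, they are unknown to the player.

```python
from itertools import combinations


def closeness_graph(means, eps):
    """Adjacency sets with an edge (i, j) whenever |means[i] - means[j]| <= eps.

    The threshold rule is a hypothetical stand-in for "closeness in mean rewards".
    """
    adj = {i: set() for i in range(len(means))}
    for i, j in combinations(range(len(means)), 2):
        if abs(means[i] - means[j]) <= eps:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def has_consecutive_neighborhoods(means, adj):
    """With arms sorted by mean, check that each closed neighborhood is a contiguous run."""
    order = sorted(range(len(means)), key=lambda i: means[i])
    pos = {arm: k for k, arm in enumerate(order)}
    for v in adj:
        block = sorted(pos[u] for u in adj[v] | {v})
        if block != list(range(block[0], block[-1] + 1)):
            return False
    return True


means = [0.10, 0.35, 0.45, 0.62, 0.90]  # illustrative values only
adj = closeness_graph(means, eps=0.3)
print(adj)
print(has_consecutive_neighborhoods(means, adj))  # True
```

Sorting by mean is what makes the check work here: arms whose means lie within `eps` of a given arm occupy a single interval of the sorted order.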
The problem of online learning of consumer response to retail electricity pricing in a distribution network is considered. In a two-settlement market, the retailer that sets the retail price is exposed to risks from the stochastic response of its consumers and from real-time price fluctuations in the wholesale market. The optimal price maximizing the expected profit is a function of the consumers' response...
The problem of online learning and optimization of unknown Markov jump linear models is considered. A new online learning algorithm, referred to as Markovian simultaneous perturbations stochastic approximation (MSPSA), is proposed. It is shown that MSPSA achieves the minimax regret order of O(√T). Using the Van Trees inequality (stochastic Cramér-Rao bound), it is shown that O(√T) is the lowest regret...
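The abstract does not specify MSPSA itself. For reference, the following is a minimal sketch of plain simultaneous perturbation stochastic approximation (SPSA), the classical method the name builds on. It does not model Markov jumps. The gain schedules (exponents 0.602 and 0.101), the noisy quadratic objective, and the function names are illustrative assumptions.

```python
import random


def spsa_minimize(loss, theta, iters=2000, a=0.1, c=0.1, seed=0):
    """Plain SPSA: estimate the gradient from two loss evaluations per iteration."""
    rng = random.Random(seed)
    theta = list(theta)
    d = len(theta)
    for k in range(iters):
        ak = a / (k + 1) ** 0.602  # step-size gain (assumed schedule)
        ck = c / (k + 1) ** 0.101  # perturbation size (assumed schedule)
        # Simultaneous random +/-1 perturbation of every coordinate.
        delta = [rng.choice((-1.0, 1.0)) for _ in range(d)]
        plus = [t + ck * di for t, di in zip(theta, delta)]
        minus = [t - ck * di for t, di in zip(theta, delta)]
        diff = loss(plus, rng) - loss(minus, rng)
        # Gradient estimate g_i = diff / (2 * ck * delta_i); take a descent step.
        theta = [t - ak * diff / (2 * ck * di) for t, di in zip(theta, delta)]
    return theta


def noisy_quadratic(x, rng):
    """Hypothetical test objective with minimizer (1, -2) and small Gaussian noise."""
    target = (1.0, -2.0)
    return sum((xi - ti) ** 2 for xi, ti in zip(x, target)) + rng.gauss(0.0, 0.01)


theta_hat = spsa_minimize(noisy_quadratic, [0.0, 0.0])
print(theta_hat)
```

Each iteration uses two loss evaluations regardless of the dimension of `theta`. This is the main appeal of simultaneous perturbation over coordinate-wise finite differences.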
The problem of dynamic electricity pricing by a retailer for customers in a demand response program is considered. It is assumed that the retailer obtains electricity in a two-settlement wholesale market consisting of a day-ahead market and a real-time market. Under a day-ahead dynamic pricing mechanism, the retailer aims to learn the aggregate demand function of its customers while maximizing...
We consider the restless multi-armed bandit problem with unknown dynamics, in which a player chooses one out of N arms to play at each time. The reward state of each arm transitions according to an unknown Markovian rule when it is played and evolves according to an arbitrary unknown random process when it is passive. The performance of an arm selection policy is measured by regret, defined as the reward...
An online learning problem under stochastic time-varying models is considered. The problem is treated as a generalization of the classic multi-armed bandit problem in which the arm distributions are time-varying. The objective is to study the impact of time variation in the arm distributions on the performance of the player's strategy. Sufficient conditions on the rate of model variations under which learning...
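The following sketch is for illustration only. It runs sliding-window UCB (Garivier and Moulines), one known strategy for bandits whose arm distributions change over time. It is not necessarily the strategy studied in the paper. The Bernoulli arms, the change point at t = 2000, and the parameters `window` and `xi` are hypothetical.

```python
import math
import random
from collections import deque


def sliding_window_ucb(mean_fn, n_arms, horizon, window=200, xi=0.6, seed=0):
    """Sliding-window UCB on Bernoulli arms.

    mean_fn(t, arm) gives the (hypothetical) success probability of `arm` at time t.
    Only the last `window` plays are used for the empirical means and counts.
    """
    rng = random.Random(seed)
    history = deque()  # (arm, reward) pairs from the most recent plays
    choices = []
    for t in range(horizon):
        counts = [0] * n_arms
        sums = [0.0] * n_arms
        for arm, r in history:
            counts[arm] += 1
            sums[arm] += r
        untried = [i for i in range(n_arms) if counts[i] == 0]
        if untried:
            arm = untried[0]  # play any arm absent from the window
        else:
            log_term = math.log(min(t, window))
            arm = max(
                range(n_arms),
                key=lambda i: sums[i] / counts[i] + math.sqrt(xi * log_term / counts[i]),
            )
        reward = 1.0 if rng.random() < mean_fn(t, arm) else 0.0
        history.append((arm, reward))
        if len(history) > window:
            history.popleft()  # forget plays older than the window
        choices.append(arm)
    return choices


def mean_fn(t, arm):
    """Arm 0 is best until t = 2000, then arm 1 becomes best (made-up scenario)."""
    if arm == 0:
        return 0.8 if t < 2000 else 0.2
    return 0.5


choices = sliding_window_ucb(mean_fn, n_arms=2, horizon=4000)
print(choices[:2000].count(0), choices[3000:].count(1))
```

Because the index uses only the last `window` plays, rewards from before the change point stop influencing the choice after at most `window` steps. A shorter window adapts faster but bases each estimate on fewer samples.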
Multi-channel opportunistic spectrum access in unslotted primary systems is considered. The primary occupancy of each channel is modeled as a general on-off renewal process. The distributions of the busy and idle times and the utilization factors of all channels are unknown to the secondary user. The objective of the secondary user is to identify and exploit the best channel (i.e., the channel with...