“A Reinforcement Learning Approach to Autonomous Decision-Making in Smart Electricity Markets” (Machine Learning, 2013)
Background on Reinforcement Learning
Most research in machine learning considers the problem of supervised learning: learning offline from (possibly large) historical data, where the decisions informed by the learned knowledge take place in a subsequent phase. Yet in many environments one must act and learn from the consequences of one’s actions simultaneously, with the goal of maximizing some long-term reward over time. At the outset it may not be known what outcomes different actions lead to, or which situations are favorable. For example, to learn which online auction parameters yield the best results for an auctioneer facing an unknown population of bidders, one can iteratively adapt to the market over a sequence of auctions, altering the auction parameters sequentially to explore the outcomes of different alternatives. To maximize the reward accumulated over time, the learning agent must balance exploration with exploitation of what is already known, so as to produce higher reward over the complete sequence. This challenge corresponds to the general problem of Reinforcement Learning. Together with colleagues from the Computer Science department at UT Austin and from Erasmus University Rotterdam, we studied several problems motivated by challenges in smart electricity markets and in music recommendations.
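The exploration-exploitation balance described above can be illustrated with a minimal epsilon-greedy sketch, a deliberate simplification of the full reinforcement learning machinery. The auction setting, function names, and parameter values below are illustrative assumptions, not taken from the paper:

```python
import random

def run_bandit(reward_fn, params, steps=1000, epsilon=0.1, seed=0):
    """Epsilon-greedy selection over a discrete set of auction parameters.

    reward_fn(p) stands in for running one auction with parameter p and
    observing the auctioneer's revenue."""
    rng = random.Random(seed)
    counts = {p: 0 for p in params}
    means = {p: 0.0 for p in params}
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:           # explore: try a random parameter
            p = rng.choice(params)
        else:                                # exploit: use the best estimate so far
            p = max(params, key=lambda q: means[q])
        r = reward_fn(p)
        counts[p] += 1
        means[p] += (r - means[p]) / counts[p]   # incremental mean update
        total += r
    return means, total
```

With a small epsilon, most trials exploit the empirically best parameter, while occasional random trials keep estimates of the alternatives up to date, which is precisely the trade-off that drives cumulative reward over the complete sequence.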
An Electricity Broker for the Smart Grid
The “Smart Grid” aims to integrate the actions of multiple stakeholders to achieve sustainable and secure electricity supply. The inclusion of renewable electricity sources, which are intermittent and variable, makes maintaining the balance between electricity demand and supply particularly challenging. Our paper “A Reinforcement Learning Approach to Autonomous Decision-Making in Smart Electricity Markets” (Machine Learning, 2013) considers electricity brokers -- intermediaries between retail customers and large-scale producers of electricity -- that facilitate the real-time balance between supply and demand. Electricity brokers in Smart Grids aim to serve as information aggregators that fulfill risk-pooling and management functions, helping to attain socially desirable market outcomes. Consequently, they must trade in multiple, interrelated markets simultaneously (often referred to as “Smart Markets”). Because there is considerable uncertainty about the structure of future Smart Electricity Markets, and because of the dynamic nature of smart markets, our goal was to design an autonomous electricity broker agent that can accommodate a wide variety of market structures and conditions. We developed and evaluated a class of autonomous electricity brokers for retail electricity trading that have the flexibility needed to operate effectively in a wide range of market structures, and that are capable of deriving long-term, profit-maximizing policies. More generally, research on autonomous electricity brokers for the Smart Grid constitutes a nascent, emerging field.
Therefore, important objectives of this work are to identify and study key design elements that allow broker agents to operate effectively in the Smart Grid, and to inform future work of the challenges and promising approaches for accommodating renewable sources in our electricity grids. The brokers we developed use Reinforcement Learning with function approximation, allowing them to “learn” mappings from different market conditions to advantageous trading decisions; they can also accommodate a very rich set of economic signals from their environments, and they learn efficiently over the large state spaces resulting from these signals. Previous approaches were limited in the state space size they could accommodate, and were therefore constrained in the environments into which they could be deployed. Another contribution of this work is our study of the role that feature selection and regularization techniques play in optimizing electricity brokers for the data-rich Smart Electricity Market environments we consider. We explore the benefits of two different feature selection procedures, based on Genetic Algorithms and on greedy feature augmentation, and we compare these procedures to L1-regularized online learning over the full state space. We find that the inexpensive regularization approach yields satisfactory results under some market conditions, while the more extensive feature selection techniques can be extremely effective if proper precautions are taken against overfitting. We also provide guidance on how such overfitting can be alleviated.
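One of these procedures, greedy feature augmentation, can be sketched in a few lines. This is a simplified stand-in, not the paper's implementation: here `evaluate` abstracts away training the broker on a candidate feature subset and returning a validation score, and all names are illustrative:

```python
def greedy_feature_augmentation(features, evaluate, max_features=None):
    """Greedy forward selection: repeatedly add the single feature that
    most improves the validation score, stopping when no feature helps.

    evaluate(subset) is assumed to train on `subset` and return a
    validation score (higher is better)."""
    selected = []
    remaining = list(features)
    best_score = evaluate(selected)
    while remaining and (max_features is None or len(selected) < max_features):
        # score every candidate one-feature extension of the current subset
        score, f = max((evaluate(selected + [f]), f) for f in remaining)
        if score <= best_score:   # no single feature improves validation score
            break
        selected.append(f)
        remaining.remove(f)
        best_score = score
    return selected, best_score
```

Evaluating candidates on a held-out validation set, rather than on training reward, is one of the precautions against overfitting discussed in the paper.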
“A Scalable Preference Model for Autonomous Decision-Making” (Machine Learning, 2018)
In this paper we facilitate agents’ automated decisions in dynamic smart markets by extending their data-driven modeling capabilities. A key challenge for autonomous decision-making in unstructured settings is identifying which choices a given user deems best. In smart grids, autonomous agents are anticipated to play a key role in facilitating efficient electricity distribution and use. Particular challenges in this context are electric vehicles that are charged in varying locations, and the incorporation of intermittent and variable renewable electricity sources, such as solar and wind. Data-driven modeling of electricity consumption preferences is essential for predicting consumption patterns for planning, and for effectively incentivizing consumers to choose sustainable behaviors (Peters et al., 2013). A preference model can learn and predict that a user is unlikely to use her electric vehicle in the afternoon, and offer the owner personalized incentives to make the battery’s energy available to nearby consumers when renewable energy is scarce. The electricity cost and emission reductions informed by data-driven preference learning and autonomous decision-making can be significant (Kahlen et al., 2014). Recent non-parametric Bayesian models are particularly promising for modeling such preferences because they adapt in a data-driven fashion to the complexity of real-world observations, and because they accommodate inconsistencies in human choices rather than impose stringent rationality assumptions. By allowing inconsistencies in observed choices to translate into uncertainty, these models can distinguish instances where estimates are certain enough for autonomous action from instances where the model would benefit from actively acquiring additional evidence or transferring control to a human decision-maker.
However, for such models to be widely adopted in practice, progress is necessary toward methods that scale well and that are conceptually simple to adapt to different settings. Important domains such as energy markets and healthcare require methods that are computationally efficient and that scale gracefully to a potentially very large number of users and observations. Contemporary electricity distribution systems, for example, produce large amounts of data from up to ten million consumer meters, each transmitting data every few minutes. Such large amounts of data must be processed quickly and at high granularity (i.e., unaggregated), as automated responses often rely on fine-grained, local information. It is therefore important for preference models to provide consistently fast training times, and to incorporate and act on new data in a timely manner. Yet existing methods that produce uncertainty estimates and state-of-the-art predictive accuracy do not scale well to a large number of users. Their prohibitive computational costs cannot be addressed with additional processing power or offline processing, making such methods impractical for modeling a large number of users. By contrast, scalable methods often yield significantly worse predictive accuracy or do not produce uncertainty estimates for subsequent decision-making.
In this paper we develop and evaluate a novel non-parametric Bayesian approach that extends the existing preference modeling toolset, with significant practical implications. Our approach, the Gaussian process Scalable Preference model via Kronecker factorization (GaSPK), leverages common features of consumer choice settings, particularly the small set of relevant product attributes, to yield state-of-the-art scalability. GaSPK leverages these choice setting characteristics to introduce a novel use of Kronecker covariance matrices for preference modeling -- to our knowledge, no prior work had employed the favorable factorization and decomposition properties of Kronecker matrices for covariance matrices in preference learning. We perform extensive empirical evaluations demonstrating that this feature yields meaningful practical benefits. We empirically evaluate GaSPK’s performance relative to that of key benchmarks on three real-world consumer choice datasets. For this study we collected an electricity tariff choice dataset on a commercial crowdsourcing platform for a U.S. retail electricity market. To confirm our findings, we evaluated the methods on two benchmark choice datasets on political elections and car purchases. Our results establish that GaSPK is often the method of choice for modeling the preferences of a large number of users. GaSPK offers state-of-the-art scalability and conceptual simplicity, while often yielding favorable predictive accuracy compared to existing approaches. Given its performance, GaSPK offers a new benchmark in the preference modeling toolset that is particularly suitable for modeling a large number of users’ preferences when choice alternatives can be described by a small number of relevant attributes. Finally, GaSPK’s principled handling of uncertainty is instrumental for autonomous decision-making, and its conceptual simplicity facilitates adaptation of the approach by practitioners to new domains.
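The computational appeal of Kronecker-structured covariance matrices can be sketched independently of the full GaSPK model. The sketch below shows only the general matrix identity that such factorizations exploit, not the paper's actual algorithm: a matrix-vector product with A ⊗ B never requires forming the full Kronecker product.

```python
import numpy as np

def kron_matvec(A, B, v):
    """Compute (A ⊗ B) @ v without materializing the Kronecker product.

    Uses the identity (A ⊗ B) vec(V) = vec(A V B^T) for row-major
    vectorization: if A is (m, m) and B is (n, n), the full product is
    (mn, mn), but the reshaped computation costs O(mn(m + n)) rather
    than O(m^2 n^2)."""
    m, n = A.shape[0], B.shape[0]
    V = v.reshape(m, n)           # un-vectorize v into an m-by-n matrix
    return (A @ V @ B.T).ravel()  # re-vectorize the result
```

In a Gaussian process setting, the analogous trick lets eigendecompositions and linear solves be carried out on the small factors rather than the full covariance matrix, which is the kind of saving that makes Kronecker structure attractive for scalable preference learning.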