A short article appeared on McCombs Big Ideas.
In the paper “The Right Music at the Right Time: Adaptive Personalized Playlists Based on Sequence Modeling” (published in MISQ), Peter Stone, Elad Liebman, and I consider online learning and adaptation to produce a playlist of songs that suits an individual listener’s preferences in a given context. Recent years have seen a growing focus on automated personalized services, with music recommendation a particularly prominent domain. While most prior work on music recommender systems has focused on preferences for songs and artists, a fundamental aspect of human music perception is that music is experienced in temporal context and in sequence. Listeners’ preferences are thus affected by the sequence in which songs are played and the corresponding song transitions. Further, a listener’s sequential preferences may vary across circumstances, such as in response to different emotional or functional needs, so that different song sequences may be more satisfying at different times.
We develop a framework for a personalized DJ, DJ Monte Carlo (DJ-MC), that learns and adapts both the songs played and the sequence in which they are played to the listener’s preferences in real time, by interacting with the listener sequentially during a listening session. Specifically, our DJ-MC agent plays songs one at a time. After each song, the listener provides feedback, and DJ-MC continuously adapts to the listener’s preferences by recommending the next song, with the overall goal of producing the playlist most pleasing to the listener.
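The song-by-song interaction described above can be sketched as a simple online-learning loop. The class below is a minimal illustration, not the paper’s implementation: the linear reward estimate, the greedy next-song choice, and all names (`DJMCSession`, `score`, `update`) are assumptions made for the sake of the example.

```python
class DJMCSession:
    """Illustrative sketch of an online song-by-song adaptation loop.

    The linear model and greedy selection are assumptions for this
    example, not the actual DJ-MC algorithm.
    """

    def __init__(self, songs, n_features):
        self.songs = songs                  # song name -> feature vector
        self.weights = [0.0] * n_features   # learned preference weights
        self.history = []                   # songs played so far

    def score(self, song):
        # Estimated listener reward: dot product of features and weights.
        return sum(w * f for w, f in zip(self.weights, self.songs[song]))

    def next_song(self):
        # Greedily pick the unplayed song with the highest estimated reward.
        candidates = [s for s in self.songs if s not in self.history]
        return max(candidates, key=self.score)

    def update(self, song, feedback, lr=0.1):
        # Online update after each song: shift weights toward the
        # features of songs the listener liked.
        self.history.append(song)
        error = feedback - self.score(song)
        for i, f in enumerate(self.songs[song]):
            self.weights[i] += lr * error * f
```

A session would alternate `next_song()`, playing the song, and `update(song, feedback)`, so the model improves within the listening session itself rather than from historical data.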
This goal differs from most prior work on personalized playlist generation, which has either considered batch (e.g., supervised) learning from historical data, aimed at learning listeners’ preferences for individual songs or artists irrespective of the sequence in which the songs are played, or considered online adaptation over long periods of time, and thus did not aim to suit a listener’s preferences at a given time or context. In this paper, we aimed to bridge this gap.
Specifically, we explored the design challenges of learning sequential preferences in real time and adapting to these preferences by interacting with the listener during a listening session, and we studied whether such adaptation can offer listeners a better listening experience. To allow learning from limited interactions with the listener (in each of which DJ-MC plays a song and acquires the listener’s feedback), generalization of song and sequence preferences from past experiences must be highly efficient: we identified key design properties for achieving this goal and empirically evaluated their contributions to our framework’s performance.
To achieve this goal, our framework includes an internal listener reward model: a mapping from the playlist played thus far and the next song to be played onto the reward (utility) that the listener is likely to derive. Throughout the listening session, the listener reward model is updated after each song based on the listener’s feedback, and the playlist is adapted after each song to better suit the listener’s preferences at the current time. The listener reward model is designed specifically to promote aggressive generalization of the listener’s preferences from limited experience with the listener. In addition, our approach relies on a particular representation of songs and song transitions that similarly aims to facilitate quick generalization of preferences from limited experience with the listener.
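One way to picture such a reward model is as a sum of a song term and a transition term, reflecting that both the next song and its relation to the previous song matter. The decomposition, feature choices, and all function names below are illustrative assumptions for this sketch, not the paper’s actual model.

```python
# Hedged sketch: a listener reward model that maps (playlist so far,
# next song) onto an estimated reward. The split into a song term and
# a transition term is an assumption for illustration.

def song_features(song):
    # Placeholder song descriptor, e.g., coarse audio features.
    return song["features"]

def transition_features(prev, cur):
    # Illustrative transition descriptor: feature-wise differences
    # between consecutive songs.
    return [c - p for p, c in zip(song_features(prev), song_features(cur))]

def listener_reward(history, next_song, w_song, w_trans):
    """Estimated reward for playing next_song after the playlist so far."""
    r = sum(w * f for w, f in zip(w_song, song_features(next_song)))
    if history:  # transition term only applies once a song has played
        r += sum(w * f for w, f in
                 zip(w_trans, transition_features(history[-1], next_song)))
    return r
```

Because the weight vectors are shared across all songs and transitions with similar features, feedback on one song can generalize to many unheard songs, which is the kind of aggressive generalization the paragraph above describes.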
We evaluate the framework using both real playlist datasets and an experiment with human listeners. The results demonstrate that our framework’s performance is robust in the presence of arbitrarily complex and heterogeneous individual preferences. The experiments with human listeners establish that our framework is effective at adapting online to a listener’s sequential preferences and that it yields significantly more enjoyable song sequences for listeners.
Our research also establishes that future advances in online adaptation to listeners’ temporal preferences constitute a valuable avenue for research, and it suggests that similar benefits may be possible from exploring online learning of temporal preferences for other personalized services.
In another paper, “Designing Better Playlists with Monte Carlo Tree Search” (IAAI-17), we explore the planning element of DJ-MC. Recall that the DJ-MC playlist recommender system includes (i) a preference-learning component: the listener reward model and the mechanism for fitting this model based on the feedback the listener provides over time, and (ii) a planning component for selecting the next song in the playlist sequence. Specifically, planning refers to DJ-MC’s exploration of the space of songs to play next and its assessment of the listener’s enjoyment of the playlist if different songs are played. Little prior work has addressed the planning element of playlist generation. We proposed a tree search approach that more effectively explores the space of possible songs to play, and we introduced a new variant of playlist recommendation that incorporates considerations of diversity and novelty directly into the recommender system’s reinforcement learning reward model.
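The planning step can be illustrated with a flat Monte Carlo sketch: each candidate next song is evaluated by the average estimated reward of short simulated playlist continuations. This is a deliberate simplification of the paper’s tree search (a full MCTS grows and reuses a search tree rather than running independent rollouts), and the function name, parameters, and reward interface are assumptions for this example.

```python
import random

def plan_next_song(history, candidates, reward_fn,
                   horizon=3, n_rollouts=50, rng=None):
    """Flat Monte Carlo planning sketch (a simplification of MCTS).

    Each candidate song is scored by the average cumulative reward of
    short random playlist continuations; reward_fn(history, song) is an
    assumed interface to a listener reward model.
    """
    rng = rng or random.Random(0)
    best, best_value = None, float("-inf")
    for song in candidates:
        total = 0.0
        for _ in range(n_rollouts):
            playlist = history + [song]
            total += reward_fn(history, song)
            # Extend the playlist with randomly sampled future songs
            # and accumulate their estimated rewards.
            for _ in range(horizon - 1):
                nxt = rng.choice(candidates)
                total += reward_fn(playlist, nxt)
                playlist = playlist + [nxt]
        value = total / n_rollouts
        if value > best_value:
            best, best_value = song, value
    return best
```

Looking ahead over a short horizon, rather than greedily maximizing the next song’s reward alone, is what lets the planner account for sequential effects such as song transitions.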