Skip to main content
Adaptively Exploiting d-Separators with Causal Bandits
Multi-armed bandit problems provide a framework to identify the optimal intervention over a sequence of repeated experiments. Without additional assumptions, minimax optimal performance (measured by cumulative regret) is well-understood. When observed variables d-separate the intervention from the outcome, causal bandit algorithms provably incur less regret. However, an ideal algorithm should be adaptive; that is, perform nearly as well as an algorithm with oracle knowledge of whether a d-separator is observed, without requiring this knowledge. We formalize this notion of adaptivity, and provide a new algorithm that achieves (a) optimal regret when a d-separator is observed, improving on classical algorithms, and (b) significantly less regret than causal bandit algorithms when no d-separator is observed. Crucially, our algorithm does not require any oracle knowledge of the presence of a d-separator. We also generalize this adaptivity to other conditions, e.g. the front-door criterion.
Date and Time
-
Additional Authors and Speakers (not including you)
Linbo Wang
University of Toronto
Daniel M. Roy
University of Toronto
Language of Oral Presentation
English
Language of Visual Aids
English

Speaker

Edit Name Primary Affiliation
Blair Bilodeau University of Toronto