Adaptively Exploiting d-Separators with Causal Bandits
Multi-armed bandit problems provide a framework to identify the optimal intervention over a sequence of repeated experiments. Without additional assumptions, minimax optimal performance (measured by cumulative regret) is well-understood. When observed variables d-separate the intervention from the outcome, causal bandit algorithms provably incur less regret. However, an ideal algorithm should be adaptive; that is, perform nearly as well as an algorithm with oracle knowledge of whether a d-separator is observed, without requiring this knowledge. We formalize this notion of adaptivity, and provide a new algorithm that achieves (a) optimal regret when a d-separator is observed, improving on classical algorithms, and (b) significantly less regret than causal bandit algorithms when no d-separator is observed. Crucially, our algorithm does not require any oracle knowledge of the presence of a d-separator. We also generalize this adaptivity to other conditions, e.g. the front-door criterion.
Date and Time
-
Langue de la présentation orale
Anglais
Langue des supports visuels
Anglais