Abstract
We study the use of reinforcement learning to model Dynamic Spectrum Access in a realistic multi-channel environment. Three approaches from the multi-armed bandit literature are compared on a set of realistic channel access models: two are based on stochastic models of channel occupancy, while the third assumes an adversarial model.