Estimating Optimal Policy Value in General Linear Contextual Bandits

Publication
Transactions on Machine Learning Research