Regret Balancing for Bandit and RL Model Selection