Post-training Large Language Models for Diverse High-Quality Responses