Abstract
We consider a stylized dynamic pricing model in which a monopolist prices a product to a sequence of T customers who independently make purchasing decisions based on the price offered according to a general parametric choice model. The parameters of the model are unknown to the seller, whose objective is to determine a pricing policy that minimizes the regret, which is the expected difference between the seller's revenue and the revenue of a clairvoyant seller who knows the values of the parameters in advance and always offers the revenue-maximizing price. We show that the regret of the optimal pricing policy in this model is Θ(√T), by establishing an Ω(√T) lower bound on the worst-case regret under an arbitrary policy, and presenting a pricing policy based on maximum-likelihood estimation whose regret is O(√T) across all problem instances. Furthermore, we show that when the demand curves satisfy a "well-separated" condition, the T-period regret of the optimal policy is Θ(log T). Numerical experiments show that our policies perform well.
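The regret definition above can be made concrete with a small sketch. The abstract only specifies a general parametric choice model, so the linear purchase probability used here, its parameters `a` and `b`, and all function names are illustrative assumptions rather than the paper's model or policy. The code computes the expected regret of a given price sequence against a clairvoyant seller who always charges the revenue-maximizing price.

```python
from typing import Sequence


def purchase_prob(price: float, a: float, b: float) -> float:
    """Assumed linear choice model: probability of purchase at `price`.

    Hypothetical form a - b * price, clipped to [0, 1]; not from the paper.
    """
    return min(1.0, max(0.0, a - b * price))


def expected_revenue(price: float, a: float, b: float) -> float:
    """Expected one-period revenue: price times purchase probability."""
    return price * purchase_prob(price, a, b)


def clairvoyant_price(a: float, b: float) -> float:
    """Revenue-maximizing price for the assumed linear model.

    Maximizing p * (a - b * p) gives p = a / (2 * b); this sketch
    assumes 0 < a <= 1 and b > 0 so the clipping never binds there.
    """
    return a / (2.0 * b)


def expected_regret(prices: Sequence[float], a: float, b: float) -> float:
    """Sum over periods of (clairvoyant revenue - revenue at offered price)."""
    best = expected_revenue(clairvoyant_price(a, b), a, b)
    return sum(best - expected_revenue(p, a, b) for p in prices)


if __name__ == "__main__":
    # Three customers offered prices 1.0, 0.5 and 2.0 under a = 1.0, b = 0.5.
    print(expected_regret([1.0, 0.5, 2.0], a=1.0, b=0.5))
```

A learning policy would choose each price from past purchase observations without knowing `a` and `b`; the function above only scores a price sequence after the fact, which is how regret is measured in the abstract.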
- Publication date: 2012-8