Auto-bidding for second-price auctions via Oracle Imitation Learning (OIL)

Dec 14, 2024
Alberto Chiappa
Image credit: Alibaba
Abstract
Online advertising has become one of the most successful business models of the internet era. Typically, advertisement slots are allocated through second-price auctions, where the top-N bidding advertisers secure the top-N slots and pay fees based on the next-highest bids. The General Track of the 2024 NeurIPS competition “Auto-Bidding in Uncertain Environment” asked participants to develop an automatic bidding algorithm for multi-slot second-price auctions. The solution we propose is a teacher-student algorithm in which the teacher, which we call the “oracle”, has access to complete information about the whole advertisement campaign, i.e., perfect knowledge of past and future impression opportunities and bids. With full observability of the campaign, finding the optimal bids can be cast as a multi-choice knapsack problem (MCKP) with a nonlinear objective function. Using a greedy heuristic, the oracle efficiently computes a near-optimal combination of advertisement slots that maximizes the number of conversions while adhering to a budget and an efficiency constraint. We empirically validate that the oracle far outperforms bidding agents trained with offline or online reinforcement learning, which means that its bids can serve as a strong supervision signal for a student network. We therefore use behavior cloning to train a student network to imitate the oracle’s bids, using exclusively the information available at the time of the decision. We call this method Oracle Imitation Learning (OIL). Despite the simplicity of the imitation approach, the auto-bidding agent trained with OIL outperforms both online and offline reinforcement learning algorithms in terms of sample efficiency, training stability, and final performance.
Importantly, OIL shifts the complexity of training an auto-bidding agent from designing a sophisticated learning algorithm to enhancing the oracle’s performance. Using OIL, we ranked 1st in the official phase and 6th in the final phase of the NeurIPS 2024 Auto-bidding Challenge, competing against over 100 teams.
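To make the oracle idea concrete, here is a minimal Python sketch of a greedy selection under a budget and an efficiency constraint. It is an illustration under simplifying assumptions, not the method used in the competition entry: it treats each impression opportunity as a single item (one slot, known winning cost) rather than a multi-slot choice, uses a hard cost-per-conversion cap instead of a nonlinear objective, and all names (`Opportunity`, `greedy_oracle`, `max_cpa`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Opportunity:
    value: float  # expected conversions if the impression is won (assumed known)
    cost: float   # price paid if won, i.e. the next-highest bid (assumed > 0)


def greedy_oracle(opportunities, budget, max_cpa):
    """Greedily pick opportunities in decreasing value-per-cost order.

    An opportunity is skipped if adding it would exceed the budget or push
    the overall cost per conversion above max_cpa. Returns the chosen
    indices (sorted), the total spend, and the total expected conversions.
    """
    # Most efficient opportunities first: highest conversions per unit cost.
    order = sorted(
        range(len(opportunities)),
        key=lambda i: opportunities[i].value / opportunities[i].cost,
        reverse=True,
    )
    chosen, spent, conversions = [], 0.0, 0.0
    for i in order:
        opp = opportunities[i]
        new_spent = spent + opp.cost
        new_conversions = conversions + opp.value
        if new_spent > budget:
            continue  # budget constraint would be violated
        if new_spent > max_cpa * new_conversions:
            continue  # efficiency (cost-per-conversion) constraint would be violated
        chosen.append(i)
        spent, conversions = new_spent, new_conversions
    return sorted(chosen), spent, conversions


# Toy campaign with four opportunities (made-up numbers).
campaign = [
    Opportunity(value=0.10, cost=1.0),
    Opportunity(value=0.05, cost=2.0),
    Opportunity(value=0.20, cost=1.5),
    Opportunity(value=0.01, cost=1.0),
]
chosen, spent, conversions = greedy_oracle(campaign, budget=3.0, max_cpa=12.0)
print(chosen)  # picks indices [0, 2]
```

Because the sketch sees every opportunity and its cost up front, it plays the role of a teacher with full information. In an imitation setup like the one described above, the decisions of such an oracle would be the targets that a student, restricted to the information available at decision time, is trained to reproduce.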
Date
Dec 14, 2024 3:30 PM — 3:45 PM
Event
Location

Vancouver - Zoom

Authors
Alberto Chiappa
PhD Student