Combinatorial Online Learning Based on Optimization Feedback

Big Data Research (大数据), Vol 7(5)

Combinatorial online learning studies how to learn unknown parameters through interaction with the environment and thereby gradually find the optimal combination of targets. The problem has a wide range of applications, including advertisement placement, search, and recommendation. This paper first presents the definition of combinatorial online learning and its general framework, the combinatorial multi-armed bandit problem, and summarizes the classical algorithms and research progress under this framework. It then describes two practical applications of the problem, online influence maximization and online learning to rank, together with their research progress. Finally, it discusses prospective directions for future research on combinatorial online learning.