
Real World Reinforcement Learning

Established: May 3, 2019

Blue MSN logo on a transparent background

Goal: To provide MSN with personalization for its news articles.

Approach: The Real-World RL platform was deployed inside MSN’s infrastructure to enable rapid personalization across its worldwide deployments.

Results: The RL-based personalization at MSN provides, on average, a 26% click-through rate (CTR) improvement.



COMPLEX.com logo on black background

Goal: To provide Complex.com, an external client, with a means to personalize various areas of its website.

Approach: Complex.com used the web interface (REST API) of the Real-World RL platform to personalize its Top News articles, videos, and suggested articles.

Results: The Real-World RL platform ran for over two years and provided Complex.com, on average, a 30% CTR improvement over its baseline (the editor-suggested ranking).


Xbox 2016 stacked RGB logo on transparent background

Goal: To provide the Microsoft marketing team with personalized “Top of Home” ad campaigns.

Approach: Microsoft’s internal marketing campaign manager (IRIS) used the Real-World RL platform to personalize two of the three Xbox “Top of Home” slots.

Results: The pilot had two phases: 1) The Microsoft Research RL team ran a counterfactual evaluation to estimate user engagement from real-world data collected over two weeks in June 2018. 2) The Real-World RL system was deployed in production for two weeks in November 2018, resulting in a 60% CTR improvement over a baseline random policy and higher user engagement metrics.
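The first phase relies on counterfactual (off-policy) evaluation: estimating how a new policy would have performed using only logged interactions. The page does not say which estimator the team used; the sketch below assumes inverse propensity scoring (IPS), a common choice when the logging policy's action probabilities are recorded. The names `LoggedEvent` and `ips_estimate` and the log layout are hypothetical, chosen only for illustration.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical log record: (context, action shown, observed reward,
# probability with which the logging policy chose that action).
LoggedEvent = Tuple[dict, int, float, float]


def ips_estimate(
    logs: Sequence[LoggedEvent],
    target_policy: Callable[[dict], int],
) -> float:
    """Estimate the average reward of a deterministic target policy via IPS."""
    if not logs:
        raise ValueError("logs must not be empty")
    total = 0.0
    for context, action, reward, prob in logs:
        if prob <= 0.0:
            raise ValueError("logged probabilities must be positive")
        # Only events where the target policy agrees with the logged action
        # contribute, reweighted by the inverse of the logging probability.
        if target_policy(context) == action:
            total += reward / prob
    return total / len(logs)


# Made-up example: a uniform random logging policy over two actions.
example_logs = [
    ({"x": 0}, 0, 1.0, 0.5),
    ({"x": 0}, 1, 0.0, 0.5),
    ({"x": 1}, 1, 1.0, 0.5),
    ({"x": 1}, 0, 0.0, 0.5),
]
estimated_reward = ips_estimate(example_logs, lambda ctx: ctx["x"])
```

Data logged under a random policy gives every action a known, nonzero probability, which is the condition this estimator needs to be unbiased.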


Microsoft Surface stacked logo with symbol

Goal: To provide the marketing team (MLEDCOP) with the ability to perform website layout personalization. The pilot specifically targeted the Surface.com page layout for Japan.

Approach: The Real-World RL platform was used to personalize calls-to-action on three webpages of the Surface.com Japan website. The pilot was run as an A/B test, where the control used the original layout as provided by Design, and the treatment used the Real-World RL platform to personalize the layout based on the users accessing the website.

Results: The RL-based personalization provided an 80% CTR improvement over the control.
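As a rough illustration of how such a figure could be computed, the sketch below assumes the improvement is a relative lift of treatment CTR over control CTR. The click and impression counts are made up for the example and are not data from the pilot; `ctr` and `relative_lift` are hypothetical helper names.

```python
def ctr(clicks: int, impressions: int) -> float:
    """Click-through rate: clicks divided by impressions."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return clicks / impressions


def relative_lift(treatment_ctr: float, control_ctr: float) -> float:
    """Relative improvement of treatment over control (0.8 means 80%)."""
    if control_ctr <= 0:
        raise ValueError("control CTR must be positive")
    return (treatment_ctr - control_ctr) / control_ctr


# Made-up counts, for illustration only.
control = ctr(clicks=50, impressions=1000)
treatment = ctr(clicks=90, impressions=1000)
lift = relative_lift(treatment, control)
print(f"Relative CTR lift: {lift:.0%}")
```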


Skype logo

Goal: To provide Skype with a means to optimize the length of its jitter buffer on a per-call basis, in order to give end users the best possible call quality.

Approach: The Skype team ran the Real-World RL platform on a subset of their call agents for a few weeks.

Results: When comparing results on their “treatment” traffic, the Skype team saw a 1.5% improvement in the Poor Call Quality Metric, a metric typically used as a proxy for how users felt about the quality of the call.


Black and white map graphic with green location arrows

Goal: To provide the AFD Frontier team with a means to optimize the TCP/IP settings of their clusters in order to achieve the best server configuration.

Approach: The AFD Frontier team used the Real-World RL platform for a 3-month pilot as part of the 2017 AI School.

Results: The Real-World RL system provided considerable lift over default behavior. The AI School project won the “Best Project” award and is now used as the basis for an extended pilot between AFD and Microsoft Research.