Zeta: Scheduling Interactive Services with Partial Execution

Published

Posted by Yuxiong He, Sameh Elnikety, and James Larus, Microsoft Research; and Chenyu Yan, Microsoft

SoCC logo

Sharing a resource, such as a computer processor or disk drive, requires the system to make decisions about which user’s task gets to use the resource and for how long. When my task is running, yours may be idling in a queue, awaiting its turn on a processor. The process of making these resource-allocation decisions is known as scheduling, and it has long been one of the most important aspects of system design, as a poor scheduler not only wastes resources but also can ruin a user’s experience.

Microsoft research podcast

Collaborators: Silica in space with Richard Black and Dexter Greene

College freshman Dexter Greene and Microsoft research manager Richard Black discuss how technology that stores data in glass is supporting students as they expand earlier efforts to communicate what it means to be human to extraterrestrials.

Scheduling is a well-studied area, but cloud computing raises new challenges. The focus of our paper Zeta: Scheduling Interactive Services with Partial Execution (opens in new tab), being presented this week in San Jose, Calif., during the Association for Computing Machinery’s Symposium on Cloud Computing, is an important type of cloud service: time-bounded interactive services. These are websites—such as Microsoft’s Bing (opens in new tab) and Office 365 (opens in new tab) or Facebook—in which a user interacts directly with a service running in the cloud, as opposed to an application running on a local machine. To provide a pleasing user experience, the service’s response to a user’s click must occur near instantaneously, typically within a couple of hundred milliseconds, around the limits of human perception.

Building a cloud service that can provide this quality of service to millions of simultaneous users is a vast engineering challenge. Our work focuses on one small, but important aspect of this problem, determining which task to run on a computer and for how long. Our new scheduling algorithm, Zeta (opens in new tab), is far more effective than the usual algorithm (first come, first serve). In experiments using Bing, Zeta enabled the search engine to handle 29 percent more queries without degrading quality, potentially reducing the number of servers by 22 percent.

Zeta schedules tasks that have three important characteristics:

  1. Deadline. The task must run within a fixed amount of time, to ensure a high-quality user experience.
  2. Partial execution. The task can produce a partial result at any point in its execution, so that if the deadline expires, the system can return the results computed to that point.
  3. Concave quality profile. A quality profile maps the time a task runs to the quality of its result. A concave quality profile indicates a diminished marginal return in quality as the task runs for longer periods.

Many different services share these characteristics. For example, a web search engine such as Bing accepts a query and returns a collection of webpages within a few hundred milliseconds. The process of matching and selecting the webpages is designed to return the best pages found so far, and the data structures are organized to place the highest-quality pages where they will be examined first. As a result, the quality profile of Bing looks like this:

Normalized Processing Time

Zeta’s key insights are that long-running tasks cannot be allowed to starve short ones and that terminating a long task early and giving its resources to a new task improves the overall service quality, because the concave quality profile returns a larger benefit at the start of a task than late in its execution. Zeta has two policies:

  1. Equi-partitioning. Under a heavy load, Zeta evenly divides the CPU among the tasks waiting to run.
  2. Reservation. Under a light load, equi-partitioning can waste resources by allocating a short task the same amount of resources as a long task, even though the short task will finish early and the long task will be terminated prematurely. This policy reserves CPU time based on the average service demand for the waiting tasks and assigns the remaining time to the task that is ready to run.

These two policies can be implemented simply and efficiently. The changes to the Bing search engine were less than 100 lines and added no perceivable overhead. This small change greatly improved the search engine’s performance:

Queries per second

This chart shows the quality of search results, relative to time-unbounded search results, of a standard workload running on a test cluster of 800 machines. At a given quality level, say 0.99, Zeta delivers a significantly higher throughput: 610 QPS versus 475 for the first-come, first-serve algorithm. Alternatively, at a given workload, Zeta delivers higher-quality search results. Moreover, Zeta reduces the variance in search quality, so that more customers get the best possible search results, even under a heavy load.

This is only a preview of the paper. Zeta is effective not only for search engines; it also works well for tasks whose quality profiles are not strictly concave. The paper contains much more detail.