Hitesh Ballani previews SIGCOMM 2015

Published August 17, 2015

Sigcomm, the annual mecca for networking researchers, is being held in London this week (August 17-21, 2015). The conference program includes something for all tastes: perennial sessions on wide-area and wireless networks, topics du jour like data centers and software-defined networking, and even blasts from the past like network algorithmics. I want to take this opportunity to point out a few things I am looking forward to.

There is a new Experience Track at Sigcomm this year with papers describing commercial systems. This should address the rumblings about insufficient industry participation at Sigcomm and about whether we, as a community, are having enough real-world impact. I am very excited about the session, and about two papers in particular. First, the Jupiter Rising paper presents a brilliant retrospective of how the network in Google’s data centers has evolved over the past decade. Seeing the parallels between various generations of the Google network and concurrent proposals in the research community is fascinating. I am confident this paper will become a must-read for anyone working in the area.

Second, the Pingmesh paper describes a system that continuously measures inter-server latencies in Microsoft’s data centers to perform network troubleshooting. The wide variety of network problems that can be detected using this incredibly simple technique is surprising. In fact, both of these papers share a theme of elegant and simple design coupled with impressive engineering in order to reduce cost and complexity. I believe this theme underlies most systems deployed at scale and is something that we should strive for, even in research prototypes.
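
To give a flavor of why pairwise latency data is so useful for troubleshooting, here is a minimal sketch in Python. It is not the Pingmesh implementation: the server names, the probe values, the 500-microsecond threshold, and the helper functions `slow_pairs` and `suspect_servers` are all made up for illustration. The idea is simply that if every slow pair shares one endpoint, that endpoint (or something close to it) is the likely culprit.

```python
from collections import Counter
from statistics import median


def slow_pairs(rtts_us, threshold_us):
    """Return the (src, dst) pairs whose median RTT exceeds threshold_us."""
    return sorted(pair for pair, samples in rtts_us.items()
                  if median(samples) > threshold_us)


def suspect_servers(pairs):
    """Count how often each server appears in a slow pair, most frequent first."""
    counts = Counter()
    for src, dst in pairs:
        counts[src] += 1
        counts[dst] += 1
    return counts.most_common()


# Hypothetical probe results in microseconds: every pair involving "s3" is slow.
rtts_us = {
    ("s1", "s2"): [210, 190, 205],
    ("s2", "s1"): [200, 215, 198],
    ("s1", "s3"): [900, 1100, 950],
    ("s3", "s1"): [980, 1020, 940],
    ("s2", "s3"): [1200, 890, 1010],
    ("s3", "s2"): [930, 1050, 990],
}

slow = slow_pairs(rtts_us, threshold_us=500)
ranking = suspect_servers(slow)
print(ranking[0])  # the server that appears in the most slow pairs
```

With this toy data, the server that shows up in every slow pair rises to the top of the ranking, which is the kind of localization that a full mesh of measurements makes possible.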

The study of traffic characteristics across Facebook’s data centers is intriguing. Some observations in the paper, like the lack of rack-level locality in their traffic, may raise a few eyebrows as they run counter to similar studies from other cloud operators. What caught my eye is the very low average utilization (<10%) of their data center network, even though it is oversubscribed. This agrees with past studies and calls into question the idea of building (costly) full bisection bandwidth networks; can cloud operators afford such over-provisioning? Does this also mean that data center network scheduling is moot? Perhaps not, since it is important to differentiate coarse-grained average statistics from tail behavior. The microsecond-granularity buffer utilization reported in the paper supports this argument. I have never seen such detailed statistics reported before, so kudos to the authors!
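
The gap between averages and tails is easy to see with a small sketch. The numbers below are invented, not taken from the paper: a hypothetical link that is idle in 95 of 100 fine-grained intervals and fully saturated in the other 5. The `percentile` helper is a plain nearest-rank percentile written for this example.

```python
def percentile(samples, p):
    """Nearest-rank percentile for an integer p in 1..100."""
    ordered = sorted(samples)
    # Integer ceiling of p * n / 100 avoids floating-point rounding surprises.
    rank = max(1, (p * len(ordered) + 99) // 100)
    return ordered[rank - 1]


# Hypothetical utilization samples, one per fine-grained interval:
# idle most of the time, saturated during short bursts.
utilization = [0.0] * 95 + [1.0] * 5

mean_util = sum(utilization) / len(utilization)
p50 = percentile(utilization, 50)
p99 = percentile(utilization, 99)

print(f"mean={mean_util:.2f} p50={p50:.2f} p99={p99:.2f}")
```

The mean looks comfortably low, yet the 99th percentile shows the link completely full. That tail is what scheduling and buffering have to cope with.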

The congestion control and Quality-of-Service (QoS) sessions are also compelling, albeit for entirely different reasons. When I graduated, my advisor sent me off with the simple advice: “Stay away from congestion control research.” I am sure he meant well. Congestion control research had seemingly hit a dead end after its heyday in the late 90s. The same held true for QoS research. Sigcomm 2003 even had an aptly named workshop, RIPQoS, to put the QoS discussion to bed. However, new environments and technologies have led to a phoenix-like resurgence for these areas. For example, using the capabilities of modern NICs and switches, techniques like Timely and DCQCN are really pushing the boundaries for congestion control in low-latency, high-throughput networks. Looking ahead, the emergence of Rack-scale Computing (high-density racks with disaggregated resources) means that the era of nanosecond-latency networks is upon us. Beyond the improvements in raw latency and bandwidth, the notion of a converged rack network carrying a variety of traffic between different kinds of resources (compute SoCs, storage, memory) is likely to spur the next generation of network designs.

On the SDN front, the Fibbing paper presents a very neat technique that provides centralized, programmatic control over a distributed routing protocol like OSPF. This is done by introducing fake nodes and links in the topology which, at first, seems like a clever hack. I think the authors are onto something deeper though, i.e., centralized and flexible routing may not preclude the use of (existing) distributed routing protocols. I am still mulling over the implications of this technique for general management tasks, especially with regard to the notion of decoupled data and control planes.
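
To make the fake-node idea concrete, here is a toy model in Python. It is a sketch under my own simplifying assumptions, not the algorithm from the paper: the four-router topology, the link costs, the helper names, and the `fake_to_real` mapping from a fake node to a real next hop are all invented. Each router just runs shortest paths over whatever topology it is told about; adding a fake node that advertises a cheap path to the destination changes one router's choice without touching the others.

```python
import heapq


def first_hops(graph, source):
    """Dijkstra from source; return {node: (cost, first hop on the shortest path)}."""
    best = {source: (0, None)}
    heap = [(0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > best[node][0]:
            continue  # stale heap entry
        hop = best[node][1]
        for neighbor, weight in graph[node].items():
            new_cost = cost + weight
            if neighbor not in best or new_cost < best[neighbor][0]:
                best[neighbor] = (new_cost, neighbor if node == source else hop)
                heapq.heappush(heap, (new_cost, neighbor))
    return best


def with_fake_node(graph, attach_to, fake, dest, link_cost, dest_cost):
    """Return a copy of graph with a fake node hanging off attach_to and reaching dest."""
    augmented = {node: dict(edges) for node, edges in graph.items()}
    augmented[attach_to][fake] = link_cost
    augmented[fake] = {dest: dest_cost}
    return augmented


# Hypothetical topology: A reaches D via B (cost 4) or via C (cost 6).
real = {
    "A": {"B": 2, "C": 3},
    "B": {"A": 2, "D": 2},
    "C": {"A": 3, "D": 3},
    "D": {"B": 2, "C": 3},
}

# Assumption of this sketch: traffic sent "to" fake node F is really handed to C.
fake_to_real = {"F": "C"}
augmented = with_fake_node(real, "A", "F", "D", link_cost=1, dest_cost=1)

before = first_hops(real, "A")["D"][1]
hop = first_hops(augmented, "A")["D"][1]
after = fake_to_real.get(hop, hop)
b_hop = first_hops(augmented, "B")["D"][1]

print(before, after, b_hop)
```

In this toy setup, router A switches its next hop toward D from B to C once the fake node appears, while router B keeps forwarding directly to D. The routers still run an ordinary shortest-path computation; only the input topology has been nudged.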

As for wireless, I find the idea of backscatter communication, i.e., sending data by modulating an existing signal, very cool. The technique has the feeling of pulling something out of thin air, yet it seems to work. This year’s backscatter papers use clever designs to increase the achievable data rate and to show that backscattering can even be done atop WiFi signals. Perhaps it shows my ignorance of the area but this work has a wow factor that some traditional networking topics seem to lack. Furthermore, the proliferation of tiny connected devices makes this a timely topic to investigate.

Finally, building on last year’s success, the industrial demo session presents a great opportunity to see cutting-edge products in the flesh. Researchers interested in data center technologies will find demonstrations of VMware’s Open vSwitch, Barefoot’s Programmable Dataplanes, Azure’s Cloud Switch and Microsoft’s End-to-end storage QoS particularly interesting. The conference also includes a couple of new events to help younger students transition into our community. Students can get feedback on their research and get primers on various research areas from more seasoned researchers. I remember how daunting my first Sigcomm was, so I am sure these events will be very well received.

Overall, we are in for an exciting conference. If you are not attending (and thus missing out on lovely English weather and cuisine), I hope this post will help you tune your Sigcomm reading list.

Hitesh Ballani is a Senior Researcher at Microsoft Research. He designs and builds networked systems that strike a balance between clean-slate and dirty-slate solutions. His current research focuses on data center networks and rack-scale computing. His recent work on Predictable Data Centers led to the Storage Quality of Service feature in Windows Server. He graduated with a Ph.D. from Cornell University in 2009 where he worked on network management, Internet routing, and IP anycast.

For more computer science research news, visit ResearchNews.com.
