July 12–13, 2010

Faculty Summit 2010

Location: Redmond, WA, USA

Monday, July 12

Time | Event/Topic | Location
8:00–9:00
Breakfast
Hood
9:00–10:30
Opening Plenary Sessions
Kodiak
9:00–9:30
Welcome and Introduction | slides

Tony Hey, Corporate Vice President, Microsoft Research

 
9:30–10:30
Kinect for Xbox 360—The Innovation Journey | slides
Andrew Fitzgibbon, Principal Researcher, Microsoft Research, Cambridge; Kudo Tsunoda, Creative Director – Kinect, Microsoft

Kinect brings games and entertainment to life in extraordinary new ways—no controller required. Easy to use and instantly fun, Kinect (formerly known as “Project Natal”) gets everyone off the couch. Want to join a friend in the fun? Simply jump in. And the best part is that Kinect works with every Xbox 360. When technology becomes invisible and intuitive, something special happens—you and your experience become one. No barriers, no boundaries, no gadgets, no gizmos, no learning curves. With Kinect you are the controller. It’s just the magic of you—your movement, your voice, your face, all effortlessly, naturally and beautifully transforming how you play and experience entertainment. This session introduced Kinect and explained the technology behind it.
 
10:30–10:45
Break 
10:45–12:15
Breakout Sessions 
 
Natural User Interaction
Visualization and Interaction Today—Selected Perspectives

Session Chair: Mary Czerwinski, Microsoft Research

Rob DeLine, Microsoft Research
Steven Drucker, Microsoft Research
Danyel Fisher, Microsoft Research | slides
Jeffrey Heer, Stanford University
George Robertson, Microsoft Research | slides

Information visualization lets users make sense of data visually, and its techniques apply across many fields. We invited five researchers to discuss current work in information visualization: Rob DeLine (MSR) discussed applying visualization to source code, and Danyel Fisher (MSR) discussed visualization in a NUI context. George Robertson (MSR) gave an overview of how animation and data visualization can work together. Jeff Heer (Stanford) discussed the Protovis toolkit, and Steven Drucker (MSR) discussed the WebCharts toolkit. Together, these talks surveyed the newest work in visualization and the broad applicability of its techniques, and provided starting points for researchers and practitioners who might apply visualization to their own projects.

Cascade
 
Data–Driven Software Engineering
Software Ecosystems: A New Research Agenda
Session Chair: Judith Bishop, Microsoft Research

Anthony Finkelstein, University College London | slides
Fred Wurden, Microsoft | slides

The software development scene is transforming from unitary systems, through component marketplaces and supply chains, to increasingly complex ecosystems of interoperating systems, services, and environments held together by networks of partnerships and commercial relationships. Anthony Finkelstein set out a research agenda for work in this new setting. In particular, he called for empirical research, suggested some ways in which it can be conducted, and discussed some early data. Fred Wurden followed with an overview of the recent efforts of more than 500 engineers at Microsoft aimed directly at increasing the interoperability of Windows with open source and commercial software ecosystems.

Rainier
 
Future Web: Intelligence, Ubiquity, and Trust
Bing Dialog Model: Intent, Knowledge, and User Interaction | slides
Session Chair: Evelyne Viegas, Microsoft Research
Yu-Ting Kuo, Microsoft; Harry Shum, Microsoft; Kuansan Wang, Microsoft Research

With Internet users growing ever more sophisticated, the decade-old search outcomes, manifested in the “ten blue links,” are no longer sufficient. Many studies have shown that when users are ushered off the conventional search result pages, their needs are often only partially met. To tackle this challenge, we optimize Bing, Microsoft’s decision engine, to not just navigate users to a landing page through a blue link but to continue engaging with users to facilitate task completion. Underlying this new paradigm is the Bing Dialog Model that consists of three building blocks: an indexing system that systematically and comprehensively harvests task knowledge from the web, an intent model that statistically infers and matches users’ needs to the task knowledge, and an interaction model that elicits user intents through mathematically optimized presentations. In this talk, I’ll describe the Bing Dialog Model in detail and demonstrate it in action through the innovative features recently introduced in Bing.
St. Helens
 
The Challenge of Large Data
Environmental Data Management
Deb Agarwal, University of California, Berkeley | slides

Environmental scientists have been building rich networks of measurement sites that span a wide range of ecosystems and environmental conditions. Each measurement site is put in place by a science team to pursue specific science goals. These science teams now also work together to contribute their data to national and international research networks. This data, once brought together, has the potential to enable studies of spatial and temporal scales that are not possible at a single site. It also has the potential to allow researchers to discern large-scale patterns and disturbances in the combined data. The challenge in bringing together environmental data into a common data set for researchers to use is one of heterogeneity, not scale. Informatics is critical to managing, curating, and archiving these data for the future, making them accessible in a form in which they can be used and interpreted accurately, and producing answers to questions from a community of researchers, policy makers, and educators. Some of the networks addressing this challenge include the Long Term Ecological Research Network, the FLUXNET network, and the National Soil Carbon Network. This talk explored the challenges involved in managing environmental data and in developing informatics infrastructure to enable researchers to easily access and use regional- and global-scale data to address large-scale questions such as climate change, and to publish the analysis results.

William Michener, University of New Mexico | slides

Large data do not currently present a central challenge for the environmental sciences. Instead, the big challenges lie in discovering data, dealing with extreme data heterogeneity, and converting data to information and knowledge. Addressing these challenges requires new approaches for managing, preserving, analyzing, and sharing data. In this talk, I first introduce DataONE (Data Observation Network for Earth), which represents a new virtual organization that will enable new science and knowledge creation through universal access to data about life on earth and the environment that sustains it. DataONE is poised to be the foundation of innovative environmental science through a distributed framework and sustainable cyberinfrastructure that meets the needs of science and society for open, persistent, robust, and secure access to well-described and easily discovered Earth observational data. Second, I briefly summarize several cyberinfrastructure (CI) challenges related to metadata creation, data provenance, and scientific and visualization workflows that impede science. Finally, I relate these and other CI challenges to three specific case studies: (1) understanding the world’s biodiversity, (2) conserving elephants in Africa, and (3) assessing environmental risk.
Baker
 
Special Topics
Cutting Edge Education Update
Jesse Schell, Carnegie Mellon University | slides
Andrew Phelps, Rochester Institute of Technology | slides

Jesse Schell talked about exciting trends in both technology and culture that define the 21st century: things are becoming more beautiful, more customized, more shared, and more authentic. But is the same true for education? Jesse showed examples of how it is beginning to happen (such as achievement systems, connecting with Andy’s talk), but noted that there is a long way to go and that fully realizing this vision will require a significant technology-driven revolution in education.

Andrew Phelps discussed curricular trends in the Department of Interactive Games & Media at RIT, and in particular a set of initiatives that the department is planning around achievement systems, social networks, and student culture. Game design offers us lessons in the success of such systems in a certain context, but also comes replete with dramatic failures, relevant warnings, and a few emerging best practices. Can these systems be utilized in an educational setting, and even if they can, should they? Could these tools be utilized towards goals of curricular customization and student engagement? This portion of the talk focused on the preparations, planning, and thoughts of the IGM faculty as they begin to establish a research agenda in this space—what can we hope to learn?

After these brief presentations, Jesse and Andy engaged the audience in further discussion about how technology, design, and cultural changes can best be applied to the future of education.
Lassen
12:15–1:15
Lunch
Hood
12:15–1:15
Lunchtime Sessions
 
Design Mind + Engineering Mind: Secrets to Designing Compelling Product Experiences | slides
Surya Vanka, Microsoft

Today, the nature of products is changing fast. Most products from phones to appliances to automobiles are a combination of software and services encased in hardware. These products are novel, dynamic, and content-laden. Their interaction often spans multiple platforms (hardware, application, web), multiple form factors (desktop, mobile, television) and multiple interfaces (keyboard, pointer, voice, touch, gesture). How do you make sure that it is real human needs and not the multitude of technologies that shape the experience of these products? In this talk, Surya Vanka described how breakthrough product experiences are created at Microsoft by employing a combination of the design mind and the engineering mind. The design mind’s ability to leapfrog established patterns and paradigms, and the engineering mind’s ability to optimize and actualize, are the foundations of great individual and team processes. Surya shared principles, practices, collaborations, and thoughts on organizational culture.
Rainier
 
Memento: Time Travel for the Web | slides
Michael L. Nelson, Old Dominion University

The web is ephemeral. Many resources have representations that change over time, and many of those representations are lost forever. A lucky few manage to reappear as archived resources that carry their own URIs. For example, some content management systems maintain version pages that reflect a frozen prior state of their changing resources. Archives recurrently crawl the web to obtain the actual representation of resources, and subsequently make those available via special-purpose archived resources. In both cases, the archival copies have URIs that are protocol-wise disconnected from the URI of the resource of which they represent a prior state. Indeed, the lack of temporal capabilities in HTTP prevents getting to an archived resource on the basis of the URI of its original. This turns accessing archived resources into a significant discovery challenge for both human and software agents, which typically involves following a multitude of links from the original to the archival resource, or searching archives for the original URI. The Memento solution is based on existing HTTP capabilities applied in a novel way to add the temporal dimension. The result is an inter-archive framework in which archived resources can seamlessly be reached via their original URI: protocol-based time travel for the web.
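For readers who want a feel for how Memento-style datetime negotiation looks on the wire, the sketch below shows a client asking a TimeGate for a snapshot of an original URI near a requested date via the Accept-Datetime header. This is a minimal illustration in Python using the requests library; the TimeGate URL is a hypothetical placeholder, not an endpoint named in the talk.

    # Minimal sketch of Memento-style datetime negotiation (hypothetical TimeGate URL).
    import requests

    original_uri = "http://example.com/"
    timegate = "http://timetravel.example.org/timegate/"  # placeholder endpoint, not a real service

    response = requests.get(
        timegate + original_uri,
        headers={"Accept-Datetime": "Tue, 13 Jul 2010 00:00:00 GMT"},
    )

    # requests follows the TimeGate's redirect; the result is the memento closest to the
    # requested datetime, and the Memento-Datetime header reports when it was captured.
    print(response.url)
    print(response.headers.get("Memento-Datetime"))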
Lassen
1:15–2:45
Breakout Sessions
 
 Natural User Interaction
Beneath the Surface—Introduction | slides
Daniel Wigdor, Microsoft

The notion of a “Natural” user interface is a well-defined design goal, which can be targeted, built towards, tested, and evaluated against to enable iteration. In this presentation, I describe some of the misunderstandings of this goal, with a particular emphasis on the notion that it is a means, rather than a goal for design. I also describe some of the work done on the Surface team to better define it and utilize it as a tool to achieve good design.

Beneath the Surface Projects | slides
Mark Bolas, University of Southern California; Steve Feiner, Columbia University

Professor Feiner presented recent work performed by the Computer Graphics and User Interfaces Lab at Columbia University for the Beneath the Surface project sponsored by Microsoft Research.

Professor Bolas presented recent work performed by the USC Interactive Media Division for the Beneath the Surface project, with focus toward Creative Production Environments, sponsored by Microsoft Research.
Cascade
 
Data-Driven Software Engineering
Code Contracts and Pex: Infrastructure for Dynamic and Static Analysis for .NET | slides
Session Chair: Tom Ball, Microsoft Research
Mike Barnett, Microsoft Research; Christoph Csallner, University of Texas at Arlington; Peli de Halleux, Microsoft Research

We present two complementary platforms for teaching and research involving the static and dynamic analysis of .NET programs. Code Contracts is a platform that provides a standardized format for expressing program specifications. Tools using the CCI infrastructure utilize the contracts for performing runtime verification, static analysis, and documentation generation. Pex is a platform for dynamic symbolic execution. On top of Pex, tools have been created that do advanced test case generation, reverse engineering, data structure repair, and database testing. Both platforms are extensible and can be leveraged by researchers to build their own tools.
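Code Contracts itself targets .NET; to keep this page's examples in a single language, the sketch below uses a small Python decorator purely to illustrate the general idea of executable pre- and postconditions that runtime checkers and test generators can consume. It is an analogy, not the Code Contracts API.

    # Hedged analogy only: a minimal pre/postcondition decorator in Python,
    # mimicking the style of specification that Code Contracts expresses in .NET.
    import functools

    def contract(requires=None, ensures=None):
        """Attach an optional precondition and postcondition to a function."""
        def decorate(func):
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                if requires is not None:
                    assert requires(*args, **kwargs), "precondition violated"
                result = func(*args, **kwargs)
                if ensures is not None:
                    assert ensures(result, *args, **kwargs), "postcondition violated"
                return result
            return wrapper
        return decorate

    @contract(requires=lambda xs: len(xs) > 0,
              ensures=lambda result, xs: result in xs)
    def largest(xs):
        return max(xs)

    print(largest([3, 1, 4]))  # prints 4; largest([]) would trip the precondition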
Rainier
 
Future Web: Intelligence, Ubiquity, and Trust
Privacy and Trust in Future Web | slides
Session Chair: Evelyne Viegas, Microsoft Research

This session describes work on providing privacy guarantees for dynamically changing data and addresses how to deliver end-to-end trust.

Privacy of Dynamic Data: Continual Observation and Pan Privacy | slides
Moni Naor, Weizmann Institute of Science

Research in the area of privacy of data analysis has been flourishing recently, with the rigorous notion of differential privacy defining the desired level of privacy as well as sanitizing algorithms matching the definition for many problems. Most of the work in the area assumes that the data to be sanitized is fixed. However, many applications of data analysis involve computations of changing data, either because the entire goal is one of monitoring, e.g., of traffic conditions, search trends, or incidence of influenza, or because the goal is some kind of adaptive optimization, e.g., placement of data to minimize access costs. Issues that arise when providing guarantees for dynamically changing data include:
  • How to provide privacy even when the algorithm has to constantly output the current value of some function of the data (Continual Observation).
  • How to assure privacy even when the internal state of the sanitizer may be leaked. This is called Pan Privacy. We aim to design algorithms that never store sensitive information about individuals, so in particular collectors of confidential data cannot be pressured to permit data to be used for purposes other than that for which they were collected.

(Based on joint papers with Cynthia Dwork, Toni Pitassi, Guy Rothblum, and Sergey Yekhanin.)
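As background for the abstract above, the standard definition of differential privacy from the literature (not a new result of this work): a randomized mechanism M is ε-differentially private if for all neighboring datasets D and D' (differing in one individual's record) and all sets S of outputs,

    \Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[\mathcal{M}(D') \in S].

The continual-observation and pan-privacy settings discussed in the talk strengthen this requirement to cover a stream of outputs over time and the sanitizer's internal state, respectively.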

Delivering End to End Trust: Challenges, Approaches, Opportunities | slides
Jeffrey Friedberg, Microsoft

As people, businesses, and governments connect online, new valuable targets are created, spawning greater cybercrime. How can we reduce the risk and increase accountability while preserving other values we cherish such as personal freedoms and anonymity? What building blocks are needed? What are the biggest gaps? The latest efforts to address this challenge were discussed. Research into improving the usability of privacy and security features was also presented.
St. Helens
 
The Challenge of Large Data
Dataset Citation, Curation, and Management

Merce Crosas, Harvard University
Liz Lyon, UKOLN/University of Bath | slides
John Wilbanks, Science Commons | slides

The Dataverse Network | slides
Merce Crosas, Harvard University

The Dataverse Network is an open-source web application which offers a free and flexible framework for dataset citation, curation and management. This talk presents a series of examples showing how an individual dataverse can be used by researchers, journals, archives and others who produce or organize data. In particular, a dataverse increases scholarly recognition, controls distribution of datasets, secures formal citations for data, provides legal protection and ensures long-term preservation.

UK Digital Curation Centre: Enabling Research Data Management at the Coalface—Liz Lyon, UKOLN/University of Bath

The UK Digital Curation Centre (DCC) is providing advocacy, guidance and tools for research data management to the UK higher education community, as well as running a portfolio of R&D projects to understand data curation challenges at the coalface. This session looked at three DCC data exemplars: 1) data citation of complex predictive network models of disease, 2) crystallography data flows across institutional borders from laboratory to synchrotron and 3) Emerging Data Management Planning tools for institutions and faculty.

John Wilbanks, Science Commons

Scientific research has so far shown significant resistance to adopting the kinds of “generative” effects we’ve seen in networks and culture. Most of the resistance is systemic – emerging from the institutions that host research, the cultures of scientific publication and reward, the lack of infrastructures to make data and tools easy to transfer and master, and the trend towards micro-specialization of disciplines. However, some interventions from the cultural and software world can be “localized” to create an increased tendency towards generativity, and there is evidence of early success. Now it’s important to begin questioning the interventions and analyzing the potential for the “stall” that can follow a generative system’s emergence, particularly in the interim phase between the sharing of data and the deployment of the infrastructure that makes sharing as powerful as web browsing.
Baker
 
Special Topics
Computational Science Research in Latin America | slides
Session Chair: Jaime Puente, Microsoft Research

Carlos Alfredo Joly, Universidade Estadual de Campinas; Ricardo Vencio, Universidade de São Paulo; Celso Von Randow, Instituto Nacional de Pesquisas Espaciais

SinBIOTA 2.0: New Challenges for a Biodiversity Information System—Carlos Alfredo Joly, Universidade Estadual de Campinas

In the last three decades, many initiatives have been developed aiming to fill gaps in global knowledge about biodiversity and to facilitate access to data. The Catalogue of Life, which is becoming a comprehensive catalogue of all known species of organisms on Earth, now has 1.1 million species registered, and GBIF (the Global Biodiversity Information Facility) currently provides access to 190 million species occurrence records. However, these global initiatives lack specific tools and applications to assist decision makers, and the current rate of species extinction is far from being reduced as aimed for by the Convention on Biological Diversity’s 2010 targets. In March 1999, the State of Sao Paulo Research Foundation (FAPESP) launched the Research Program on Characterization, Conservation, Restoration, and Sustainable Use of the Biodiversity of the State of Sao Paulo, also known as BIOTA/FAPESP: The Virtual Institute of Biodiversity.

SinBiota, the Environmental Information System of the BIOTA/FAPESP Program, was developed to store information generated by researchers involved with the program. In addition, the system integrates this information with a digital cartographic base, thus providing a mechanism for disseminating relevant data on Sao Paulo State’s biodiversity to the scientific community, educators, governmental agencies, and other decision and policy makers.

Between 2006 and 2008, BIOTA-FAPESP researchers made a concerted effort to synthesize data for use in public-policy-making. Scientists worked with the state secretary of the environment and nongovernmental organizations (NGOs) such as Conservation International, The Nature Conservancy, and the World Wildlife Fund. The synthesis was based on more than 151,000 records of 9,405 species, as well as landscape structural parameters and biological indices from over 92,000 fragments of native vegetation. Two synthesis maps, identifying priority areas for biodiversity conservation and restoration, together with other detailed data and guidelines, have been adopted by São Paulo state as the legal framework for improving public policies on biodiversity conservation and restoration, such as prioritizing areas for forest restoration (as one means of reconnecting fragments of native vegetation) and selecting areas for new Conservation Units. There are four governmental decrees and 11 resolutions that quote the BIOTA-FAPESP guidelines.

In June 2009, more than 300 scientists and students associated with the BIOTA/FAPESP Program, or with biodiversity research in general, discussed priorities and an agenda for the next ten years of the Program. As a result, a Science Plan & Strategies document was drafted, revising the goals of the original proposal. Furthermore, 10 critical points were selected as top priorities for the next ten years, one of which is the evolution of the SinBiota information system.

In December 2009, FAPESP approved the new Science Plan & Strategies document, renewing its support for the Program up to 2020. In a project funded by Microsoft Research and FAPESP, we are developing the new Biodiversity Information System, SinBiota 2.0, which will incorporate new tools and interfaces, aiming to fulfill the expectations of the research community and decision makers over the next 10 years. It might also be used as a template for other regions and for the Brazilian SISBIOTA planned by the National Research Council/CNPq.

Information Technology Applied to Bioenergy Genomics—Ricardo Vencio, Universidade de São Paulo

There is no doubt that one of the greatest challenges to mankind in this century is energy production. For geopolitical, economic, and, most pressingly, environmental reasons, no ordinary form of energy production is the solution, making renewable and environmentally friendly options mandatory. All major global economies are organizing themselves to tackle this issue, and Brazil is no exception. Since it is recognized that biofuels may be part of the solution to such a pressing problem, São Paulo State, the main biofuel producer in Brazil, launched an aggressive research program called BIOEN – the FAPESP Program for Research on Bioenergy. The program has several scientific and technological goals related to environmental impacts, social impacts, next-generation fuels, production technology, and so on.

Network of Environmental Sensors in Tropical Rainforests—Celso Von Randow, Instituto Nacional de Pesquisas Espaciais

The interaction between the Earth’s atmosphere and the terrestrial biosphere plays a fundamental role in the climate system and in biogeochemical and hydrological cycles, through the exchange of energy and mass (for example, water and carbon) between the vegetation and the atmospheric boundary layer. The main focus of many environmental studies is to quantify this exchange over several terrestrial biomes.

Over natural surfaces like the tropical forests, factors like spatial variations in topography or in the vegetation cover can significantly affect the air flow and pose big challenges for the monitoring of the regional carbon budget of terrestrial biomes. It is hardly possible to understand the air flow and reduce the uncertainties of flux measurements in complex terrains like tropical forests without an approach that recognizes the complexity of the spatial variability of the environmental variables.

With this motivation, a partnership involving Microsoft Research, Johns Hopkins University, the University of São Paulo, and Instituto Nacional de Pesquisas Espaciais (INPE, the Brazilian national institute for space research) has been developing research activities to test prototypes of environmental sensors (geosensors) in the Atlantic coastal and Amazonian rain forests in Brazil, forming sensor networks with high spatial and temporal resolution, and to develop software tools for data quality control and integration. The main premise is that the geosensors should have relatively low cost, which enables the formation of monitoring networks with a large number of spatially distributed sensors.

Envisioning a possible wide deployment of geosensors in Amazonia in the future, the team is currently working on three main components: 1) assembly and calibration of prototype geosensors of air temperature and humidity, with reproducible and reliable ceramic sensor elements that will operate adequately under the environmental conditions observed in the tropics; 2) development of software tools for management, quality control, visualization, and integration of data collected in geosensor networks; and 3) planning of an experimental campaign, with the installation of the first tens to hundreds of sensors at an Amazonian forest site, aiming at a pilot test of the system to study the spatial variability of temperature and humidity within and above the rainforest canopy.
Lassen
1:15–4:15
Design Expo | slides
Kodiak
2:45–4:15
Breakout Sessions
 
Natural User Interaction
A Whole NUI World: A New Fantastic Point of View
Session Chair: Desney Tan, Microsoft Research
Scott Hudson, Carnegie Mellon University | slides

Rich new sensors are noisier than the more mature and highly focused input devices we are used to working with. Further, the recognition needed to deal with rich new classes of user actions introduces additional uncertainty. (Really good recognizers are wrong 1 in 20 times!) Technology in these areas may progress. But even if we were to somehow make these technologies nearly perfect tomorrow, human behavior is full of ambiguity. And, while we sometimes think of that ambiguity as a kind of flaw, in fact it often serves important, sometimes vital purposes in human-to-human interaction. So if you want to think seriously about “natural” interaction then you inevitably must deal with ambiguity and uncertainty. In this presentation I talk briefly about the challenges and directions for new research that this perspective implies.

Johnny Lee, Microsoft | slides

Thanks to Moore’s Law, the form factors of computing devices today are dominated by the interface hardware. The computing industry has also identified it as a major product differentiating feature. However, the notion that there will be a broad reaching new interface technology that will alter the way we interact with all computing is a counter-productive myth. As specialized and diverse devices become increasingly economical to produce, the interfaces will also become increasingly specialized and diverse. What will be intuitive and “natural” is the result of a good pairing between applications and interface capabilities and, ultimately, good design.

Michael Medlock, Microsoft | slides

The folks in this room have made or will make great things. But many of these great things will never see the light of day. Why? There are lots of reasons. I’ll give you some of the reasons that I get to see up close and personal while working on many different kinds of products in a big company like Microsoft. I focus on user experience reasons…but I touch on business and technology reasons too.

Dan Morris, Microsoft Research | slides

As computing progresses toward being smaller and more readily available in more scenarios, we pay an increasingly high price for the physical devices on which we’ve become dependent for input: buttons, touch screens, etc. We propose that the use of on-body sensors for computer input will allow us to make a critical leap toward always-available computing, and in this talk I discuss some of our work in this space. Perhaps even more interesting than input modalities, though, are the implications that “always-available computing” has for the applications we can build. Consequently, I look forward to discussing—with the audience and the panel—the new application spaces we can create as we approach the long-awaited natural user interface.

Daniel Wigdor, Microsoft Research | slides

The notion of a “Natural” user interface is a well-defined design goal, which can be targeted, built towards, tested, and evaluated against to enable iteration. In this presentation, I describe some of the misunderstandings of this goal, with a particular emphasis on the notion that it is a means, rather than a goal for design. I also describe some of the work done on the Surface team to better define it and utilize it as a tool to achieve good design.
Cascade
 
Data-Driven Software Engineering
A New Approach to Concurrency and Parallelism
Session Chair: Judith Bishop, Microsoft Research

Tom Ball, Microsoft Research | slides
Ade Miller, Microsoft | slides

With Visual Studio 2010, Microsoft released new libraries and languages for high-level programming of multi-core systems. We looked at these extensions from the point of view of parallel patterns that can improve an application’s performance on multicore computers, as well as correctness concerns. The speakers have been incorporating their work into a book and a set of courseware, both of which will be available in the fall. See A Guide to Parallel Programming and Practical Parallel and Concurrent Programming. Sebastian Burckhardt and Madan Musuvathi assisted with this session, and there was an associated booth in the DemoFest.

Rainier
 
Future Web: Intelligence, Ubiquity, and Trust
Latest Advances in Bing Maps User Experience and Ecosystem—Eyal Ofek, Microsoft; Greg Schechter, Microsoft

Online mapping represents a great opportunity to organize and present an incredible variety of information having spatial characteristics. Come see and hear about where we are with Bing Maps, some of the experiences available, and our approach to encouraging third-party application development, and learn more about various technologies in use in Bing Maps. And, of course, lots of demos along the way.
St. Helens
 
The Challenge of Large Data
Azure for Science Research: from Desktop to the Cloud | slides
Roger Barga, Microsoft Research; Catharine van Ingen, Microsoft Research

We live in an era in which scientific discovery is increasingly driven by data exploration and often occurs within a data explosion. Scientists today are envisioning diverse data analyses and computations that scale from the desktop to supercomputers, yet often have difficulty designing and constructing software architectures to accommodate the heterogeneous and often inconsistent data at scale. Moreover, scientific data and computational resource needs can vary widely over time. The needs can grow as the science collaboration broadens or as additional data are accumulated; the needs can have large transients in response to seasonal field campaigns or new instrumentation breakthroughs. Cloud computing offers a scalable, economic, on-demand model well-matched to these evolving science needs. The integration of familiar science desktop tools such as Excel or Matlab with cloud computing reduces the scientists’ conceptual barrier to scaling those applications in the cloud. This talk presents our experiences over the last year deploying scientific applications on Azure. We highlight AzureBlast, a deep genomics database search, and MODISAzure, a science pipeline for image processing.
Baker
 
Special Topics
The Future of Reading, Writing + Scholarship
Tara McPherson, University of Southern California | slides

In her presentation “Animating the Archive: New Modes of Scholarly Publishing,” McPherson examined how scholarly publishing is changing through the development of online multi-media journals that transcend the limitations of print publishing and offer authors more control and wider dissemination of their scholarship. She also discussed a new publishing initiative that links archives, scholars, and presses in a more seamless workflow utilizing a new lightweight platform called Scalar.

Amit Ray, Rochester Institute of Technology | slides

Wikipedia is a collaborative endeavor. No single author can solely determine content. Individuals require the consent of their peers in order to generate, edit and moderate a Wikipedia entry. These dimensions of the text are recorded in both the talk and history pages, where all changes to an entry, as well as discussion about that entry, are archived. As a result, Wikipedia provides a dynamic and continuous written record of human interactions that is historically unprecedented. Until recently, this activity has largely been limited to individual languages.

Over the last two years, cross-linguistic activity has burgeoned on Wikipedia. In this talk, I provide a brief overview of how distributed authorship on Wikipedia operates in order to address this emerging translational activity. Currently more than 270 different languages are represented, 94 of which have 10,000 articles or more. Generally speaking, many of these Wikipedia communities are organized, transparent and self-reflexive. As a result, translational activity has increased steadily, enabled by a methodically ordered clearinghouse for human-mediated translation. I analyze these structural features in order to reflect on dimensions of power as they relate to languages and translation. Do cultural flows on Wikipedia resemble those found in other dimensions of transnational culture? In what ways do they differ? To what extent do open access models such as Wikipedia facilitate and/or resist neo-liberal forms of globalization and what are the implications for not only what we read, but how?
Lassen
4:15–4:30
Break 
4:30–5:30
Closing Plenary Session
RARE: Rethinking Architectural Research and Education | slides
Chuck Thacker, Technical Fellow, Microsoft Research, Silicon Valley

By the late ’80s, the cost of chip fabrication had increased to the point that it was no longer feasible for university researchers to do architectural experimentation on real systems. Groups could no longer do the sort of experiments that led to the establishment of companies such as Sun and MIPS. Simulation replaced implementation as the experimental vehicle of choice, and papers in the field became much more incremental as researchers focused on improvements to existing techniques, rather than the exploration of new ideas at scale.

The current limits on processor performance improvement provide a strong motivation to rethink the systems that we build and study. Fortunately, the development of better design tools and methodologies, coupled with the rapid progress of field-programmable hardware, may provide a way to change the way that architectural research and education are done.

In our laboratory, we have developed Beehive, a full-system implementation of a many-core processor, as well as its memory, peripherals and a supporting tool chain for software development. Beehive is simple enough that it can be rapidly understood and modified by individuals with little hardware experience. It enables full-system experimentation at the hardware-software boundary, using inexpensive development boards and tools provided by Xilinx.

I discuss our early experiences with Beehive, including experience with its use as the basis for a short course at MIT in January.
Kodiak
5:45–6:15
Board buses and travel to Kirkland 
6:30–9:00
Dinner cruise on Lake Washington 

Tuesday, July 13

Time | Event/Topic | Location
8:00–9:00
Breakfast
Hood
9:00–10:15
Opening Plenary Session
Fundamental Transformations in Research | slides

Richard DeMillo, Distinguished Professor of Computing and Management, Georgia Institute of Technology; Wolfgang Gentzsch, Professor, The Deisa Project and OGF; Tony Hey, Corporate Vice President, Microsoft Research; Ed Lazowska, Bill & Melinda Gates Chair in Computer Science & Engineering, University of Washington; Rick Rashid, Senior Vice President, Microsoft Research

Today, research is being transformed more quickly than some may realize. While the focus of the research is changing, it has also become obvious to those in the research community that the way of doing research itself is undergoing rapid and profound changes. The panel discussed this phenomenon by looking at changes in the type of research areas, as well as the new varieties of methodology that are emerging and the ways that the research community is beginning to transform itself to accommodate these new developments.
Kodiak
10:15–12:45
DemoFest
McKinley
11:45–12:45
Lunch
Hood
11:45–12:45
Lunchtime Sessions
 
Project Hawaii: Resources for Teaching Mobile + Cloud Computing

Victor Bahl, Microsoft Research
Stewart Tansley, Microsoft Research | slides
Brian Zill, Microsoft Research | slides

Project Hawaii is a new effort investigating the ability of the cloud to enhance the end-user experience on mobile devices. The Networking Research Group at Microsoft Research is making available a set of cloud-enabled mobile services to better understand the systems and networking infrastructure needed to enable the next generation of applications. We are engaging universities to enable students to build projects using our platform.

St. Helens
 
Microsoft Biology Foundation | slides
Simon Mercer, Microsoft Research

The Microsoft Biology Foundation (MBF) is a library of common bioinformatics and genomics functionality built on top of the .NET Framework. Functions include parsers and writers for common bioinformatics file formats, connectors to common web services, and algorithms for assembling and aligning DNA sequences. The project is released under the OSI-compliant MS-PL open source license and is available for download. The MBF project is guided by the user community through a technical advisory board drawn from academia and commerce, with responsibility to maintain code quality and steer future development to respond to the needs of the scientific community. MBF is a community-led and community-curated project, and encourages bug fixes, feature requests, and code contributions from all members of the commercial and academic life science community.
Rainier
12:45–2:15
Breakout Sessions
 
 
Natural User Interaction
The Future of Direct Input and Interaction—Selected Perspectives
Session Co-Chairs: Ken Hinckley, Microsoft Research; Andy Wilson, Microsoft Research | slides

Patrick Baudisch, Hasso Plattner Institute for Software Systems Engineering, Potsdam
Saul Greenberg, University of Calgary | slides
Andy van Dam, Brown University | slides

Direct interaction with displays is rapidly becoming one of the primary means by which people experience computing. Everyone is now familiar with multi-touch as the defining example of direct interaction, but despite rapture with the iPhone (and now iPad), multi-touch is not the whole story. Every modality, including touch, is best for something and worst for something else. Understanding direct interaction at anything beyond a superficial level requires that we answer the following question: In the holistic user experience, what is the logic of the division of labor between touch, pen, motion and postural sensing, proximal and spatial interactions beyond the display, and a diverse ecology of devices and form factors? This session delved into a number of efforts that hint at possible answers to this question, and help us understand why direct interaction is about much more than just smudging a screen with one’s stubby little fingers.

Cascade
 
Data–Driven Software Engineering
Large Scale Debugging
Session Chair: Judith Bishop

Galen Hunt, Microsoft Research | slides
Ben Liblit, University of Wisconsin | slides
Ed Nightingale, Microsoft Research | slides

The availability of the Internet has made possible not just the deployment of global-scale services for users, but also the deployment of global-scale data collection systems to aid debugging and understanding of software and hardware. In this session, we present three views on global-scale debugging of software and hardware in end-user systems. First, we present a project to turn large user communities into a massive distributed debugging army through sampled instrumentation and statistical modeling to help programmers find and fix problems that appear after deployment. Second, we describe Windows Error Reporting, a distributed system that automates the processing of error reports from an installed base of one billion Windows computers. Finally, we present the first large-scale analysis of hardware failure rates from a million consumer PCs.

Rainier
 
Operating in the Cloud
Cloud Data Center Architectures | slides
Session Chair: Dennis Gannon, Microsoft Research
Dileep Bhandarkar, Microsoft; Yousef Khalidi, Microsoft
Watt Matters in Mega Datacenters?—Dileep Bhandarkar, Microsoft

This presentation provided an overview of emerging datacenter and server challenges and shared some best practices that present opportunities for our industry to improve total cost of ownership and energy efficiency.

Energy efficiency has been a major focus of our industry, and members of the Green Grid have promoted PUE as a useful metric for data center power usage effectiveness. While we believe that PUE is a good metric and we track it in all our datacenters, there is more to achieving energy efficiency. We see rightsizing our servers and optimizing for work done per kW as a more important metric.

We look at the Total Cost of Ownership, which includes server purchase price, datacenter capital costs, application-specific power consumption, PUE, and management costs. Using this holistic approach that accounts for both server and datacenter designs allows us to achieve the best overall efficiency. Performance per Watt per Dollar is a key metric that we focus our attention on.
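To make the metrics above concrete, here is a small sketch with entirely hypothetical numbers (not Microsoft data). PUE is defined as total facility power divided by IT equipment power; the performance-per-watt-per-dollar figure simply normalizes useful work by both energy and cost.

    # Hedged sketch with made-up numbers; PUE = total facility power / IT equipment power.
    total_facility_kw = 1200.0   # hypothetical: power entering the data center
    it_equipment_kw = 1000.0     # hypothetical: power drawn by servers, storage, network

    pue = total_facility_kw / it_equipment_kw
    print(f"PUE = {pue:.2f}")    # 1.20; closer to 1.0 is better

    # "Performance per Watt per Dollar": normalize useful work by energy and by cost.
    requests_per_sec = 50000.0   # hypothetical throughput of a server pool
    server_power_kw = 20.0       # hypothetical IT power draw of that pool
    total_cost_usd = 250000.0    # hypothetical purchase plus amortized facility cost

    perf_per_watt_per_dollar = requests_per_sec / (server_power_kw * 1000) / total_cost_usd
    print(f"Performance per Watt per Dollar = {perf_per_watt_per_dollar:.2e}")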

Cloud Computing—Challenges and Opportunities—Yousef Khalidi, Microsoft

Cloud computing has the well-publicized advantages of reduced costs and increased agility. To reap these benefits, resources are typically allocated from a global large-scale computing infrastructure, highly shared among many applications and customers. This talk starts with an overview of Windows Azure, Microsoft’s cloud computing platform, as a concrete example of a large scale highly shared cloud. The talk then presents some of the challenges of cloud computing, including security, evolving applications for cloud computing, and the federation of multiple clouds. The talk explores the tension between, on one hand, the desire for reduced cost and increased agility, and on the other hand, securing applications and their data. The talk also discusses the need to evolve the application model to truly attain the advantages of cloud computing.
St. Helens
 
The Challenge of Large Data
Opportunities for Libraries

Paul Courant, University of Michigan | slides
Michael Keller, Stanford University | slides
James Mullins, Purdue University

Paul talked about a number of transformational effects of digital technologies on academic libraries, including the Google digitization project, the misalignment of intellectual property law with scholarship in the digital age, and the development of large-scale cooperative activities such as the HathiTrust. He also looked at policy: how libraries and faculties (and others, including legislators) can shape those transformations, for better or for worse.

New Modes of Discovery and Analysis; Opportunities for Libraries in Support of Teaching, Learning, and Research—Michael Keller, Stanford University

The amazing expansion of the World Wide Web and its capabilities, along with the predictable advances in information technology and network systems, now presents new possibilities for improved discovery of information and knowledge across the vast panoply of physical and digital carriers. Examples of these opportunities, involving not just Semantic Web and Linked Data technologies but also alternate means of searching and navigation, were presented. The widening pool of digital objects, thanks to innovation in scholarly publishing and more recently in digitization and trade publishing (such as the Google Book Search program, the more common issuance of e-books in parallel with physical books, and the startling rise of Web-based vanity publishing), provides new possibilities for research in methods as well as in hypotheses to be investigated. Examples of these new possibilities were also presented.

Opportunities for Libraries: Bringing Data to the Fore in Scholarly Communication and the Potential Implications for Promotion and Tenure | slides
James Mullins, Purdue University

In most scientific and engineering fields, datasets are the lifeblood of research. Modeling demands data to run complex algorithms to test a theory or advance a methodology. Yet, datasets have not been recognized as an essential element of the scholarly communication paradigm. The published research article or the presentation of a paper at a conference is the primary means used to assess the “impact” of a scientist or engineer in his/her field. However, the underlying data, whether gathered by the individual researcher or obtained from research colleagues, make the research possible and replicable.

Can data become recognized as a major component of the research process? How can data be discoverable, retrievable, accredited, and, lastly, citable to facilitate and assess impact? Can the producer of the data be seen as a significant and major contributor to the research in a specific field? Could the dataset creator be credited and recognized at promotion and tenure for furthering research and having significant impact within the field? This presentation explored these issues along with the role of libraries in data management and the assignment of digital object identifiers (DOIs) to datasets by the recently established international consortium, DataCite.
Baker
 
Special Topics
NodeXL – Social Network Analysis in Excel

Natasa Milic-Frayling, Microsoft Research | slides
Ben Shneiderman, University of Maryland | slides
Marc Smith, Connected Action

Businesses, entrepreneurs, individuals, and government agencies alike are looking to social network analysis (SNA) tools for insight into trends, connections, and fluctuations in social media. Microsoft’s NodeXL is a free, open-source SNA plug-in for use with Excel. It provides instant graphical representation of relationships of complex networked data. But it goes further than other SNA tools—NodeXL was developed by a multidisciplinary team of experts that bring together information studies, computer science, sociology, human-computer interaction, and over 20 years of visual analytic theory and information visualization into a simple tool anyone can use. This makes NodeXL of interest not only to end-users but also to researchers and students studying visual and network analytics and their application in the real world. NodeXL has the unique feature that it imports networks from Outlook email, Twitter, flickr, YouTube, WWW, and other sources, plus it offers a rich set of metrics, layouts, and clustering algorithms. This talk describes NodeXL and our efforts to start the Social Media Research Foundation.
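NodeXL itself runs inside Excel; as a rough illustration of the same kind of analysis (per-vertex metrics plus clustering) for readers without Excel, the sketch below uses the Python networkx library on a toy network. It is an analogy, not NodeXL's own API.

    # Hedged analogy: the style of SNA that NodeXL automates, shown with networkx.
    # Requires: pip install networkx
    import networkx as nx

    # A toy "who-replies-to-whom" network.
    edges = [("ann", "bob"), ("bob", "carol"), ("carol", "ann"),
             ("carol", "dave"), ("dave", "erin"), ("erin", "dave")]
    graph = nx.Graph(edges)

    # Per-vertex metrics of the kind SNA tools report.
    degree = dict(graph.degree())
    betweenness = nx.betweenness_centrality(graph)

    # A simple community (cluster) detection pass.
    communities = list(nx.algorithms.community.greedy_modularity_communities(graph))

    print("degree:", degree)
    print("betweenness:", betweenness)
    print("communities:", [sorted(c) for c in communities])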

Lassen
2:15–2:30
Break
2:30–4:00
Natural User Interaction
Natural Language Interaction Today—Selected Perspectives
Alex Acero, Microsoft Research | slides

Speech recognition has been proposed for many years as a natural way to interact with computer systems, but adoption has taken longer than originally expected. At first, dictation was thought to be the killer speech app, and researchers were waiting for Moore’s law to get the required computational power into low-cost PCs. Then it turned out that, due to social and cognitive reasons, many users do not want to dictate to their computers even if it’s inexpensive and technically feasible. Speech interfaces that users like require not only a sufficiently accurate speech recognition engine, but also many other less well known factors. These include a scenario where speech is the preferred modality, proper end-to-end design, error correction, robust dialog, ease of authoring for non-speech experts, data-driven grammars, a positive feedback loop with users, and robustness to noise. In this talk I describe some of the work we have done in MSR on building natural user interfaces using speech technology, and I illustrate it with a few scenarios in gaming (Xbox Kinect), speech in automotive environments (Hyundai/Kia UVO, Ford SYNC), and more.

Alexander I Rudnicky, Carnegie Mellon University | slides

Robots are on their way to becoming a ubiquitous part of human life as companions and workmates. Integration with human activities requires effective communication between humans and robots. Humans need to be able to explain their intentions and robots need to be able to share information about themselves and ask humans for guidance. Language-based interaction (in particular spoken language) offers significant advantages for efficient communication, particularly in groups. We have been focusing on three aspects of the problem: (a) managing multi-party dialogs (defining the mechanisms that regulate an agent’s participation in a conversation); (b) effective coordination and sharing of information between humans and robots (such as mechanisms for grounding descriptions of the world in order to support a common frame of reference); (c) instruction-based learning (to support dynamic definition of new behavior patterns through spoken as well as multi-modal descriptions provided by the human). This talk describes the TeamTalk system, the framework for exploring these issues.
Cascade
 
Data-Driven Software Engineering
Dynamic Languages and the Browsers of the Future
Session Chair: Wolfram Schulte, Microsoft Research

Judith Bishop, Microsoft Research | slides
Steve Lucco, Microsoft
Ben Zorn, Microsoft Research | slides

The Common Language Runtime of the .NET Framework was recently extended to enable scripting languages – JavaScript, Lisp, Python, Ruby – to communicate with other infrastructures and services, including Silverlight and COM. This session introduces the DLR and goes on to consider how dynamic languages have enabled today’s Web 2.0 world. Firstly, we explore how real web applications use JavaScript and consider whether benchmarks, such as SunSpider, are representative of actual applications. This comparison is useful in understanding the best ways to design JavaScript engines to further empower increasingly important web applications. Surprisingly, until recently, the behavior characteristics of JavaScript programs have not been examined. Then we look at Microsoft’s upcoming offering, IE9, which will feature a new JavaScript engine called Chakra. At the latest platform preview release of IE9, Chakra was substantially faster on the SunSpider benchmark than Firefox and was nipping at the heels of Safari 5. We’ll explore some of the technical decisions that make Chakra especially effective at loading Web pages. We’ll also take a look at some details of how the Chakra engine speeds up web applications such as Office Web Word. Finally, we’ll talk about the architecture of the Chakra memory recycler, which supports fast interactive response and simple interoperation with native code.

Rainier
 
Operating in the Cloud
Programming in the Cloud | slides
Session Chair: Dennis Gannon, Microsoft Research
Jim Larus, Microsoft Research

Cloud computing provides a platform for new software applications that run across a large collection of physically separate computers and free computation from the computer in front of a user. Distributed computing is not new, but the commodification of its hardware platform—along with ubiquitous networking; powerful mobile devices; and inexpensive, embeddable, networkable computers—heralds a revolution comparable to the PC.

Software development for the cloud offers many new (and some old) challenges that are central to research in programming models, languages, and tools. The language and tools community should embrace this new world as a fertile source of new challenges and opportunities to advance the state of the art.

Jimmy Lin, University of Maryland-College Park

Over the past couple of decades, the field of natural language processing (and more broadly, human language technology) has seen the emergence and later dominance of empirical techniques and data-driven research. An impediment to research progress today is the need for scalable algorithms to cope with the vast quantities of available data.

The only practical solution to large-data challenges today is to distribute computations across multiple machines. Cluster computing, however, is fraught with challenges ranging from scheduling to synchronization. Over the past few years, MapReduce has emerged as an attractive alternative to traditional programming models: it provides a simple functional abstraction that hides many system-level issues, allowing the researcher to focus on solving the problem.

In this talk, I give an overview of “cloud computing” projects at the University of Maryland that grapple with the issue of scalability in text processing applications.
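As a concrete reminder of the functional abstraction mentioned above, here is a minimal, single-machine word-count sketch written in the MapReduce style; a real framework would distribute the map and reduce phases across a cluster and handle the shuffle, scheduling, and fault tolerance.

    # Minimal in-memory MapReduce-style word count (illustrative only).
    from collections import defaultdict

    def map_phase(document):
        # Emit (key, value) pairs: one (word, 1) per token.
        for word in document.split():
            yield word.lower(), 1

    def reduce_phase(word, counts):
        # Combine all values emitted for the same key.
        return word, sum(counts)

    documents = ["the quick brown fox", "the lazy dog", "the fox"]

    # Shuffle: group intermediate values by key (done by the framework in practice).
    grouped = defaultdict(list)
    for doc in documents:
        for word, count in map_phase(doc):
            grouped[word].append(count)

    results = dict(reduce_phase(w, c) for w, c in grouped.items())
    print(results)  # {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}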
St. Helens
 
The Challenge of Large Data
Visualizing a Universe of Data | slides
Walter Alvarez and Roland Saekow, University of California, Berkeley; Curtis Wong, Microsoft Research

Our knowledge of human history comprises a truly vast data set, much of it in the form of chronological narratives written by humanist scholars and difficult to deal with in quantitative ways. The last 20 years has seen the emergence of a new discipline called Big History, invented by the Australian historian, David Christian, which aims to unify all knowledge of the past into a single field of study. Big History invites humanistic scholars and historical scientists from fields like geology, paleontology, evolutionary biology, astronomy, and cosmology to work together in developing the broadest possible view of the past. Incorporating everything we know about the past into Big History greatly increases the amount of data to be dealt with.

Big History is proving to be an excellent framework for designing undergraduate synthesis courses that attract outstanding students. A serious problem in teaching such courses is conveying the vast stretches of time from the Big Bang, 13.7 billion years ago to the present, and clarifying the wildly different time scales of cosmic history, Earth and life history, human prehistory, and human history. We present “ChronoZoom,” a computer-graphical approach to dealing with this problem of visualizing and understanding time scales, and presenting vast quantities of historical information in a useful way. ChronoZoom is a collaborative effort of the Department of Earth and Planetary Science at UC Berkeley, Microsoft Research, and originally Microsoft Live Labs.

Our first conception of ChronoZoom was that it should dramatically convey the scales of history, and the first version does in fact do that. To display the scales of history from a single day to the age of the Universe requires the ability to zoom smoothly by a factor of ~10^13, and doing this with raster graphics was a remarkable achievement of the team at Live Labs. The immense zoom range also allows us to embed virtually limitless amounts of text and graphical information.
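The quoted zoom range follows from a one-line calculation: the age of the universe expressed in days is

    \frac{13.7 \times 10^{9}\ \text{yr} \times 365.25\ \text{days/yr}}{1\ \text{day}} \approx 5 \times 10^{12},

which is on the order of 10^13 relative to a single day.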

We are now in the phase of designing the next iteration of ChronoZoom in collaboration with Microsoft Research. One goal will be to have ChronoZoom be useful to students beginning or deepening their study of history. We therefore show a very preliminary version of a ChronoZoom presentation of the human history of Italy designed for students, featuring (1) a hierarchical periodization of Italian history, (2) embedded graphics, and (3) an example of an embedded technical article. This kind of presentation should make it possible for students to browse history, rather than digging it out, bit by bit.

At a different academic level, ChronoZoom should allow scholars and scientists to bring together graphically a wide range of data sets from many different disciplines, to search for connections and causal relationships. As an example of this kind of approach, from geology and paleontology, we are inspired by TimeScale Creator.

ChronoZoom, by letting us move effortlessly through this enormous wilderness of time, getting used to the differences in scale, should help to break down the time-scale barriers to communication between scholars and scientists, and to make the past at all scales comprehensible as never before.
Baker
 
Special Topics
What We Know and What You Can Do: Learning How to Turn Gender Research into Diversity Action | slides
Joanne Cohoon, University of Virginia; Carla Ellis, Duke University; Lucy Sanders, National Center for Women and Information Technology (NCWIT); Telle Whitney, Anita Borg Institute for Women and Technology

This panel brings together the three major organizations that focus on research on gender diversity across the pipeline. We present the current research and discuss results that provide actions for the academic and industrial communities to implement at their institutions. It is the expectation that attendees leave the session with activities and ideas that apply to their environment.
Lassen
4:00–5:30
Closing Plenary Sessions
Kodiak
4:00–5:15
The Making of Avatar: Magnificent Graphics, Multitudinous Files, Massive Storage
Richard Baneham, Animation Supervisor, Avatar; Yuri Bartoli, Virtual Art Director, Avatar; Tim Bicio, Chief Technical Officer, Avatar; Nadine Kano, Senior Director, Solution Management, Microsoft; Matt Madden, Motion Capture Supervisor, Avatar

James Cameron’s Avatar reigns as the most successful motion picture ever released, but it is also the most computing intensive movie made to date. The number of dollars earned at the box office worldwide (more than 2.5 billion) is dwarfed by the number of bytes of data generated during production: more than 1 petabyte. Sixty percent of Avatar’s 2 hours and 40 minutes were computer generated. A single frame, at 24 frames per second, could be up to 3 GB in size—and that’s just for one eye. This session features special guests from the Avatar production team who talked about the groundbreaking techniques used to create the immersive world of Pandora and its photo-real creatures and characters, and how they managed the volumes of data necessary to bring them to life.
 
5:15–5:30
Conference Conclusion and Call to Action

Tony Hey, Corporate Vice President, Microsoft Research