Powering Microsoft’s operations transformation with Microsoft Azure


Microsoft’s operations transformation is enabling it to move from rigid, process-centered operations to an agile, customer-focused organization that runs in Microsoft Azure.

In any digital transformation, technology and culture changes go together, and our ongoing operations transformation here at Microsoft is no different.

As a company, we have evolved from a rigid, manual, process-centered operations model with a disconnected customer experience to a Microsoft Azure-based model. The new model is focused on the customer experience and built on modern engineering principles such as scalability, agility, and self-service.

Our Microsoft Digital Employee Experience (MDEE) team is leading the company on a bold, three-step strategy to build best-in-class platforms and productivity services for the mobile-first, cloud-first world. This strategy harmonizes the interests of users, developers, and IT.

To effectively deliver on the strategy, we needed to rethink our infrastructure and operations platforms, tools, engineering methods, and business processes to create a collaborative organization that can deliver cohesive and scalable solutions.


Our operations history

Like most IT organizations, we relied on traditional hosting services that were mostly physical, on-premises environments made up of servers, storage, and network devices. Most of these devices were owned and maintained for specific business functions. The technologies were very diverse, and designing, deploying, and running them required specialized skills.

Traditional IT technologies, processes, and teams

Server technologies included discrete servers and densely built computing racks with blade servers. Storage technologies used direct-attached storage (DAS) and storage area networks (SANs). Networks used a variety of technologies, from simple switches to more advanced load balancers, encryption, and firewall devices. Platform technologies ranged from Windows, SQL Server, BizTalk, and SharePoint farms to third-party solutions such as SAP and other information security–related tool sets. Server virtualization evolved from Hyper-V to System Center Virtual Machine Manager and System Center Orchestrator.

To provide a stable infrastructure, we needed a structured framework, such as Information Technology Infrastructure Library/Microsoft Operations Framework (ITIL/MOF). The policies, processes, and procedures in the framework helped us enforce controls and prevent failures. Engineering groups that used hosting services followed a similar adoption process for their application and service needs, based on ITIL/MOF and combined with a software development life cycle (SDLC)/waterfall framework.

Teams formed naturally around people with similar core strengths in the ITIL areas of service strategy, service design, service operations, and service transition, as shown in the graphic below.

Traditional IT teams formed around the core of ITIL service areas.

Traditional hosted environments relied on external sources of space, power, connectivity, hardware, and software. And the technologies behind these sources evolved slowly. A common framework of policies and procedures helped bring teams together to refine and unify procedures. Tools were developed to formalize, track, audit, and measure procedures. The culture of the organization helped build a process-oriented, structured way of getting things done.

Challenges of traditional IT

Although ITIL/MOF helped streamline some processes, the complexities, constraints, and dependencies of traditional hosting prevented agile engineering. For example, it usually took six to nine months to build a new development environment for an application or service team. This time included planning, coordinating resources, tracking issues, and mitigating risk. Although the structure added clarity in delivery, it removed business agility.

Long-term managed services offered opportunities to build cost efficiency. However, because of the way processes were implemented, functional roles were often duplicated, which increased both time and cost.

When our engineering teams used SDLC waterfall methods and operations teams used ITIL/MOF, adhering to process took priority over delivering iterative, agile solutions to meet targeted business needs. These processes slowed business throughput significantly. Solutions were developed and deployed over years instead of months.

Phase 1: Improving operational efficiency

Our MDEE team plays a pivotal role in the company’s new strategy, as most business processes in the company depend on us. To help Microsoft transform, we identified key focus areas to improve in the first phase of our transformation: improving business agility, reducing costs, learning new skills, and inventing new ways to work.

The graphic below shows the steps we took to get to Microsoft Azure.

We moved toward our IT mission by transforming technology and customer service.

Infrastructure platform. An agile business demands agile infrastructure: fewer physical servers, and a move to (and innovation in) Microsoft Azure.

Strategy. Migrating to the cloud highlighted the need to offer build, change, and policy management processes as self-service capabilities. Our approach is to use software to automate the provisioning, management, and coordination of services, so our Microsoft business partners can develop and deploy services faster, with less work and at lower cost.

Structure. We had to rethink the way that our teams and roles delivered this strategy by integrating different teams that did similar tasks. This allowed us to effectively design and deliver end-to-end service offerings at lower cost. Our organization was restructured to form teams that optimize service and infrastructure. These teams learn new skills, work harmoniously with engineering, and reduce waste.

Culture. We embraced a growth mindset, learned new skills, built new capabilities, and found new ways to work.

Mission. It became our mission to define, deliver, and transform how we work by helping engineers build solutions tailored to the hybrid cloud world.

Realigning our organization

Services optimization. This team helps our business partners to provision and manage their own IT services. We have improved operational agility and reliability, which has resulted in specific benefits:

  • Less manual effort per release/update
  • Shorter lead time
  • More frequent builds and deployment
  • Increased service quality
  • Reduced security exposure

We elevated our teams by training people and hiring others with the engineering skills we need. Our goal is to gradually transition people from operational skills to service engineering skills.

A deeper analysis of our operational model also revealed redundant processes in service design, service transition, and service operations. After careful consideration, we reduced process overhead by eliminating or automating some processes. This restructuring presents a business opportunity to consolidate vendor teams. Many of our sustained workloads will decrease year over year, as on-premises infrastructure shrinks.

Infrastructure optimization. This team eliminates duplicate infrastructure, reduces our footprint, and modernizes infrastructure for our business partners, which lowers hosting costs. Key outcomes of this work include:

  • Consolidated datacenters
  • Fewer physical and traditional virtual machines
  • Smaller storage consumption
  • Increased cloud adoption

When teams started working together to optimize infrastructure, they found duplicate projects with similar goals. After we cut redundant projects, people were freed up to learn project management skills and to engage with our business partners.

This team took a program-based delivery approach with start and end dates. After provisioning was automated, we worked with our business partners so they could use new self-service tools to take ownership of their infrastructure. The new self-service features helped our business partners identify and decommission unused servers. Self-service planning eliminates manual handoffs and enables our business partners to manage risks, issues, and blockers. Our business partners also found that they no longer needed vendors to manage handoffs.

Reinventing our culture

To reinvent ourselves, we needed to change. We stopped managing processes and began trusting our business partners and empowering engineers. We defined our new mindset and goals to:

  • Focus on the customer by designing and building new services from their perspective.
  • Challenge and question the status quo, and rethink old processes and behaviors.
  • Experiment and learn so we can produce innovative cloud technologies using agile methods.
  • Collaborate beyond our organizational boundaries to identify and deliver the right solution for our business partners.
  • Deliver faster and fix issues faster.

The business outcome

Combined, all the changes we made produced tangible results. We improved our agility and enabled our Microsoft business partners to deploy services faster with less work at a reduced cost. We were able to:

  • Reduce manual work by about 60 percent.
  • Migrate 10 percent of the IT ecosystem to the public cloud (Azure IaaS).
  • Decommission on-premises datacenters across the pre-production ecosystem.
  • Optimize about 42 percent of our global workforce.
  • Save about $6.5 million in organization operational costs.

Lessons learned in Phase 1

Through this process of technological and cultural evolution, we learned that:

  • Next-generation, modern applications will come from innovating in Microsoft Azure. A private cloud cannot provide the innovations and scale that Azure can.
  • Helping our Microsoft business partners migrate to Microsoft Azure involves a multitude of technical requirements.
  • Tools that support the private cloud don’t scale for Microsoft Azure, which significantly impacts agility.
  • Processes established for a private cloud cause a fragmented and disconnected experience in Microsoft Azure.
  • Gaps in our ability to connect Microsoft Azure inventory, utilization, and cost led to a drastic increase in Azure operational costs.

Phase 2: Delivering value through innovation

To effectively harness the benefits of Microsoft Azure, we migrated 90 percent of our IT infrastructure to Azure and then balanced the business need for innovation with efficient operation. We decided to use native cloud solutions, phase out customized IT tool sets, and decentralize and simplify operations processes as we adopt the DevOps model.

Changing roles

DevOps is a work model that integrates software development and IT operations. As we move to the cloud, the need for IT infrastructure support is drastically reduced. Going forward, we offer the most value to our business partners by adopting Infrastructure as Code, which enables friction-free interaction with engineering teams and supports continuous deployment. We redefined operations roles and retrained people from traditional IT roles to be business relationship managers, engineering program managers, service engineers, and software engineers:

  • Business relationship managers engage with our Microsoft business partners to understand their needs and to tailor Microsoft Azure capabilities for their business needs. Business relationship managers listen, prioritize, and manage expectations across business, infrastructure, and Azure teams.
  • Engineering program managers design and deliver solutions in partnership with software engineers, service engineers, and business relationship managers.
  • Software and service engineers focus on developing reliable, scalable, and high-quality automated services, which eliminates much manual work. As we retrained people from operational to engineering and relational skills, we saw a gradual uptick in engagement with our business partners.

Simplifying operational processes

In the past, the processes Microsoft used to manage corporate inventory, procurement, software development, security management, financial management, and other functions were disconnected from each other and confined within organizational boundaries. Existing processes and tools also resulted in long wait times for simple IT tasks.

A simple application infrastructure took at least 40 days to provision, and complex applications with multiple dependencies could take over a year. The traditional IT mindset, processes, and obsolete tools had a negative impact on software engineering productivity. IT operations processes were realigned as shown in the graphic below.

IT operations support for the different stages of the development and deployment life cycle was realigned for Microsoft Azure.

Microsoft Azure radically simplified our IT operations. Simple projects can be provisioned in Azure within one day, and complex projects can be provisioned in six days. We increased our speed 40-fold by eliminating, streamlining, and connecting processes, and by aligning processes for Azure.
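
The 40-fold figure follows from the provisioning times quoted earlier: at least 40 days for a simple application before, and one day on Azure after. Below is a minimal worked sketch of that arithmetic. The `speedup` helper is hypothetical. Treating "over a year" as 365 days for complex projects is an assumption, so the second result is a lower bound rather than a measured figure.

```python
def speedup(old_days: float, new_days: float) -> float:
    """Return how many times faster the new provisioning time is than the old one."""
    if new_days <= 0:
        raise ValueError("new_days must be positive")
    return old_days / new_days

# Simple applications: at least 40 days before, within one day on Azure.
simple = speedup(40, 1)

# Complex applications: "over a year" before (assumed 365 days), six days on Azure.
complex_lower_bound = speedup(365, 6)

print(f"simple: {simple:.0f}x, complex: more than {complex_lower_bound:.1f}x")
# prints "simple: 40x, complex: more than 60.8x"
```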

Adopting native cloud solutions

We are retiring many customized IT tools and focusing on native cloud solutions that use Infrastructure as Code within the Microsoft Azure Resource Manager (ARM) fabric. By using ARM templates, APIs, and PowerShell, and by integrating developer tools, we can rapidly provision a hosting platform.
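
To make the Infrastructure as Code idea concrete, here is a minimal sketch that builds an ARM template as a Python dictionary and serializes it to JSON. The storage-account resource, its `apiVersion`, the SKU, and the `storageAccountName` parameter are illustrative assumptions, not the templates our teams actually use. The schema URL and the top-level sections (`$schema`, `contentVersion`, `parameters`, `resources`) follow the public ARM template format.

```python
import json

# Public ARM deployment-template schema; the resource below is illustrative only.
SCHEMA = "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#"

def storage_account_template() -> dict:
    """Return a minimal ARM template that deploys a single storage account."""
    return {
        "$schema": SCHEMA,
        "contentVersion": "1.0.0.0",
        "parameters": {
            # Storage account names must be 3 to 24 characters long.
            "storageAccountName": {"type": "string", "minLength": 3, "maxLength": 24},
        },
        "resources": [
            {
                "type": "Microsoft.Storage/storageAccounts",
                "apiVersion": "2022-09-01",
                "name": "[parameters('storageAccountName')]",
                # Deploy into the same region as the target resource group.
                "location": "[resourceGroup().location]",
                "sku": {"name": "Standard_LRS"},
                "kind": "StorageV2",
            }
        ],
    }

template = storage_account_template()
print(json.dumps(template, indent=2))
```

If the output is saved to a file such as `azuredeploy.json` (a hypothetical name), it could be deployed with the Azure PowerShell cmdlet `New-AzResourceGroupDeployment -ResourceGroupName <group> -TemplateFile azuredeploy.json`, or with the Azure CLI: `az deployment group create --resource-group <group> --template-file azuredeploy.json --parameters storageAccountName=<name>`. Generating templates from code keeps each environment reproducible from source control.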

We also adopted software-defined networking (SDN) by developing APIs to dynamically procure Microsoft Azure ExpressRoute, load balancing, and traffic management capabilities, which connect, secure, and route traffic and improve application responsiveness. We primarily use Microsoft Azure Site Recovery (ASR) for lift-and-shift migration of virtual machines.
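
This story does not describe the SDN APIs themselves. As a simplified illustration of one routing behavior they can procure, the sketch below models the priority routing method of Azure Traffic Manager. That method sends traffic to the healthy endpoint with the lowest priority number. The `Endpoint` class, the endpoint names, and the health values are hypothetical, and the code calls no Azure API.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass(frozen=True)
class Endpoint:
    name: str
    priority: int   # lower number = higher priority, as in Traffic Manager priority routing
    healthy: bool   # result of the most recent health probe

def select_endpoint(endpoints: Sequence[Endpoint]) -> Optional[Endpoint]:
    """Return the healthy endpoint with the lowest priority number.

    If no endpoint is healthy, this sketch returns None; the real service's
    behavior in that case is not modeled here.
    """
    healthy = [e for e in endpoints if e.healthy]
    if not healthy:
        return None
    return min(healthy, key=lambda e: e.priority)

endpoints = [
    Endpoint("primary-region", priority=1, healthy=False),
    Endpoint("secondary-region", priority=2, healthy=True),
    Endpoint("tertiary-region", priority=3, healthy=True),
]
print(select_endpoint(endpoints).name)  # secondary-region
```

Because the primary endpoint failed its probe, traffic fails over to the next-highest priority. When the primary recovers, the same rule sends traffic back to it without any manual step.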

Microsoft Azure Operations Management Suite (OMS) is a Software as a Service (SaaS)-based, cross-platform solution with capabilities that span analytics, automation, configuration, security, backup, and disaster recovery. OMS is designed for speed, flexibility, and simplicity, and it effectively manages Windows Server and Linux servers in a hybrid cloud environment.

The graphic below shows how native cloud solutions allow many traditional IT processes to become self-service.

Traditional IT tasks and processes are now self-service native cloud solutions.

ICM is Microsoft’s incident management system. Because ICM offers high-availability cloud support and cloud-based access, we now use it to support Microsoft Azure and many other services across Microsoft.

Cloud Cruiser, a third-party SaaS application, gives us valuable financial information and reports about our Microsoft Azure usage and spending in near-real time.

Using Cloud Cruiser, we can examine and aggregate financial data across multiple global Microsoft Azure subscriptions. This is crucial because our Azure environment contains many subscriptions, and Cloud Cruiser gives us the immediate visibility required to manage and control costs.
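
This story does not describe how Cloud Cruiser works internally. The sketch below only illustrates the general idea of rolling usage records up across subscriptions. The record layout, subscription names, and costs are hypothetical sample data, not a Cloud Cruiser or Azure billing export format.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (subscription, service, cost in USD); hypothetical sample rows, not real billing data.
UsageRecord = Tuple[str, str, float]

def cost_by_subscription(records: Iterable[UsageRecord]) -> Dict[str, float]:
    """Sum cost per subscription across all services."""
    totals: Dict[str, float] = defaultdict(float)
    for subscription, _service, cost in records:
        totals[subscription] += cost
    return dict(totals)

records = [
    ("sub-finance", "Virtual Machines", 120.0),
    ("sub-finance", "Storage", 30.0),
    ("sub-hr", "SQL Database", 75.5),
]
totals = cost_by_subscription(records)
grand_total = sum(totals.values())

# List the largest spenders first, with each subscription's share of the total.
for subscription, total in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{subscription}: ${total:,.2f} ({total / grand_total:.0%})")
print(f"all subscriptions: ${grand_total:,.2f}")
```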

Microsoft Azure Advisor is a personalized cloud consultant that helps us follow best practices to optimize our Microsoft Azure deployments. It analyzes our resource configuration and usage telemetry. It then recommends solutions to help improve the performance, security, and high availability of our resources while looking for opportunities to reduce our overall Azure costs.

Optimizing Microsoft Azure

With much of our cloud infrastructure in place, we recognized the need to optimize our Microsoft Azure resources. We created Microsoft Azure Resource Optimization (ARO), a combination of tools, processes, and education that helps Microsoft teams examine both the total cost of their cloud resources and the number of underutilized assets. To identify cost-savings opportunities, ARO evaluates several types of underutilized resources, such as IaaS virtual machines, Azure SQL databases, PaaS web and worker roles, Azure storage, virtual networks, and IP addresses.

Some examples of ARO recommendations include adjusting SKU sizes, deleting unused resources, or turning off resources during downtime. The overall ARO goal is to increase awareness of consumption, optimization, and cost of Microsoft Azure resources across Microsoft, to encourage engineers, managers, and leadership to adopt cost-effective behaviors. We deliver business intelligence to help people make key decisions about Azure usage, which will promote a culture of cloud optimization.
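
This story does not publish ARO's actual rules. The sketch below only shows how usage signals might map to the three recommendation types named above: resizing, deleting, and turning resources off during downtime. The `ResourceUsage` fields, the thresholds, and the sample fleet are all hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical thresholds, chosen only for illustration.
UNUSED_AFTER_DAYS = 30
IDLE_CPU_PERCENT = 1.0
LOW_CPU_PERCENT = 20.0

@dataclass
class ResourceUsage:
    name: str
    avg_cpu_percent: float    # average CPU over the observation window
    days_since_last_use: int  # days since the resource last served any work
    idle_off_hours: bool      # True if it runs nights and weekends with no demand

def recommend(r: ResourceUsage) -> List[str]:
    """Return ARO-style recommendations for one resource (empty list = no change)."""
    # Treat long-unused or essentially idle resources as deletion candidates.
    if r.days_since_last_use >= UNUSED_AFTER_DAYS or r.avg_cpu_percent < IDLE_CPU_PERCENT:
        return ["delete unused resource"]
    recommendations = []
    if r.avg_cpu_percent < LOW_CPU_PERCENT:
        recommendations.append("adjust SKU size down")
    if r.idle_off_hours:
        recommendations.append("turn off during downtime")
    return recommendations

fleet = [
    ResourceUsage("build-vm-01", avg_cpu_percent=0.4, days_since_last_use=45, idle_off_hours=True),
    ResourceUsage("test-vm-02", avg_cpu_percent=12.0, days_since_last_use=0, idle_off_hours=True),
    ResourceUsage("api-vm-03", avg_cpu_percent=65.0, days_since_last_use=0, idle_off_hours=False),
]
for resource in fleet:
    print(resource.name, recommend(resource) or ["no change"])
```

Deletion is checked first, so the sketch never tells a resource to both shrink and disappear.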

Modern teams

To implement our cloud-first transformation effectively and quickly, we formed engagement and program management teams to connect with our internal business partners, identify their needs, prioritize features, and deliver them with focused discipline. Individuals who can code Microsoft Azure infrastructure solutions as APIs, PowerShell scripts, and templates were united as software engineering teams. And we grouped all the manageability services under service engineering teams to provide reliable, available, and supportable services.

All other IT operations support teams were decentralized and integrated into application teams using the DevOps model to improve issue resolution time. Employees learned new skills, and we hired new people with the skills we needed. Assessing, refining, and hiring the right talent is part of organizational hygiene.

Business outcomes

Accelerating our transformation to Microsoft Azure by changing roles, investing in new skills, and simplifying operations processes had four important benefits.

More productive workforce

  • Our IT ecosystem is 98 percent in Microsoft Azure (mostly IaaS).
  • We shifted to a self-service culture.
  • We practice DevOps.

More agile business

  • Provisioning speed was increased 40-fold by simplifying operations processes and using native cloud solutions.

Reduced costs

  • Customized IT tools were reduced by 60 percent.
  • CPU utilization increased by 400 percent.
  • Annual cloud spending was reduced by 38 percent.
  • On-premises IT datacenters and labs have been decommissioned across our production ecosystem.

Improved business partner experience

  • We have improved the user experience and engagement with our business partners. We have shared practices and lessons learned across our company and industry.

Lessons learned in Phase 2

To make our digital transformation to Azure a success, we had to:

  • Redesign strategic assets as Platform as a Service (PaaS) solutions.
  • Integrate engineering and manageability platforms.
  • Use data as a strategic asset.
  • Use predictive analytics and machine learning to prevent and remediate failures.

Phase 3: Embracing the digital ecosystem

Our ability to take advantage of emerging technologies and to embrace new business strategies will be a deciding factor in the modern era. Going forward, our MDEE teams are organized around end-to-end ownership of services that delight our business partners and that focus on innovation, co-creation, and collaboration.

Our first phase of transformation focused on migrating infrastructure and automating processes to drive efficiency and lower operations costs. The second phase was driven by adopting the Microsoft Azure platform, simplifying operations processes, and changing operations roles to invest in engineering, customer service, and native cloud solutions.

The next stage includes developing intelligent systems on Microsoft Azure to deliver reliable, scalable services and to connect operations processes across Microsoft. Bots will support basic user queries, while service reliability engineers strive to predict and remediate failures using predictive analytics and machine learning. Our focus is on operational resilience and cost avoidance. Several industry trends drive the continued evolution of our digital IT ecosystem:

  • DevOps culture accelerates engineering team deliverables and decisions using a boundary-free flow of information and frictionless processes.
  • Native cloud solutions offer an enterprise-level manageability platform that supports decentralized services and enables flexible, predictable, reliable response to changes with speed.
  • Data has become a durable asset. With the proliferation of cloud infrastructure, mobile applications, and IoT devices, there is a growing need to store massive amounts of data and analyze it in near-real time to predict patterns, build models, and drive intelligent actions among end-user communities.
  • Open-source standards increasingly provide a platform for innovation and for moving to the cloud, and they enable community governance at scale that balances the need for security with agility.
  • As a services broker, MDEE is shifting our engineering focus from system design and build to the assembly, configuration, and integration of specialized third-party software components, which accelerates time to value and reduces technical debt.
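
The service reliability work described above relies on predictive analytics and machine learning, but this story does not say which models are used. As a deliberately simple stand-in, and not MDEE's technique, the sketch below flags telemetry samples whose z-score against a trailing window exceeds a threshold. This is a common baseline for surfacing early warning signs. The window size, the threshold, and the latency values are assumptions.

```python
from statistics import mean, pstdev
from typing import List, Sequence

def flag_anomalies(samples: Sequence[float], window: int = 5, threshold: float = 3.0) -> List[int]:
    """Return indices whose value deviates from the trailing-window mean
    by more than `threshold` population standard deviations."""
    flagged = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        mu = mean(history)
        sigma = pstdev(history)
        if sigma == 0:
            # A perfectly flat history has no spread; flag any change from it.
            if samples[i] != mu:
                flagged.append(i)
            continue
        if abs(samples[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged

latency_ms = [102, 98, 101, 99, 100, 103, 97, 250, 101, 99]
print(flag_anomalies(latency_ms))  # [7]
```

One large spike also inflates the standard deviation of the next few windows, which can mask follow-on anomalies. Production systems typically use more robust statistics or trained models for this reason.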

The graphic below shows how our digital transformation and move to the cloud will use automation, enhanced resiliency, predictive analytics, and bots to integrate business partner feedback and improve service to our business partners.

A system of applications and platforms, combined with predictive analytics, is at the heart of our digital transformation.

We recognized that our business partners need hybrid cloud scale and economics, and we meet that need by offering enterprise-level engineering and management platforms. We have embraced the industry trends of mobility, IoT, machine learning, AI, open source, and cross-platform standards.

Together, Microsoft Azure PaaS, Visual Studio Online, and Application Insights will enable engineers to focus on features and usability, while the ARM fabric and OMS will provide a single-pane-of-glass view to provision, manage, and decommission infrastructure resources securely. Only by optimizing the engineering and manageability processes, both independently and in concert with each other, can we achieve Microsoft’s digital transformation goals.

Key takeaways

Our MDEE team plays an influential role in the digital transformation of the company. Our evolution and move to Microsoft Azure is anchored around the idea of building connected intelligence systems to transform how we engage with business partners, empower engineers, optimize operations, and reinvent products. Delivering excellence will drive the cultural change to modern practices.

With connected systems, simplified self-service provisioning, and a focus on our business partners, we can scale our infrastructure service offerings across the company and drive innovation, business agility, and productivity. In the process, we will also reduce costs and improve our operations resilience.
