Microsoft transforms cash flow data into an intelligent analytics platform


By connecting cash flow data to a broader financial data ecosystem, Microsoft Digital has created a cash flow analytics platform capable of far greater insight than cash flow statements historically yield. Learn how the platform, powered by Microsoft Azure, elevates the role of finance professionals by providing actionable insights into cash flow daily, and how machine learning will propel cash flow analytics into the future.

Through a platform built primarily on Microsoft Azure, Microsoft has transformed the role of cash flow analytics within the company. It has also transformed the role of Microsoft’s finance professionals.

Building on the Microsoft Enterprise Data Strategy, which centralizes previously siloed data and makes it accessible via data lakes, the new cash flow analytics platform gives finance professionals the ability to dynamically interact with cash flow data. The platform infuses cash flow analytics with deep intelligence, creating a tool from which Microsoft can extract unprecedented levels of granular insight into cash flow. Finance professionals can then transform those insights into actionable takeaways that directly impact business outcomes.

In this way, the cash flow analytics platform not only reports on historical data, but also plays a crucial role in optimizing and predicting the financial health of product segments, departments, and the entire Microsoft organization.

This paper examines the following topics:

  • The motivation behind the move to the modern cash flow analytics platform
  • The process by which the Foundational Finance Services team—part of the Microsoft Digital organization—built the platform
  • The technologies the Foundational Finance Services team used to achieve their vision
  • The wider implications of the project

Elevating the role of finance professionals

The intelligence derived from the cash flow analytics platform is a result of both enriching existing data and combining that data in novel ways. The end results add a dimension of understanding beyond the scope of the humble cash flow statement. Cash flow statements typically answer questions like what a department purchased. The cash flow analytics platform can also answer questions like who paid an invoice, when the payment occurred, and which payment type was used.

Enriching existing data provides a more holistic understanding of payments. For example, the cash flow analytics platform supplements typical cash flow data with the geographic region where a payment originated, the team responsible for the payment, and the segments—such as cloud or consumer products—that the payment belongs to. In addition to supplementing cash flow data with this type of product, geography, organization, and function-level data, the new analytics platform enables finance professionals to deepen their understanding by overlaying procurement data onto cash flow data. Cash flow data speaks to what was purchased; procurement data speaks to why it was purchased and how it was procured.

Together, these additional dimensions of visibility into cash flow give finance professionals deep, immediate insight into their domain. When reporting to leadership, they can now answer questions that were previously unanswerable, or that would have taken weeks of research to answer. As a result, finance conversations and meetings are markedly more productive and engaging.

For example, if a finance professional wants to zero in on capital expenditures, they can use dashboards to drill down to capital expenditures for the entire company in a few clicks. They can see cash inflow and outflow by business, making it easy to spot trends that were previously buried beneath static data. That level of insight gives finance professionals the means to make precise recommendations that deliver a particular business outcome.

In building the platform, the Foundational Finance Services team accomplished three things:

  • Automating cash flow analytics, which gives finance professionals a daily snapshot of financial health, rather than a static quarterly cash flow statement.
  • Connecting the platform to new, trusted data sources to add additional dimensions of insight to cash flow analytics, and making that data available for finance professionals to interact with in Microsoft Power BI.
  • Integrating machine learning capabilities to capitalize on the massive amounts of data being ingested, which will be used to generate more accurate cash flow forecasts.

Because this project depends on vast troves of accurate, real-time data both to generate daily financial health snapshots and to generate forecasts, it uses the direct method of cash flow analytics. The direct method reports cash inflows and outflows directly, making it ideal for real-time analytics and machine learning purposes. The team did not use the indirect method, which starts from net income and requires several conversions to arrive at a cash amount.
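To illustrate the distinction, here is a minimal Python sketch with hypothetical figures. Both methods arrive at the same cash total, but only the direct method works transaction by transaction, which is what daily analytics require:

```python
# Hypothetical figures; illustrates why the direct method suits real-time analytics.

# Direct method: sum actual cash receipts and payments as they occur.
cash_receipts = [120_000, 45_000, 80_000]    # customer payments collected
cash_payments = [-60_000, -25_000, -10_000]  # supplier, payroll, and tax payments
direct_cash_flow = sum(cash_receipts) + sum(cash_payments)

# Indirect method: start from net income and convert accruals back to cash.
net_income = 130_000
depreciation = 15_000                # non-cash expense, added back
increase_in_receivables = -20_000    # revenue booked but not yet collected
increase_in_payables = 25_000        # expenses booked but not yet paid
indirect_cash_flow = (net_income + depreciation
                      + increase_in_receivables + increase_in_payables)

print(direct_cash_flow)    # 150000 -- available per transaction, daily
print(indirect_cash_flow)  # 150000 -- requires period-end accounting adjustments
```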

Building on the Azure modern data warehouse blueprint

The project began in Q2 2018 with the immediate goal of providing attribute-rich data: data that would enable real-time, data-driven cash flow analytics based on the direct method. Accomplishing that goal was technically possible two or more years earlier, but it would have required many more resources and much more time. Advances in Microsoft Azure—particularly in Databricks and Automated Machine Learning (AutoML)—meant the Foundational Finance Services team could use Microsoft Azure's recommended architecture to implement an automated solution and deploy custom machine learning models at scale in considerably less time. Less than three quarters after the team conceptualized the platform, it was production-ready.

That recommended architecture includes:

  • Microsoft Azure Data Lake Storage for raw data storage
  • Microsoft Azure Data Factory to create, schedule, and orchestrate workflows
  • Microsoft Azure Databricks to prep and train machine learning models
  • Microsoft Azure AutoML for forecasting capabilities
  • Microsoft Azure Analysis Services to store and scale data in the cloud
  • Microsoft Power BI for end users to consume packaged data

Because the team didn’t have to dedicate early project phases to developing a scalable architecture, they had the luxury of devoting those resources to data analysis and acquisition. They also had time to familiarize themselves with the business context of the ingested data—an exercise that, in hindsight, was significant enough to become a best practice.

Figure 1 illustrates the components that ingest, store, prepare, train, and model the data before it is output to Microsoft Power BI or Microsoft Excel.

Figure 1. The cash flow platform ingests data from existing sources, houses the data in data lakes, and sends it on to Microsoft Power BI and Microsoft Excel for reporting, while also continually refining its machine learning models.

As illustrated in Figure 1, the platform receives input from three sources:

  • DataMall: Master data, such as company name and exchange rate
  • Treasury System: Transactional data, such as cash transactions and transfers (including, but not limited to, data already housed in SAP)
  • SAP: All transactional data

This data is housed in the finance data lake, a company-wide data store, and the cash flow data lake. The utility of the finance data lake spans many other teams, departments, and projects. The cash flow data lake was built by the Foundational Finance Services team for this project. In addition to the data in the data lakes, the Microsoft Digital and Engineering Fundamentals teams made several datasets available to the Foundational Finance Services team.

Raw data from DataMall, the Treasury System, and SAP is routed to the cash flow data lake. Microsoft Azure Databricks then enriches the data by applying, for example, relevant business logic and the time series models used to forecast quarterly results. Meanwhile, AutoML uses the data to refine the machine learning models that produce those forecasts. The enriched data is then sent back to the cash flow data lake, which pushes the data to Microsoft Azure Analysis Services (AAS). AAS holds the pre-processed data in memory, so it’s readily available for Microsoft Power BI to ingest and present as a dashboard. From there, end users can manipulate the data via filters to obtain a more granular view, create Microsoft Power BI reports, or export the data.

Other components include:

  • Databricks Notebook to add custom code into Microsoft Azure Databricks
  • Log Analytics to write log queries in Microsoft Azure Monitor and analyze the results
  • Application Insights (a Microsoft Azure Monitor feature) to monitor applications
  • Microsoft Azure Active Directory for authentication
  • Microsoft Azure Key Vault to store credentials
  • IcM (a ticket management system) to notify data owners when incoming data doesn’t comply with the minimal data quality threshold

Predicting quarterly cash flow with machine learning

During this process, Microsoft Azure AutoML provides machine learning capabilities so the Foundational Finance Services team can not only automate cash flow analytics and extract immediate insights, but also generate quarterly predictions. As Databricks processes the data, Microsoft Azure AutoML adds historical and training data, enriching the dataset before sending it back to the cash flow data lake to be consumed by AAS. Each day, the system processes all data from the current quarter, including data from the previous 24 hours, and generates an updated daily forecast.
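The paper doesn't publish the training code, but a minimal sketch of a daily forecasting run using the Azure Machine Learning Python SDK's AutoML interface (v1) could look like the following. The workspace configuration, dataset, and column names are hypothetical:

```python
from azureml.core import Workspace, Experiment, Dataset
from azureml.train.automl import AutoMLConfig
from azureml.automl.core.forecasting_parameters import ForecastingParameters

ws = Workspace.from_config()  # assumes a config.json for the workspace

# Hypothetical registered dataset: one row per day per payment category.
training_data = Dataset.get_by_name(ws, name="cashflow-daily-actuals")

forecasting_parameters = ForecastingParameters(
    time_column_name="posting_date",
    forecast_horizon=90,  # roughly one quarter of daily points
    time_series_id_column_names=["payment_category"],
)

automl_config = AutoMLConfig(
    task="forecasting",
    primary_metric="normalized_root_mean_squared_error",
    training_data=training_data,
    label_column_name="cash_outflow",
    forecasting_parameters=forecasting_parameters,
    experiment_timeout_hours=1,
    n_cross_validations=5,
)

# Submit the sweep; AutoML trains and compares candidate models.
run = Experiment(ws, "cashflow-quarterly-forecast").submit(automl_config)
run.wait_for_completion()
best_run, fitted_model = run.get_output()  # best model from the sweep
```

Scheduling a run like this daily, against the current quarter's data, matches the cadence the paper describes: each day's actuals refine the model before the updated forecast is generated.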

Implementation

The team conceptualized the project in Q2 2018 and implemented a pilot program in Q3 2018. The project had three phases:

  • Descriptive analytics
  • Diagnostic analytics
  • Predictive analytics

Phase 1: Creating a cash flow intelligence platform through descriptive analytics

In the first phase, the team focused on identifying and connecting the various data sources that would serve as inputs. The finance data lake was the result of an organization-wide effort at Microsoft to move disparate, siloed data to data lakes. These data lakes make data easy to access and use for machine learning purposes, which can in turn be used to build smarter experiences for Microsoft customers and for our own employees and partners.

While most SAP data is available in the finance data lake, cash flow-specific data, such as transactional data from the Treasury System and master data (including exchange rates) from DataMall, is housed in the cash flow data lake, which was built for this purpose. Because of the work done as part of the data lake initiative, most major datasets the Foundational Finance Services team needed for this project were already housed in the finance data lake and could be processed within it, making data processing more efficient.

Arriving at the descriptive analytics platform that would be the foundation of this project would have been theoretically possible without centralizing data in data lakes, but it would have been unrealistic. It would have required a small army of financial controllers to glue massive numbers of spreadsheets together. Because of the organization-wide data lake strategy, however, Microsoft had already laid that foundation. Foundational Finance Services connected the finance data lake and the cash flow data lake to create their descriptive analytics platform.

Phase 2: Developing insights to improve cash management through diagnostic analytics

In the second phase of the project, the Foundational Finance Services team focused on diagnostic analytics that would isolate inefficiencies, uncover trends, and provide actionable insights to financial controllers. In collaboration with stakeholders who would be the primary beneficiaries of cash flow analytics automation, the team determined during the pilot phase that efficiency gains would yield as much as $25 million in immediate cost savings.

The team built a platform that could ingest raw data and model it in a format that Microsoft Power BI and Microsoft Excel could consume. By March 2019, they had automated portions of cash flow statements but could only generate them quarterly. Six months after starting the project, the team was generating daily analytics, and controllers were extracting daily insights.

Prior to these efforts, non-standard supplier payments (those that did not comply with policy terms) comprised 27 percent of all spend in that space. Once the cash flow analytics platform began standardizing and automatically flagging non-standard payments, those payments comprised only 10 percent of spend. This nearly threefold reduction in non-standard payments meant financial controllers examining those payments could tighten their focus, resulting in deeper analysis.

Both the descriptive analytics platform (Phase 1) and the diagnostic analytics platform (Phase 2) are in production.

Figure 2. Standardizing supplier payments through the cash flow platform reduced non-standard payments from 27 percent of all payments in FY2020 Q2 to 10 percent in FY2020 Q3.

Phase 3: Measuring and improving cash flow forecast accuracy through predictive analytics

When the team was satisfied with the volume and quality of the data being ingested and the feedback from financial controllers was sufficiently promising, they turned their attention to phase 3 (predictive analytics). This phase consists of six distinct processes.

Data ingestion: Microsoft Azure Databricks job scheduling and parallel processing push all structured and semi-structured data to the cash flow data lake.
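A simplified PySpark sketch of this step; the source names, storage paths, and file format are hypothetical stand-ins for the internal connectors:

```python
from pyspark.sql import SparkSession

# On Databricks a SparkSession named `spark` is provided automatically;
# getOrCreate() also works when running the same code outside a notebook.
spark = SparkSession.builder.getOrCreate()

# Hypothetical landing paths for the three sources named in the paper.
sources = {
    "datamall": "abfss://raw@cashflowlake.dfs.core.windows.net/datamall/",
    "treasury": "abfss://raw@cashflowlake.dfs.core.windows.net/treasury/",
    "sap": "abfss://raw@cashflowlake.dfs.core.windows.net/sap/",
}

# Spark parallelizes each read and write across the cluster; a scheduled
# Databricks job would run this loop daily.
for name, path in sources.items():
    spark.read.parquet(path).write.mode("overwrite").parquet(
        f"abfss://curated@cashflowlake.dfs.core.windows.net/{name}/"
    )
```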

Data curation: A quality threshold is applied and deduplication checks are performed. If incoming data doesn’t meet the stated quality threshold, the data isn’t merged.
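A minimal sketch of what such a curation gate could look like in PySpark; the specific check and the threshold value are hypothetical stand-ins for the team's internal data quality rules:

```python
from pyspark.sql import DataFrame

def curate(df: DataFrame, key_columns: list, quality_threshold: float = 0.95) -> DataFrame:
    """Deduplicate a batch, then reject it if too many rows fail basic checks."""
    deduped = df.dropDuplicates(key_columns)

    total = deduped.count()
    # Example check: rows where every key column is populated.
    valid = deduped.dropna(subset=key_columns).count()

    if total == 0 or valid / total < quality_threshold:
        # In the production pipeline this is where an IcM ticket would notify
        # the data owner; here the sketch simply refuses to merge the batch.
        raise ValueError(f"Batch rejected: {valid}/{total} rows passed quality checks")
    return deduped
```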

Data processing: All business logic, filters, joins, and aggregations are performed using PySpark in Microsoft Azure Databricks. The final dataset is output as a .TSV file for consumption by AAS.
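A simplified PySpark sketch of this join/aggregate/export pattern, with hypothetical table paths and column names:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical curated inputs.
payments = spark.read.parquet("/mnt/cashflow/curated/sap/payments/")
suppliers = spark.read.parquet("/mnt/cashflow/curated/datamall/suppliers/")

# Business logic, filters, joins, and aggregations in one pipeline.
summary = (
    payments.join(suppliers, on="supplier_id", how="left")
    .filter(F.col("posting_date") >= "2019-07-01")
    .groupBy("payment_category", "region")
    .agg(F.sum("amount_usd").alias("total_outflow"))
)

# AAS consumes the final dataset as tab-separated text.
summary.coalesce(1).write.mode("overwrite") \
    .option("sep", "\t").option("header", True) \
    .csv("/mnt/cashflow/processed/cashflow_summary_tsv/")
```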

Data modeling: Based on the facts and dimensions created, the team built a star schema model in AAS. The tabular model allows for efficient slicing and dicing of data. The in-memory caching layer results in high performance and a satisfying end user experience in Microsoft Power BI and Microsoft Excel.
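The model itself lives in AAS, but a sketch of the underlying fact/dimension split helps illustrate the star schema; the column names are hypothetical:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
transactions = spark.read.parquet("/mnt/cashflow/processed/transactions/")

# Dimension tables: one row per entity, with a surrogate key where none exists.
dim_supplier = transactions.select(
    "supplier_id", "supplier_name", "region"
).dropDuplicates(["supplier_id"])

dim_category = (
    transactions.select("payment_category")
    .dropDuplicates()
    .withColumn("category_key", F.monotonically_increasing_id())
)

# Fact table: one row per payment, carrying measures plus foreign keys into
# the dimensions -- the shape a tabular model slices and dices efficiently.
fact_payments = transactions.join(dim_category, on="payment_category").select(
    "supplier_id", "category_key", "posting_date", "amount_usd"
)
```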

Machine learning: Using Microsoft Azure AutoML, Python, and Microsoft Azure Machine Learning, the platform derives insights from the data to generate a quarterly forecast of supplier spend (broken down by payment category) and to predict invoice clearing dates.

Quality control and reporting: LiveSite dashboards provide robust telemetry of end-to-end data, from provider to reporting.

By summer 2019, the team had built an end-to-end view of cash flow. End users could view this cash flow data as a dashboard in Microsoft Power BI and drill into 13 payment categories. The cash flow forecasts are production-ready but not yet in production.

Figure 3. Cash flow forecast accuracy has hovered at +/- two percent in early testing.

Benefits

Though the project will eventually expand to encompass a broader portion of cash flow services at Microsoft, the Foundational Finance Services team focused their initial efforts on cash outflows like payments to suppliers. Each supplier enters into a mutually agreed-upon set of terms when it does business with Microsoft. Those terms are stored in SAP, Microsoft’s enterprise resource planning system of record, and include payment terms such as the frequency of payments made to the supplier. However, because cash flow analytics were generated quarterly, and because generating them was such a laborious process, the details of the supplier terms were rarely factored in when making payments.

By automating cash flow analytics and making fresh data available daily rather than quarterly, such considerations can be baked into the payments process. The data available in SAP becomes one of many signals used to schedule payments in a more structured and efficient way. That gives finance professionals more control over day-to-day operations, which in turn gives them a greater role in advancing team or departmental goals. Previously, acquiring supplier payment data required manually gathering it from as many as 23 individuals; automation eliminates that inefficiency.

In total, the project resulted in $25 million worth of efficiency gains. The team has also identified many other opportunities. Early research suggests that policy and process updates will save Microsoft an estimated $130 million per year.

More importantly, finance professionals have gained daily, actionable insights they can use to directly impact business outcomes. The insights are still in beta and are being monitored for accuracy. Early results are promising, with an accuracy range of +/- two percent.

Best practices

Though the project hit relatively few bumps in the road, it did face challenges typical of projects that seek to marry raw data with its business context, as well as obstacles common to any machine learning project. Keeping these challenges in mind at the outset (and planning accordingly) can mitigate their impact.

Devote sufficient resources to understanding the data’s business context. The engineers working with the data, especially in the early phases of the project, are not its original owners, so they have little familiarity with its context. Lack of context, however, can have a dramatic impact on the accuracy of insights and machine learning models. The Microsoft Digital and Engineering Fundamentals teams made several datasets available to the Foundational Finance Services team. It makes sense to involve data owners early in the project, and to make the project a true collaboration with non-technical stakeholders.

Within this project, and more broadly at Microsoft, the team encouraged the original owners of published data (data housed in widely accessible data lakes) to provide more context, including source data, where applicable, before publishing.

For the Foundational Finance Services team, this project has also meant working closely with SAP architects and line of business users. Those partnerships helped engineers gain more familiarity not only with the dataset being explored or examined, but also with the ecosystem in which the data exists. Engineers also learned how the data relates to other datasets in the ecosystem, even tangentially.

Supplier payments are a single dataset, for example, but that single dataset has 13 owners. Verifying the accuracy of that data with all 13 owners has proven critical to accurate insights and forecasting.

Develop a code testing plan. Early in the process, Foundational Finance Services used Microsoft Azure Databricks Notebooks to build out data processing pipelines with business logic, rather than using them exclusively to perform data analysis. That made it possible to deliver business value as quickly as possible with continuous integration/continuous delivery, but it made testing challenging. Eventually, the team developed a Python application to test code, migrating to a hybrid version of Microsoft Azure Databricks Notebooks and the Python application. Once the team moved the code into the Python application, they were able to execute unit and regression tests.
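A minimal sketch of the pattern the team landed on: logic extracted from a notebook into an importable module, plus a pytest-style unit test against a local Spark session. The file, function, and column names are hypothetical:

```python
# cashflow/transforms.py -- logic extracted from a Databricks notebook.
from pyspark.sql import DataFrame, functions as F

def flag_non_standard_payments(payments: DataFrame) -> DataFrame:
    """Mark payments whose terms differ from the agreed supplier terms."""
    return payments.withColumn(
        "is_non_standard",
        F.col("actual_terms") != F.col("agreed_terms"),
    )

# tests/test_transforms.py -- runs anywhere, no cluster required.
from pyspark.sql import SparkSession
from cashflow.transforms import flag_non_standard_payments

def test_flag_non_standard_payments():
    spark = SparkSession.builder.master("local[1]").getOrCreate()
    df = spark.createDataFrame(
        [("net30", "net30"), ("net15", "net30")],
        ["actual_terms", "agreed_terms"],
    )
    result = flag_non_standard_payments(df).collect()
    assert [row.is_non_standard for row in result] == [False, True]
```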

Though it took some time to build a Python application late in the project, writing everything in Microsoft Azure Databricks Notebooks sooner rather than later made sense in the early stages, since it allowed the team the freedom to deploy rapidly and often. Once they were satisfied with the business value they’d delivered, devoting resources to building the Python application was an easier sell.

Follow the Agile methodology. Had Foundational Finance Services not followed the Agile methodology, the platform would have taken considerably longer to build. For example, to find out which supplier payment reports were needed, the team would have had to wait for a complete analysis. Instead, they tracked the necessary data to its source and connected to it. Having direct access to the data, the team could analyze it, then work with the owners of the data to verify its accuracy. If they found a discrepancy, they could work with data publishers to make the necessary corrections before investing time in building reports for end users.

Prioritize data sources using cash flow analytics. While tracking down data sources early in the project, the engineering team studied existing cash flow analytics. They ranked data sources in descending order of transaction volume in those analytics, which made it easy to prioritize connecting to the highest-volume sources first.
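A minimal sketch of the prioritization, with hypothetical transaction counts:

```python
# Hypothetical transaction counts pulled from existing cash flow analytics.
source_volumes = {
    "sap_payments": 1_200_000,
    "treasury_transfers": 85_000,
    "datamall_master": 4_000,
}

# Highest-volume sources first: they cover the most cash flow per unit of
# integration effort, so they get connected first.
priority_order = sorted(source_volumes, key=source_volumes.get, reverse=True)
print(priority_order)  # ['sap_payments', 'treasury_transfers', 'datamall_master']
```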

What’s next

The insights uncovered in relation to this project—totaling $130 million in efficiency gains—are being translated into policy and process updates. The Foundational Finance Services team is currently working with line of business partners to operationalize those changes.

In the meantime, the forecasting model is being monitored for accuracy and continually refined. While insights uncovered through diagnostic analytics have provided the largest returns to date, the team expects predictive forecasts to be equally impactful. Daily projections are being compared to daily cash flow actuals to gauge the accuracy of the forecasts. In the most recent observations, the margin of error has been reduced to approximately one percent when projecting across all payment categories. When filtering for a particular payment category, however, the margin of error increases because the model has not yet been trained to account for seasonality. That seasonality work is ongoing.

Conclusion

The wider data lake strategy at Microsoft yields immediate benefits and lays the groundwork for future innovation. The cash flow analytics platform exemplifies the approach and the possibilities of that strategy. Making financial data widely available and accessible enabled the Foundational Finance Services team to create a dynamic, multi-dimensional, daily snapshot of cash flow, accessible via interactive dashboards. The result is that financial controllers at Microsoft have unprecedented control over the financial health of the teams and departments they oversee, and have become strategic partners in the quest for operational efficiency.
