This post is authored by Jonathan Trull, Worldwide Executive Cybersecurity Advisor, Enterprise Cybersecurity Group, and Vidhi Agarwal, Senior Security Program Manager, Microsoft Security Response Center (MSRC).
Within the information security community, one of the emerging areas of focus and investment is the concept of security automation and orchestration. Although the topic is not necessarily new, it has taken on increased importance due to several industry trends. Before diving into the industry trends, we should first define exactly what security automation and security orchestration mean.
Security automation – the use of information technology in place of manual processes for cyber incident response and security event management.
Security orchestration – the integration of security and information technology tools designed to streamline processes and drive security automation.
Industry trends driving the need for increased automation and orchestration
There are two primary trends driving the focus on the automation and orchestration of security event management and incident response. First, there are simply not enough skilled security professionals to support the need. A recent cybersecurity jobs report found that there will be 3.5 million unfilled cybersecurity positions by 2021.
The second industry trend driving further investment in security automation and orchestration is the volume, velocity, and complexity of attacks. As shown in Figure 1 below, our information environments are extremely complex and vast. Their interconnections are often beyond the capability of a human to perceive, visualize, calculate, and understand, which makes it difficult to accurately project risk in different scenarios. The velocity at which attacks transpire is also driving the need for automation. Based on recent examples from the Microsoft Global Incident Response and Recovery Team, we have seen situations where attackers move from an initial endpoint infection via a phishing email to full domain control within 24 hours. Lastly, the sheer volume of cyberattacks and security events triaged daily by security operations centers continues to grow, making it nearly impossible for humans to keep pace.
Figure 1: Sources include https://nvd.nist.gov, Verizon Data Breach Report & Microsoft Incident Response Data
Security automation and orchestration at the Microsoft Cyber Defense Operations Center
Every day, the Microsoft Cyber Defense Operations Center (CDOC) receives alerts from a multitude of data collection systems and detection platforms across the 200+ cloud and online services. The key challenge it faces is taking the huge volume of data on potential security events and reducing it from thousands of high-fidelity alerts to hundreds of qualified cases that the cyber defenders in the Microsoft CDOC can manage daily. Automation solutions include the use of machine learning and custom software tools to handle an increasing number of events without relying on a commensurate growth in headcount. Automation also accelerates Microsoft’s ability to identify the cases that need human intervention, so that adversaries can be remediated and evicted quickly.
Figure 2: The Cyber Defense Operations Center’s data scientists and analysts work 24×7 protecting, detecting, and responding to attacks
The Microsoft Cyber Defense Operations Center workflow automation framework addresses all aspects of a security responder’s job and includes the following components:
- Automated Ingestion: With an increasing number of specialized detection platforms across host, network, identity, and service detections, CDOC has an automated ingestion process leading to a single case management system for triage and investigations.
- Stacking: Compressing thousands of alerts into hundreds of cases includes automated stacking based on a time window or on objects such as IP address, host name, user, or subscription ID. In certain cases, alerts are aggregated or de-duplicated to reduce the noise coming to the SOC.
- Enrichment: Defenders often need to go to multiple tools and databases to get contextual information. Adding contextual metadata to alerts from systems such as asset management, configuration management, and vulnerability management, and from logs such as application, DNS, and network traffic logs, saves defenders triage time and reduces overall Mean Time to Resolve (MTTR). Furthermore, this data helps the automation system make decisions and enables appropriate actions.
- Decisions: Based on conditional logic, the automation engine determines which workflow to invoke to initiate the desired action.
- Actions: Actions such as sending e-mail, creating a ticket, resetting a password, disabling a VM, blocking an IP address, or running a script to initiate processes in other tools and systems are automated.
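To make the stacking, enrichment, and decision components more concrete, here is a minimal Python sketch. It is an illustration only, not the CDOC implementation: the `Alert` and `Case` classes, the 30-minute window, the 1 to 5 severity scale, the `critical` asset flag, and the workflow names are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Alert:
    source: str          # detection platform that raised the alert
    entity: str          # stacking key, e.g. IP address, host name, or user
    timestamp: datetime
    severity: int        # hypothetical scale: 1 (low) to 5 (critical)


@dataclass
class Case:
    entity: str
    alerts: list = field(default_factory=list)
    context: dict = field(default_factory=dict)


def stack_alerts(alerts, window=timedelta(minutes=30)):
    """Group alerts on the same entity into one case when each alert
    arrives within `window` of the previous alert in that case."""
    cases = []
    open_case = {}  # entity -> most recent case for that entity
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        case = open_case.get(alert.entity)
        if case is None or alert.timestamp - case.alerts[-1].timestamp > window:
            case = Case(entity=alert.entity)
            cases.append(case)
            open_case[alert.entity] = case
        case.alerts.append(alert)
    return cases


def enrich(case, asset_inventory):
    """Attach asset metadata so the analyst does not have to look it up."""
    case.context = dict(asset_inventory.get(case.entity, {}))
    return case


def decide(case):
    """Conditional logic that maps a case to the name of a workflow."""
    top_severity = max(a.severity for a in case.alerts)
    if top_severity >= 4 and case.context.get("critical"):
        return "isolate_host"
    if top_severity >= 4:
        return "create_ticket"
    return "log_only"


# Example run with made-up data.
t0 = datetime(2017, 8, 1, 9, 0)
alerts = [
    Alert("network", "10.0.0.5", t0, 3),
    Alert("host", "10.0.0.5", t0 + timedelta(minutes=5), 5),
    Alert("identity", "alice", t0 + timedelta(minutes=10), 2),
    Alert("network", "10.0.0.5", t0 + timedelta(hours=3), 2),
]
inventory = {"10.0.0.5": {"owner": "finance", "critical": True}}

cases = [enrich(c, inventory) for c in stack_alerts(alerts)]
workflows = [decide(c) for c in cases]
print(len(alerts), "alerts ->", len(cases), "cases:", workflows)
```

In this run, four alerts collapse into three cases, and only the case that combines a high-severity alert with a critical asset is routed to the (hypothetical) `isolate_host` workflow. A production system would typically also de-duplicate alerts and stack on more than one key.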
Based on the degree of automation implemented, there is a corresponding reduction in MTTR and an increase in the number of cases a defender can close. The automation maturity model below highlights the automation journey for the Microsoft CDOC. Not all scenarios will need to be at Level 5. Each level builds on the previous one and moves your organization closer to its automation goals.
Figure 3: The automation maturity model and automation journey, Copyright Microsoft Corporation
Measuring automation success
The goal of any security operations center (SOC) automation effort is to reduce Mean Time to Detect and Mean Time to Remediate without headcount growing linearly with the business. The key is not only to measure automation results and SOC efficiency, but also to gain insight into where automation efforts should be spent to improve the security posture of your organization. Some fundamentals to measure include:
- Noise Reduction: Most Security Operations Centers struggle with the signal-to-noise ratio. A key measure for this is the stacking ratio that measures the compression from alerts to cases and is an indicator of reduction in triage activity needed.
- Automate High Fidelity Signals: It is critical to ensure that automation efforts are spent on high fidelity alerts and the right response processes. Measuring detection efficacy by determining true positive and false positive alerts enables a continuous feedback loop and improvement in detection signals. Understanding false negatives identifies monitoring and security response gaps.
- Address Top Offenders: It is common for security response teams to drown in repetitive signals and to perform the same tasks over and over. Identifying and tracking top offenders over time provides insights on what needs to be further automated or prevented through better monitoring, controls and engineering solutions.
- Automation Outcomes: Validating the outcomes of automation efforts is essential to right-size those efforts. With increased automation, teams should see their TTx (Time to Detect, Triage, Remediate, and others) go down and SOC investigator efficiency go up, as the number of cases each defender can successfully resolve increases.
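As a rough illustration of how these measures could be computed, the sketch below uses plain Python. The function names and the sample numbers are assumptions made for the example; the formulas are one reasonable reading of the stacking ratio (alerts per case), detection efficacy (share of alerts that were true positives), top offenders (most frequent alert sources), and a mean time metric.

```python
from collections import Counter
from datetime import timedelta


def stacking_ratio(alert_count, case_count):
    """Alerts per case; a higher value means more triage work was compressed."""
    return alert_count / case_count if case_count else 0.0


def detection_precision(true_positives, false_positives):
    """Share of triaged alerts that turned out to be real incidents."""
    total = true_positives + false_positives
    return true_positives / total if total else 0.0


def top_offenders(alert_sources, n=3):
    """The n detections (or hosts, users, ...) that fire most often."""
    return Counter(alert_sources).most_common(n)


def mean_hours(durations):
    """Average of a list of timedeltas, in hours (e.g. time to remediate)."""
    if not durations:
        return 0.0
    return sum(d.total_seconds() for d in durations) / len(durations) / 3600


# Example with made-up numbers.
ratio = stacking_ratio(5000, 250)
precision = detection_precision(180, 20)
offenders = top_offenders(
    ["rule-a", "rule-b", "rule-a", "rule-c", "rule-a", "rule-b"], n=2
)
mttr = mean_hours([timedelta(hours=2), timedelta(hours=4)])
print(ratio, precision, offenders, mttr)
```

With these sample inputs the stacking ratio is 20 alerts per case, precision is 0.9, `rule-a` and `rule-b` are the top offenders, and the mean time is 3 hours. Tracking the same figures over time, rather than as one-off values, is what shows whether automation is paying off.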
Security automation and orchestration best practices
Recently, we had the opportunity to share the lessons we have learned working with our customers and from the Microsoft Cyber Defense Operations Center at RSA Asia Pacific and Japan 2017. These best practices include:
- Move as much of the work as possible to your detectors. Select and deploy sensors that automate, correlate, and interlink their findings prior to sending them to an analyst.
- Automate alert collection. The SOC analyst should have everything they need to triage and respond to an alert without performing any additional information collection, such as querying systems that may or may not be offline or collecting information from additional sources such as asset management systems or network devices.
- Automate alert prioritization. Real time analytics should be leveraged to prioritize events based on threat intelligence feeds, asset information, and attack indicators. Analysts and incident responders should be focused on the highest severity alerts.
- Automate tasks and processes. Target common, repetitive, and time-consuming administrative processes first and standardize response procedures. Once the response is standardized, automate the SOC analyst workflow to remove any human intervention where possible.
- Continuous Improvement. Monitor the key metrics we discussed earlier in this article and tune your sensors and workflows to drive incremental changes.
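The alert prioritization practice above can be sketched as a simple scoring function. This is a minimal example under stated assumptions, not a description of any Microsoft product: the alert fields, the additive weights, the threat intelligence set, and the asset criticality table are all hypothetical.

```python
def priority_score(alert, threat_intel_ips, asset_criticality):
    """Combine alert severity, a threat intelligence match, and asset
    criticality into one number. The weights are illustrative only."""
    score = alert["severity"]
    if alert["ip"] in threat_intel_ips:
        score += 3  # remote IP appears in a threat intelligence feed
    score += asset_criticality.get(alert["host"], 0)
    return score


def prioritize(alerts, threat_intel_ips, asset_criticality):
    """Return alerts ordered from highest to lowest priority."""
    return sorted(
        alerts,
        key=lambda a: priority_score(a, threat_intel_ips, asset_criticality),
        reverse=True,
    )


# Example with made-up data (documentation-range IP addresses).
alerts = [
    {"id": "A2", "severity": 4, "ip": "198.51.100.2", "host": "kiosk"},
    {"id": "A3", "severity": 1, "ip": "192.0.2.10", "host": "laptop"},
    {"id": "A1", "severity": 2, "ip": "203.0.113.7", "host": "dc01"},
]
threat_intel_ips = {"203.0.113.7"}
asset_criticality = {"dc01": 3, "kiosk": 0}

queue = prioritize(alerts, threat_intel_ips, asset_criticality)
print([a["id"] for a in queue])
```

Here alert `A1` has the lowest raw severity but moves to the front of the queue because it touches a critical asset and matches threat intelligence, which is the effect that prioritization is meant to achieve.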
Microsoft is committed to our customers’ success and has applied these best practices not only internally within the CDOC but also in our Advanced Threat Protection offerings to help enterprises stay ahead of cyberattacks. In addition, our recent acquisition of Hexadite will build on the successful work already done to help commercial Windows 10 customers detect, investigate and respond to advanced attacks on their networks with Windows Defender Advanced Threat Protection (WDATP).
Microsoft’s Advanced Threat Protection offering will now include artificial intelligence-based automatic investigation and remediation capabilities, making response and remediation faster and more effective.
In addition, Azure Security Center offers advanced threat detection capabilities that use artificial intelligence to automate and orchestrate detection and response for a customer’s Azure workloads. This makes it easier for Azure customers to identify and respond to attacks against their cloud assets, and it also provides intelligent recommendations to help prevent future attacks.
Read more about the work Microsoft is doing to automate and orchestrate security workloads by learning about the capabilities within WDATP, Azure Security Center and Microsoft Security.