Protecting your Data – Sensitivity Labels and Sensitive Information Types

October 1, 2021

Beau Faull

Hey everyone!

In my last article, we went over Information Protection and Governance at a high level and explained the difference between them (or at least how I understand them!). We also covered the four pillars of the Information Protection Lifecycle: four key activities that you would typically work through to achieve a given level of information protection and governance. Finally, we walked through the initial stage, Identifying and Classifying your sensitive data.

Over the next few articles, we will cover some of the core concepts of the second pillar, Protecting your data, and run through how to apply protective controls to a classified piece of data with Sensitivity Labels in Office 365. Information classification is a fairly broad topic, so in this article I will cover what sensitivity labels and sensitive information types are and how they apply to information classification. Subsequent articles will cover how to create and use these controls.

Our Data Classification Framework

So, once we have worked through the initial stage of Identifying and Classifying our sensitive data, we should have created a data classification framework. To quickly recap, this framework specifies each data classification level, a description of that data, and some examples of the kind of data that maps to that level, such as credit card data. As part of that process, we would also have mapped the relevant controls to each classification level. We will use the framework below in this example.

A table showing a data classification framework example.

Keep in mind that this is a very basic example of a data classification framework. Real frameworks may have sublevels, for example for specific projects, or may have several different models and sets of controls that pertain to the data.

In the example above, our data classification framework has the data classification level, a description of what that level means, an example of that data, and finally, the controls that apply to it. The more confidential your information, the more controls typically apply. These requirements can come from regulations that apply to you or from the impact the data could have if it fell into the wrong hands.

In this instance, we want to create a label that we can apply to credit card data. Once the label is applied, we want it to enforce encryption on the file (to limit inappropriate access) and apply a watermark, so that users viewing the document immediately know what level of sensitivity has been applied to it.
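To make the idea of a framework concrete, here is a minimal Python sketch that captures it as data and looks up the controls for a given example. The level names, descriptions, and most of the mappings are hypothetical placeholders; only the pairing of credit card data with encryption and a watermark comes from the example above. In Microsoft 365 itself, these controls are configured in the label settings, not in code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ClassificationLevel:
    """One row of a data classification framework."""
    name: str
    description: str
    examples: tuple[str, ...]
    controls: frozenset[str] = frozenset()

# Hypothetical framework: names, descriptions and most mappings are placeholders.
FRAMEWORK = [
    ClassificationLevel("Public", "Approved for release outside the organisation",
                        ("Marketing material",)),
    ClassificationLevel("Internal", "For use inside the organisation only",
                        ("Internal policies",), frozenset({"watermark"})),
    ClassificationLevel("Highly Confidential", "Could cause serious harm if disclosed",
                        ("Credit card data",), frozenset({"encrypt", "watermark"})),
]

def controls_for_example(example: str) -> frozenset[str]:
    """Return the controls of the level whose examples include `example`."""
    for level in FRAMEWORK:
        if example in level.examples:
            return level.controls
    raise KeyError(f"No classification level lists {example!r}")

print(sorted(controls_for_example("Credit card data")))  # ['encrypt', 'watermark']
```

Keeping each level as an immutable record mirrors the table: one row per level, with the controls travelling alongside the description and examples.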

Sensitivity Labels

To apply these protections to a file, we use Sensitivity labels – let’s quickly run through these.

Sensitivity labels are what we configure and apply to files and emails to classify them and enforce protection. They allow these protections to be enforced without stopping users from accessing and using the file (if they should have access to it!). We can apply these labels across O365, Office applications, third-party apps and services, Power BI, on-premises file shares, and assets in Azure Purview (I will be writing an article on this at some stage in the future).

Labels are stored in clear text in the metadata of files and emails, which allows third-party applications and services to read them and apply their own protective controls if they need to. In addition, a label stays with the file wherever it goes, including when the file is shared externally. In short, sensitivity labels are the classification levels documented in your data classification framework.
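Because the label travels as plain metadata, any tool that can open the file can read it. The sketch below lists label-related custom properties in a .docx file using only the Python standard library. It assumes labels are written as custom document properties named `MSIP_Label_<label GUID>_<field>` in `docProps/custom.xml`, which is how Office clients have historically stored them; some newer Office versions store label metadata in a different part of the file, so check Microsoft's documentation for the format your environment uses. The GUID, label name, and extra property in the demo are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace of docProps/custom.xml in Office Open XML files.
CUSTOM_NS = "{http://schemas.openxmlformats.org/officeDocument/2006/custom-properties}"

def read_label_properties(docx) -> dict[str, str]:
    """Return custom properties whose names start with 'MSIP_Label_' (assumed label format)."""
    with zipfile.ZipFile(docx) as zf:
        if "docProps/custom.xml" not in zf.namelist():
            return {}
        root = ET.fromstring(zf.read("docProps/custom.xml"))
    labels = {}
    for prop in root.iter(f"{CUSTOM_NS}property"):
        name = prop.get("name", "")
        if name.startswith("MSIP_Label_"):
            # The value sits in a single typed child element such as <vt:lpwstr>.
            value = next(iter(prop), None)
            labels[name] = (value.text or "") if value is not None else ""
    return labels

# Demo: an in-memory zip with a hypothetical label GUID and one unrelated property.
SAMPLE_CUSTOM_XML = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Properties xmlns="http://schemas.openxmlformats.org/officeDocument/2006/custom-properties" xmlns:vt="http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes">
<property fmtid="{D5CDD505-2E9C-101B-9397-08002B2CF9AE}" pid="2" name="MSIP_Label_00000000-0000-0000-0000-000000000000_Name"><vt:lpwstr>Highly Confidential</vt:lpwstr></property>
<property fmtid="{D5CDD505-2E9C-101B-9397-08002B2CF9AE}" pid="3" name="Department"><vt:lpwstr>Finance</vt:lpwstr></property>
</Properties>"""

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("docProps/custom.xml", SAMPLE_CUSTOM_XML)
print(read_label_properties(buffer))
# {'MSIP_Label_00000000-0000-0000-0000-000000000000_Name': 'Highly Confidential'}
```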

Sensitive Information Types

Sensitive information types, on the other hand, map to the Examples column of your data classification framework. Think of things like credit card numbers, bank details, and passport numbers: anything that could be considered sensitive. In the Microsoft ecosystem, sensitive information types are pattern-based classifiers used to identify when this sensitive information appears in a file.

Sensitive information types extend far beyond Information Protection: they are used across the compliance stack to detect, alert on, and protect this information wherever it is identified. They will come up as a building block in nearly every article I publish on this topic from now on, so it is worth covering them now.
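As an illustration of what a pattern-based classifier does, the sketch below detects credit card numbers in two steps: a regular expression finds candidate digit runs, and a Luhn checksum discards candidates that cannot be valid card numbers. The regex is a deliberate simplification (16 digits, optionally grouped in fours by spaces or hyphens) and is not Microsoft's built-in definition.

```python
import re

# Simplified candidate pattern (an assumption, not Microsoft's definition):
# 16 digits, optionally grouped in fours by spaces or hyphens.
CANDIDATE = re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")

def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    # Double every second digit, counting from the rightmost one.
    for i, d in enumerate(int(c) for c in reversed(number)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def find_card_numbers(text: str) -> list[str]:
    """Return regex candidates (digits only) that also pass the Luhn check."""
    found = []
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if luhn_valid(digits):
            found.append(digits)
    return found

print(find_card_numbers("Card: 4111 1111 1111 1111, ref: 1234-5678-9012-3456"))
# ['4111111111111111']
```

The checksum step is what keeps an order reference with the right shape from being flagged as a card number.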

Built-in Sensitive Information Types

The Compliance Portal includes over 250 built-in sensitive information types, ranging from global data types, such as credit card numbers, to region-specific types, such as Australian tax file numbers. Each sensitive information type consists of these fields:

Name: What the sensitive information type is called

Description: Describes what the sensitive information type is looking for

Pattern: Defines what a sensitive information type detects, consisting of the following:

Primary element – This is the main element that the sensitive information type is looking for. A primary element can be a regular expression, a keyword list, a keyword dictionary, or a function.
Supporting element – This acts as supporting evidence that increases the confidence of a match.
Confidence Level – (High, Medium, Low) Reflects how many supporting elements were detected alongside the primary element.
Proximity – The maximum distance, in characters, between the primary element and a supporting element for that supporting element to count.

This can take a while to wrap your head around. I have found that the easiest way to understand them is to look at some of the definitions themselves, which are documented at Sensitive information type entity definitions – Microsoft 365 Compliance | Microsoft Docs.
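To see how the pattern fields fit together, here is a minimal Python sketch for a hypothetical "Employee ID" type: a regex as the primary element, a keyword list as the supporting element, and a character window for proximity. The rule mapping evidence to confidence (no supporting keywords gives Low, one gives Medium, two or more give High) is my own simplification; real definitions specify the required evidence for each confidence level individually, and the 50-character proximity is an arbitrary illustrative value.

```python
import re
from dataclasses import dataclass

@dataclass
class SensitiveInfoType:
    """Simplified model of a sensitive information type (illustrative only)."""
    name: str
    description: str
    primary: re.Pattern          # primary element: a regular expression here
    supporting: tuple[str, ...]  # supporting element: lower-case keyword list
    proximity: int               # max characters either side of the primary match

    def evaluate(self, text: str) -> list[tuple[str, str]]:
        """Return (match, confidence) for every primary-element match in text."""
        lowered = text.lower()
        results = []
        for m in self.primary.finditer(text):
            # Only keywords inside the proximity window count as evidence.
            start = max(0, m.start() - self.proximity)
            window = lowered[start:m.end() + self.proximity]
            evidence = sum(1 for kw in self.supporting if kw in window)
            # Assumed rule: more supporting evidence gives higher confidence.
            if evidence >= 2:
                confidence = "High"
            elif evidence == 1:
                confidence = "Medium"
            else:
                confidence = "Low"
            results.append((m.group(), confidence))
        return results

# Hypothetical custom type; the pattern and keywords are invented for illustration.
employee_id = SensitiveInfoType(
    name="Employee ID (hypothetical)",
    description="Detects internal employee IDs such as EMP-123456",
    primary=re.compile(r"\bEMP-\d{6}\b"),
    supporting=("employee", "staff number"),
    proximity=50,
)

print(employee_id.evaluate("Employee staff number EMP-123456"))  # [('EMP-123456', 'High')]
print(employee_id.evaluate("Employee ID: EMP-123456"))           # [('EMP-123456', 'Medium')]
print(employee_id.evaluate("Ticket EMP-123456 closed"))          # [('EMP-123456', 'Low')]
```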

You can also create your own custom sensitive information types. I usually see this where there is a particular highly confidential project or an acquisition that has been given a codeword, like Project Olivine. In this instance, you could create a sensitive information type that picks up files containing the codeword and apply highly restrictive controls to them. You can also use custom types to trigger and report on policies such as Data Loss Prevention. If you are going to create a custom sensitive information type, make sure you test it before rolling it out to a production environment. The last thing you want is to make it too broad and apply restrictive controls across your entire estate.
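Before rolling out a custom type, it helps to measure how broad it is against a sample of representative files. The sketch below compares two hypothetical patterns for the Project Olivine codeword against a small, made-up test set: the broad pattern also catches a document about the mineral olivine, which is exactly the kind of over-matching that would apply restrictive controls where they are not wanted. File names and contents are invented for illustration.

```python
import re

# Two candidate patterns for a hypothetical "Project Olivine" custom type.
NARROW = re.compile(r"\bproject\s+olivine\b", re.IGNORECASE)
BROAD = re.compile(r"\bolivine\b", re.IGNORECASE)  # also matches the mineral

# Made-up test set: file name -> text, plus the files that really are in scope.
SAMPLES = {
    "board-minutes.docx": "Update on Project Olivine due diligence.",
    "acquisition-plan.xlsx": "PROJECT  OLIVINE: target valuation",
    "geology-notes.docx": "Olivine is a common mineral in basalt.",
    "newsletter.docx": "Team lunch on Friday.",
}
IN_SCOPE = {"board-minutes.docx", "acquisition-plan.xlsx"}

def check_pattern(pattern: re.Pattern, samples: dict[str, str],
                  in_scope: set[str]) -> tuple[list[str], list[str]]:
    """Return (false positives, false negatives) for a candidate pattern."""
    hits = {name for name, text in samples.items() if pattern.search(text)}
    return sorted(hits - in_scope), sorted(in_scope - hits)

print(check_pattern(NARROW, SAMPLES, IN_SCOPE))  # ([], [])
print(check_pattern(BROAD, SAMPLES, IN_SCOPE))   # (['geology-notes.docx'], [])
```

Any false positive in a check like this is a file that would pick up highly restrictive controls it does not need.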

Trainable Classifiers

The last thing I wanted to touch on in this article is Trainable Classifiers. These are best suited to content that isn’t easily identified by either manual or automatic pattern matching methods. Rather than matching patterns, a trainable classifier learns to detect and identify an item based on what kind of item it is.

A trainable classifier will learn how to detect content by looking at hundreds of examples of the content you want to classify – the more content you feed into it, the more accurate it will be.

A timeline on the deployment of a trainable classifier.

The first stage in creating a trainable classifier is to feed it a large amount of the content you want it to identify, all of which should be genuine, real examples of that data. Once the model has processed these, we provide it with a mix of matching and non-matching samples, and the system tries to determine which are positives and which are negatives.

After this initial run has been completed, you will need to go through and confirm the results, telling the system which predictions were correct, until it reaches a level of accuracy that you are happy with. Once published, a trainable classifier classifies the content in your estate much as a standard sensitive information type would, and you can go back and ‘retrain’ the model later to increase its accuracy. A good workflow diagram of how this whole process works can be found below:

A diagram showing the trainable classifier workflow.
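To make the train, review, and retrain loop concrete, the sketch below uses a tiny naive Bayes text classifier in plain Python. This is a swapped-in, textbook technique chosen only for illustration; it is not how Microsoft implements trainable classifiers, which need far more than the handful of hypothetical samples used here.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    """Lower-case word tokens; a deliberately crude tokenizer."""
    return re.findall(r"[a-z]+", text.lower())

class TinyNaiveBayes:
    """Two-class multinomial naive Bayes with add-one smoothing.

    A stand-in for illustration only, NOT Microsoft's trainable classifier model.
    """

    def __init__(self) -> None:
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}

    def train(self, text: str, is_match: bool) -> None:
        """Add one labelled example; calling this later with feedback is the 'retrain' step."""
        self.word_counts[is_match].update(tokenize(text))
        self.doc_counts[is_match] += 1

    def predict(self, text: str) -> bool:
        """Return True if the text scores higher as a match than as a non-match."""
        words = tokenize(text)
        vocab = set(self.word_counts[True]) | set(self.word_counts[False]) | set(words)
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label in (True, False):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(vocab)
            # Smoothed prior so an untrained class does not produce log(0).
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            for word in words:
                score += math.log((counts[word] + 1) / denom)
            scores[label] = score
        return scores[True] > scores[False]

clf = TinyNaiveBayes()
# Stage 1: genuine examples of the content to detect (hypothetical contracts).
for text in ("this agreement is entered into by the parties",
             "the parties agree to the terms of this agreement"):
    clf.train(text, True)
# Stage 2: non-matching samples so the model can tell the difference.
for text in ("team lunch is on friday", "the quarterly newsletter is out"):
    clf.train(text, False)

print(clf.predict("both parties signed the agreement"))  # True
print(clf.predict("lunch menu for friday"))              # False
print(clf.predict("signed statement of work"))           # False (a miss)

# Stage 3: a reviewer flags the miss; feed back a confirmed example and retrain.
clf.train("statement of work for the project", True)
print(clf.predict("signed statement of work"))           # True
```

The last two predictions show the effect of feedback: the same document is missed before the reviewer's confirmed example is added and caught afterwards.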

What’s next?

I am starting to get into a regular cadence for these articles and hope to publish one a month unless there is a need for more. As always, feedback is welcome, and if there is anything you would like me to cover, please let me know!

In this article, we covered the fundamental concepts of the Protect your Data pillar of the Information Protection Lifecycle and walked through what Sensitivity Labels, Sensitive Information Types, and Trainable Classifiers are, hopefully helping you understand how they fit together more broadly. In my next article, we will cover how to use what we learned today to apply sensitivity labels and their corresponding controls to files, covering both the manual method and using sensitive information types to apply them automatically.

Until next time – Peace.

Categorised in: Security

This post was written by Beau Faull
