What happens when an organization collects user data from multiple sources, such as email addresses, phone numbers, device information, location, etc.? How do they ensure that their customer data remains secure even though they store it in various places across their enterprise?
AWS developed Amazon Macie to address these challenges. The service helps organizations find out where sensitive data is stored so that they can control access to it and prevent unauthorized disclosure, manipulation, or misuse.

Amazon Macie uses machine learning with advanced natural language processing (NLP) techniques and pattern matching to automate the identification of sensitive data at scale in Amazon S3 buckets.

At AWS re:Invent in November 2023, Amazon Macie was upgraded with automated sensitive data discovery across your organization. This lets you identify sensitive data across all S3 buckets in your account or organization.

Enable the new functionality

You can activate Amazon Macie from the console or with the CLI command aws macie2 enable-macie.

Afterward, you can enable automated discovery, which continually selects and inspects samples of the objects in your S3 buckets. You can also enable the option to retrieve and reveal samples of the sensitive data that Macie finds. All findings are automatically encrypted with a KMS key, so the key policy must allow Macie to use that key. Make sure that you adjust the KMS key policy in each Region separately:

{
   "Sid":"Allow Macie to use the key",
   "Effect":"Allow",
   "Principal":{
      "Service":"macie.amazonaws.com"
   },
   "Action":[
      "kms:GenerateDataKey",
      "kms:Encrypt"
   ],
   "Resource":"arn:aws:kms:<region>:<accountID>:key/<keyID>",
   "Condition":{
      "StringEquals":{
         "aws:SourceAccount":"<accountID>"
      },
      "ArnLike":{
         "aws:SourceArn":[
            "arn:aws:macie2:<region>:<accountID>:export-configuration:*",
            "arn:aws:macie2:<region>:<accountID>:classification-job/*"
         ]
      }
   }
}
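Because the statement has to be repeated per Region, it can help to generate it instead of editing it by hand. The following is a minimal sketch that fills in the placeholders of the statement above; the function name, the account ID 111122223333, and the key IDs are hypothetical values chosen for illustration.

```python
import json


def macie_key_statement(region: str, account_id: str, key_id: str) -> dict:
    """Build the key policy statement shown above for a single Region."""
    return {
        "Sid": "Allow Macie to use the key",
        "Effect": "Allow",
        "Principal": {"Service": "macie.amazonaws.com"},
        "Action": ["kms:GenerateDataKey", "kms:Encrypt"],
        "Resource": f"arn:aws:kms:{region}:{account_id}:key/{key_id}",
        "Condition": {
            "StringEquals": {"aws:SourceAccount": account_id},
            "ArnLike": {
                "aws:SourceArn": [
                    f"arn:aws:macie2:{region}:{account_id}:export-configuration:*",
                    f"arn:aws:macie2:{region}:{account_id}:classification-job/*",
                ]
            },
        },
    }


# Hypothetical mapping of Region -> KMS key ID (placeholders, not real keys).
keys_by_region = {
    "eu-central-1": "00000000-0000-0000-0000-000000000001",
    "eu-west-1": "00000000-0000-0000-0000-000000000002",
}

for region, key_id in keys_by_region.items():
    statement = macie_key_statement(region, "111122223333", key_id)
    print(f"# Statement for the key in {region}")
    print(json.dumps(statement, indent=3))
```

Each printed statement still has to be added to the policy of the key in the matching Region; the sketch only removes the copy-and-paste step.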

Both options, automated discovery and revealing samples, are available on the settings page in the left-hand navigation of the Amazon Macie console.
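If you prefer scripting over the console, the same settings can probably be switched on through the SDK. The sketch below assumes a boto3 "macie2" client is passed in; the operation and parameter names reflect my reading of the boto3 Macie2 API and should be checked against the current SDK documentation. The function name and the key alias in the usage comment are hypothetical.

```python
def enable_macie_features(macie, kms_key_id: str) -> None:
    """Enable Macie, automated discovery, and sample retrieval.

    `macie` is assumed to be a boto3 "macie2" client for one Region.
    `kms_key_id` is the key Macie should use to encrypt revealed samples.
    """
    # Equivalent of `aws macie2 enable-macie`. This call is expected to
    # fail if Macie is already enabled for the account in this Region.
    macie.enable_macie(status="ENABLED")

    # Turn on automated sensitive data discovery.
    macie.update_automated_discovery_configuration(status="ENABLED")

    # Allow retrieving and revealing samples of sensitive data.
    macie.update_reveal_configuration(
        configuration={"kmsKeyId": kms_key_id, "status": "ENABLED"}
    )


# Usage (requires boto3 and AWS credentials; not executed here):
#   import boto3
#   enable_macie_features(boto3.client("macie2"), "alias/macie-reveal")
```

Passing the client in as an argument keeps the function easy to run per Region and easy to exercise with a stub in tests.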

After Amazon Macie has finished its analysis, the new summary page gives you an overview of all your buckets and their sensitivity ratings.

Amazon Macie identifies data based on the following categories:

Category
Credentials
Financial information
Personal health information
Personally identifiable information

To get a complete overview of all sensitive data types and detection types, you can visit the official AWS documentation: Using managed data identifiers in Amazon Macie

Identify personally identifiable information (PII)

For illustrative purposes, I have uploaded a customer.xlsx file to my S3 bucket. The Excel sheet has the following format:

First and Last Name    SSN            Credit Card Number
Robert Aragon          489-36-8350    4929-3813-3266-4295
Ashley Borden          514-14-8905    5370-4638-8881-3020
Thomas Conley          690-05-5315    4916-4811-5814-8111

Amazon Macie identified the sensitive data and found several social security numbers, credit card numbers, and names. Selecting the finding shows additional details and lets us reveal samples of the detected data.

On a separate page, we can reveal the samples. This can take a few seconds. Afterward, we will get an overview of samples in our data set.

Final thoughts

With the new features in Amazon Macie, organizations now have a powerful tool for identifying sensitive information within their S3 buckets. In the past, ensuring comprehensive coverage and security across an entire organization's data landscape required significant effort and resources.

What's particularly exciting about this development is the potential it holds for safeguarding sensitive data. I've personally witnessed instances where personally identifiable information (PII) was mistakenly copied from a production database to development environments. Such mishaps can have serious repercussions, especially in countries like Germany, which has stringent data privacy laws. With the new capabilities introduced by Macie, organizations operating in such environments can breathe easier, knowing they have a robust tool to help them remain compliant and swiftly identify and address data breaches.

Moreover, the integration of Amazon EventBridge to create custom rules and remediation actions for Macie findings or to seamlessly publish these findings to AWS Security Hub adds another layer of flexibility and control to the security ecosystem.
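As a starting point for such a rule, the sketch below builds an EventBridge event pattern that matches Macie findings. The source and detail-type values are the ones Macie uses for finding events as far as I know; the severity filter and the function name are illustrative assumptions, so adjust them to the findings you care about.

```python
import json


def macie_finding_pattern(severities=("High",)) -> dict:
    """Build an EventBridge event pattern for Macie findings.

    `severities` filters on the finding's severity description
    (assumed values: "Low", "Medium", "High").
    """
    return {
        "source": ["aws.macie"],
        "detail-type": ["Macie Finding"],
        "detail": {"severity": {"description": list(severities)}},
    }


pattern = macie_finding_pattern()
print(json.dumps(pattern, indent=2))
```

The resulting JSON string can be used as the event pattern of an EventBridge rule, for example through the console or the EventPattern parameter of the put_rule API, with an SNS topic or Lambda function as the remediation target.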
