Introduction

While building Lambda-based automation for patch management and security hardening, I frequently ran into a frustrating problem: some EC2 instances failed during software installation or SSM command execution. Usually the root cause was that the instance wasn’t a Managed Instance in SSM.

An EC2 instance must be managed by SSM in order to use features like Run Command, Patch Manager, or inventory collection. If it’s not managed, automation fails silently or unpredictably.

To debug this reliably, I started using the AWSSupport-TroubleshootManagedInstance runbook, which checks network paths, IAM roles, and VPC settings. In this post, I’ll show you how to launch the runbook and walk through what it checks behind the scenes, so you can fix SSM issues quickly.

Step 1: Launch the SSM Troubleshooting Runbook

You can start the runbook directly from the AWS Systems Manager console:

  • Navigate to Systems Manager > Documents
  • Search for AWSSupport-TroubleshootManagedInstance
  • Choose Simple execution, Rate control, Multi-account and Region, or Manual execution, depending on your needs

Once launched, the runbook automatically performs several diagnostic steps to determine why the EC2 instance is not recognized as a Managed Instance by Systems Manager.

🛡️ Note: The role executing the runbook must have at least the following permissions to perform these checks successfully:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "ssm:DescribeInstanceInformation",
                "ec2:DescribeNetworkInterfaces",
                "iam:GetInstanceProfile",
                "ec2:DescribeVpcs",
                "ec2:DescribeInstances",
                "iam:ListAttachedRolePolicies",
                "ssm:GetServiceSetting",
                "ec2:DescribeVpcEndpoints",
                "ec2:DescribeNetworkAcls",
                "ec2:DescribeRouteTables",
                "ec2:DescribeSecurityGroups"
            ],
            "Resource": "*"
        }
    ]
}

This allows the runbook to query the relevant EC2, IAM, and VPC configuration needed to validate connectivity and permissions.

You can also start the AWSSupport-TroubleshootManagedInstance runbook from the AWS CLI:
aws ssm start-automation-execution --document-name "AWSSupport-TroubleshootManagedInstance" --parameters "InstanceId=i-07ad37999d96ee3e8,AutomationAssumeRole=arn:aws:iam::<accountID>:role/SSMTroubleshootManagedInstanceRole"
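
Since this started as Lambda automation, the same launch can be sketched in Python. The helper names below are hypothetical, and `ssm_client` is assumed to behave like `boto3.client("ssm")`; treat this as a minimal sketch rather than a finished implementation.

```python
from typing import Any, Dict

DOCUMENT_NAME = "AWSSupport-TroubleshootManagedInstance"


def build_execution_request(instance_id: str, assume_role_arn: str) -> Dict[str, Any]:
    """Build keyword arguments for the StartAutomationExecution API.

    Automation parameter values are lists of strings, even for single values.
    """
    return {
        "DocumentName": DOCUMENT_NAME,
        "Parameters": {
            "InstanceId": [instance_id],
            "AutomationAssumeRole": [assume_role_arn],
        },
    }


def start_troubleshooting(ssm_client: Any, instance_id: str, assume_role_arn: str) -> str:
    """Start the runbook and return the automation execution ID.

    ssm_client is assumed to expose start_automation_execution, as the
    boto3 SSM client does.
    """
    response = ssm_client.start_automation_execution(
        **build_execution_request(instance_id, assume_role_arn)
    )
    return response["AutomationExecutionId"]
```

Passing the client in as an argument keeps the helper easy to exercise with a stub before pointing it at a real account.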

Step 2: What the Runbook Actually Does

The power of this runbook lies in the sequence of checks it performs. Below is a breakdown of each step:

✅ GetPingStatus: Check Connection to SSM

This step uses the DescribeInstanceInformation API to determine if the instance is already connected to SSM. If it is, the runbook exits early. If not, the diagnosis continues.

🔀 BranchOnIsInstanceAlreadyOnline: Conditional Branching

The runbook branches here: if the instance is already online in SSM, it skips all further steps. Otherwise, it proceeds to identify the root cause.
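
The ping check and the branch can be approximated in a few lines. This is a sketch of the idea, not the runbook's actual source: the function names are hypothetical, and the input is assumed to be a DescribeInstanceInformation response in the shape boto3 returns.

```python
from typing import Any, Dict, Optional


def get_ping_status(response: Dict[str, Any], instance_id: str) -> Optional[str]:
    """Return the PingStatus for instance_id, or None if SSM has no record of it."""
    for info in response.get("InstanceInformationList", []):
        if info.get("InstanceId") == instance_id:
            return info.get("PingStatus")
    return None


def should_continue_diagnosis(response: Dict[str, Any], instance_id: str) -> bool:
    """Mirror the branch: keep diagnosing unless the instance reports Online."""
    return get_ping_status(response, instance_id) != "Online"
```

An instance that never registered is simply missing from the list, so a missing entry and a non-Online status both lead to further diagnosis.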

🔍 GetEC2InstanceProperties: Gather Instance Metadata

Collects metadata such as:

  • Subnet ID
  • VPC ID
  • Private IP
  • Security Groups
  • IAM instance profile

This data is essential for validating network and IAM setup.
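
To make the metadata concrete, here is a minimal sketch that pulls those fields out of a DescribeInstances response. The `InstanceProperties` class and `extract_properties` function are hypothetical names, and the code assumes the response contains the single instance you asked for.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class InstanceProperties:
    subnet_id: Optional[str]
    vpc_id: Optional[str]
    private_ip: Optional[str]
    security_group_ids: List[str]
    instance_profile_arn: Optional[str]


def extract_properties(response: Dict[str, Any]) -> InstanceProperties:
    """Collect the fields later checks depend on from a DescribeInstances response."""
    instance = response["Reservations"][0]["Instances"][0]
    return InstanceProperties(
        subnet_id=instance.get("SubnetId"),
        vpc_id=instance.get("VpcId"),
        private_ip=instance.get("PrivateIpAddress"),
        security_group_ids=[g["GroupId"] for g in instance.get("SecurityGroups", [])],
        # IamInstanceProfile is absent when no profile is attached.
        instance_profile_arn=instance.get("IamInstanceProfile", {}).get("Arn"),
    )
```

A missing `instance_profile_arn` is already a strong hint: without an instance profile (or Default Host Management Configuration), the agent has no credentials to register with.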

🌐 CheckVpcEndpoint: Validate VPC Endpoint for SSM

Checks whether a Systems Manager VPC Endpoint (interface type) exists in the VPC. If it does:

  • Validates that security groups attached to the endpoint allow inbound TCP 443 from the instance’s private IP or SG.
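
The endpoint lookup might look roughly like this. The function names are hypothetical and the input is assumed to be a DescribeVpcEndpoints response; the service name follows the `com.amazonaws.<region>.ssm` pattern.

```python
from typing import Any, Dict, List, Optional


def find_ssm_endpoint(
    response: Dict[str, Any], vpc_id: str, region: str
) -> Optional[Dict[str, Any]]:
    """Return the interface endpoint for Systems Manager in vpc_id, if any."""
    service_name = f"com.amazonaws.{region}.ssm"
    for endpoint in response.get("VpcEndpoints", []):
        if (
            endpoint.get("VpcId") == vpc_id
            and endpoint.get("ServiceName") == service_name
            and endpoint.get("VpcEndpointType") == "Interface"
        ):
            return endpoint
    return None


def endpoint_security_group_ids(endpoint: Dict[str, Any]) -> List[str]:
    """List the security groups attached to the endpoint's network interfaces."""
    return [group["GroupId"] for group in endpoint.get("Groups", [])]
```

The returned group IDs are the ones whose inbound rules need to allow TCP 443 from the instance.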

🛣️ CheckRouteTable: Ensure Route to SSM

Looks at the subnet’s route table to confirm:

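  • There’s a local route for reaching the SSM VPC endpoint (preferred), or
  • A route to the public SSM endpoint, such as 0.0.0.0/0 via an internet gateway, if no VPC endpoint is used. An internet gateway route only helps if the instance has a public IP address.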
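
As a rough illustration of the route table check, the sketch below classifies one route table entry from a DescribeRouteTables response. The function names and the returned labels are hypothetical, and only internet gateway and NAT gateway targets are distinguished.

```python
from typing import Any, Dict, Optional


def has_local_route(route_table: Dict[str, Any]) -> bool:
    """True if the table still has its VPC-local route."""
    return any(r.get("GatewayId") == "local" for r in route_table.get("Routes", []))


def default_route_target(route_table: Dict[str, Any]) -> Optional[str]:
    """Return the target ID of the active 0.0.0.0/0 route, if there is one."""
    for route in route_table.get("Routes", []):
        if (
            route.get("DestinationCidrBlock") == "0.0.0.0/0"
            and route.get("State", "active") == "active"
        ):
            return route.get("GatewayId") or route.get("NatGatewayId")
    return None


def internet_path(route_table: Dict[str, Any], has_public_ip: bool) -> str:
    """Describe how, if at all, the subnet can reach the public SSM endpoint."""
    target = default_route_target(route_table)
    if target is None:
        return "none"
    if target.startswith("igw-"):
        # An internet gateway is only usable from an instance with a public IP.
        return "internet-gateway" if has_public_ip else "internet-gateway-without-public-ip"
    if target.startswith("nat-"):
        return "nat-gateway"
    return "other"
```

A subnet whose default route points at an internet gateway still has no usable path if the instance lacks a public IP, which is why the sketch reports that case separately.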

🔐 CheckNacl: Inspect Network ACLs

Ensures that the NACLs on the subnet allow outbound HTTPS (443) traffic to SSM and the matching inbound return traffic. NACLs are stateless, so both directions must be allowed explicitly.
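
NACL rules are evaluated in ascending rule-number order and the first matching rule wins. A minimal sketch of that evaluation for IPv4 TCP traffic, using the `Entries` list from a DescribeNetworkAcls response, could look like this. `nacl_allows` is a hypothetical helper, and it ignores IPv6 and non-TCP rules.

```python
import ipaddress
from typing import Any, Dict, List


def nacl_allows(entries: List[Dict[str, Any]], *, egress: bool, ip: str, port: int) -> bool:
    """Evaluate NACL entries for one TCP flow; the lowest matching rule number decides."""
    address = ipaddress.ip_address(ip)
    candidates = sorted(
        (e for e in entries if e.get("Egress") == egress and "CidrBlock" in e),
        key=lambda e: e["RuleNumber"],
    )
    for entry in candidates:
        if address not in ipaddress.ip_network(entry["CidrBlock"], strict=False):
            continue
        protocol = entry.get("Protocol")
        if protocol == "-1":  # all protocols, all ports
            port_matches = True
        elif protocol == "6":  # TCP
            port_range = entry.get("PortRange", {})
            port_matches = port_range.get("From", 0) <= port <= port_range.get("To", 65535)
        else:
            port_matches = False
        if port_matches:
            return entry.get("RuleAction") == "allow"
    return False  # nothing matched; treat as denied
```

Because NACLs are stateless, a real check would call this twice: once with `egress=True` for port 443, and once with `egress=False` for the ephemeral port range used by return traffic.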

🔄 CheckInstanceSecurityGroup: Outbound Rules

Validates that the instance’s security group allows outbound traffic to:

  • The SSM VPC endpoint, or
  • The public Systems Manager endpoint

Even with VPC endpoints, outbound SG rules are still required.
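
A simplified version of the outbound rule check is sketched below. `sg_allows_https_egress` is a hypothetical helper that takes one security group from a DescribeSecurityGroups response; it only looks at IPv4 CIDR rules and ignores rules that reference other security groups or prefix lists.

```python
import ipaddress
from typing import Any, Dict


def sg_allows_https_egress(security_group: Dict[str, Any], destination_ip: str) -> bool:
    """True if any egress rule permits TCP 443 to destination_ip (IPv4 CIDR rules only)."""
    destination = ipaddress.ip_address(destination_ip)
    for permission in security_group.get("IpPermissionsEgress", []):
        protocol = permission.get("IpProtocol")
        if protocol == "-1":  # all traffic
            port_matches = True
        elif protocol == "tcp":
            port_matches = permission.get("FromPort", 0) <= 443 <= permission.get("ToPort", 65535)
        else:
            port_matches = False
        if not port_matches:
            continue
        for ip_range in permission.get("IpRanges", []):
            if destination in ipaddress.ip_network(ip_range["CidrIp"], strict=False):
                return True
    return False
```

Security groups are stateful, so unlike the NACL check there is no need to verify a matching inbound rule for the return traffic.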

🔑 CheckInstanceIAM: Verify IAM Role and Account Settings

Checks whether:

  • The IAM instance profile includes AmazonSSMManagedInstanceCore or equivalent permissions.
  • The account has Default Host Management Configuration enabled (optional fallback).
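
Both IAM checks can be approximated against API responses. This is a sketch under stated assumptions: the function names are hypothetical, the policy ARN uses the standard `aws` partition, and I am assuming a GetServiceSetting `Status` of `Customized` means Default Host Management Configuration has been changed from its default. It does not detect custom policies with equivalent permissions.

```python
from typing import Any, Dict

# AWS managed policy ARN in the standard "aws" partition (differs in other partitions).
REQUIRED_POLICY_ARN = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"


def has_ssm_core_policy(response: Dict[str, Any]) -> bool:
    """Check a ListAttachedRolePolicies response for AmazonSSMManagedInstanceCore."""
    return any(
        policy.get("PolicyArn") == REQUIRED_POLICY_ARN
        for policy in response.get("AttachedPolicies", [])
    )


def dhmc_is_customized(response: Dict[str, Any]) -> bool:
    """Check a GetServiceSetting response for a non-default setting.

    Assumption: "Customized" indicates the account-level setting was changed.
    """
    return response.get("ServiceSetting", {}).get("Status") == "Customized"
```

If both functions return False, the instance has no obvious source of SSM permissions, which matches the most common cause of an unmanaged instance.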

Sample AWSSupport-TroubleshootManagedInstance runbook output:

1. Checks for Amazon VPC Systems Manager VPC Endpoint 'com.amazonaws.us-east-1.ssm':
- [INFO] No VPC endpoint for Systems Manager found on the EC2 instance VPC: vpc-0659a354588e79688.

2. Checks for the VPC route table entries of the instance's subnet 'subnet-02b79aec564cde270:'
- [INFO] VPC route table found: rtb-03f9fb735c4d0263b.
- [INFO] VPC local route (default route) available for 172.31.0.0/16.
- [WARNING] A local route is required to communicate with the VPC endpoint interface.
- [INFO] VPC Internet route with destination 0.0.0.0/0 found with target 'igw-04e402dfdc74364bf'.
- [WARNING] VPC internet gateway 'igw-04e402dfdc74364bf' route associated, however the instance does not have a public IP address associated. Internet connectivity through the internet gateway is unavailable.
- For more information about routing options see https://docs.aws.amazon.com/vpc/latest/userguide/route-table-options.html#route-tables-vpc-peering
- For more information about route tables see https://docs.aws.amazon.com/vpc/latest/userguide/route-table-options.html#route-tables-vpc-peering

3. Checks for NACL rules of the instance subnet 'subnet-02b79aec564cde270':
- Check network ACLs requirements instance 'i-07ad37999d96ee3e8' for instance subnet 'subnet-02b79aec564cde270':
- Check network ACLs requirements on network ACL 'acl-0f49be11fb95a1ec2':
- [OK] 'ALL' outbound traffic allowed to '172.31.16.253' from '[1024, 65535]'

4. Checks EC2 instance 'i-07ad37999d96ee3e8' security groups outbound traffic:
- Check outbound traffic to the public Systems Manager endpoint:
- [INFO] Instance security group 'sg-0ab244b3363a23c1b' allows outbound traffic on port '443' to '0.0.0.0/0'.
- [OK] Instance security group 'sg-0ab244b3363a23c1b' allows outbound traffic on port '443' to public System Manager endpoint.

5. Checks EC2 instance IAM profile and required permissions:
- Check Default Host Management Configuration:
- [INFO] Default Host Management Configuration is Default.
- Check for AWS managed policies attached to the instance profile 'EC2SSMSessionManagerRole':
- [OK] Found an AWS managed policy attached to the instance profile 'EC2SSMSessionManagerRole' with required permissions.

6. Additional Troubleshooting:
- Starting with the SSM Agent version 3.1.501.0, you can use the 'ssm-cli' tool to diagnose issues at the operating system level.
- Troubleshooting managed node availability using ssm-cli:
- https://docs.aws.amazon.com/systems-manager/latest/userguide/ssm-cli.html
- Troubleshooting reference:
- https://repost.aws/knowledge-center/systems-manager-ec2-instance-not-appear
- https://docs.aws.amazon.com/systems-manager/latest/userguide/troubleshooting-ssm-agent.html
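
If you want to act on this output from a Lambda function, one option is to pull out the severity-tagged lines. The sketch below assumes the `- [LEVEL] message` line format seen in the sample output above; the function names are hypothetical, and the runbook's output format is not a documented contract, so it may change.

```python
import re
from typing import List, Tuple

# Matches lines such as "- [WARNING] some message".
_FINDING = re.compile(r"^-\s*\[([A-Z]+)\]\s*(.*)$")


def extract_findings(output: str) -> List[Tuple[str, str]]:
    """Return (level, message) pairs for every tagged line in the runbook output."""
    findings = []
    for line in output.splitlines():
        match = _FINDING.match(line.strip())
        if match:
            findings.append((match.group(1), match.group(2)))
    return findings


def warnings_only(output: str) -> List[str]:
    """Return only the messages tagged WARNING."""
    return [message for level, message in extract_findings(output) if level == "WARNING"]
```

For the sample run above, the warnings point at the real problem: the subnet routes to an internet gateway, but the instance has no public IP address and the VPC has no Systems Manager endpoint.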