11 March 2022

Auto Remediation with EventBridge, Step Functions, and the AWS SDK Integration

Learn how to use the AWS SDK for Step Functions to auto-remediate findings.

Dustin Whited, Director, Security Engineering

In late 2021, AWS released a Step Functions (SFN) feature that allows AWS SDK calls to be made directly from a workflow. Previously, the SFN State Machine had to invoke a Lambda function or SSM document to make AWS SDK calls. This feature enables low-code and no-code workflows within AWS.

This is important because unmaintained code can introduce vulnerabilities into the environment and create tech debt. Responsibility for keeping code up to date in a cloud environment lies squarely with the customer. The Step Functions SDK integration reduces the operational burden on security teams by transferring that responsibility to the Cloud Service Provider. This blog explores an example of functionless auto remediation of a public resource exposure using the SFN SDK integration and Terraform.

Use Case: EBS Snapshot Made Public

All write events recorded by CloudTrail are also sent to the default EventBridge event bus. Read-only events are not currently sent to the default event bus and cannot trigger a workflow. Modifying a snapshot produces a write CloudTrail event, which makes this remediation workflow possible.

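Modifying a snapshot to be public results in an event like the following sample. The event delivered by EventBridge is the object under the “input” key, and the CloudTrail record sits under its “detail” key: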

{
  "input": {
    "version": "0",
    "id": "00000000-0000-0000-0000-000000000000",
    "detail-type": "AWS API Call via CloudTrail",
    "source": "aws.ec2",
    "account": "123456789012",
    "time": "2022-01-01T00:00:00Z",
    "region": "us-east-1",
    "resources": [],
    "detail": {
      "awsRegion": "us-east-1",
      "eventID": "1111",
      "eventName": "ModifySnapshotAttribute",
      "eventSource": "ec2.amazonaws.com",
      "eventTime": "2022-01-01T00:00:00Z",
      "eventType": "AwsApiCall",
      "eventVersion": "1.05",
      "recipientAccountId": "123456789012",
      "requestID": "1111",
      "requestParameters": {
          "attributeType": "CREATE_VOLUME_PERMISSION",
          "createVolumePermission": {
              "add": {
                  "items": [
                      {
                          "group": "all"
                      }
                  ]
              }
          },
          "snapshotId": "snap-1111"
      },
      "responseElements": {
          "_return": true,
          "requestId": "1111"
      },
      "sourceIPAddress": "111.111.111.111",
      "userAgent": "Mozilla/2.0 (compatible; NEWT ActiveX; Win32)",
      "userIdentity": {
          "accessKeyId": "1111",
          "accountId": "123456789012",
          "arn": "arn:aws:sts::123456789012:assumed-role/example-role/example-user",
          "principalId": "1111",
          "sessionContext": {
              "attributes": {
                  "creationDate": "2022-01-01T00:00:00Z",
                  "mfaAuthenticated": "true"
              },
              "sessionIssuer": {
                  "accountId": "123456789012",
                  "arn": "arn:aws:iam::123456789012:role/example-role",
                  "principalId": "1111",
                  "type": "Role",
                  "userName": "example-role"
              },
              "webIdFederationData": {}
          },
          "type": "AssumedRole"
        }
      }
  }
}

This sample event makes it a little easier to write an Event Pattern, but a pattern could also be created from the information in the API specification to match fields and values.

Sharing an EBS volume snapshot publicly modifies an existing snapshot with the ModifySnapshotAttribute API call. The “group”: “all” key-value pair is added through the createVolumePermission request parameter during the API call.

This solution monitors for invocations of this call. If an invocation meets the Event Rule criterion “group”: “all”, the event is forwarded to a State Machine, which calls the ModifySnapshotAttribute API to make the snapshot private again.

The Event to Step Function Workflow

Writing an Event Pattern to Match

This example uses Terraform to manage the Infrastructure as Code (IaC). Terraform makes it possible to deploy this solution in a predictable and repeatable way across multiple regions and AWS accounts.

Event Patterns have a few fields that are always present. The CloudWatch Event Rule event_pattern uses keys from the CloudTrail event above: source is the service the event originates from, aws.ec2, and detail-type is AWS API Call via CloudTrail.

In Terraform, the event rule and pattern are created as follows:

resource "aws_cloudwatch_event_rule" "public_snapshot" {
  name        = "public-snapshot"
  description = "Capture events when Snapshots are made public"

  event_pattern = <<EOF
{
  "source": ["aws.ec2"],
  "detail-type": ["AWS API Call via CloudTrail"],
  "detail": {
    "eventSource": ["ec2.amazonaws.com"],
    "eventName": [
        "ModifySnapshotAttribute"
    ],
    "requestParameters": {
      "createVolumePermission": {
        "add": {
          "items":
            {
            "group": ["all"]
            }
        }
      }
    }
  }
}
EOF
}

Contained within the “detail” key is the CloudTrail log shown above. The eventSource and eventName fields narrow the match to the ec2:ModifySnapshotAttribute API call.

The requestParameters section does not have to match everything contained in the log; in this case, it only needs to match “group”: “all”. Every field specified in an event pattern must match the event for the rule to trigger.

Multiple values can be specified within square brackets ( “[ ]” ), comma delimited, and the pattern will match on an “OR” basis to any of the values within.
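As a small illustration of the “OR” behavior, the sketch below lists two values for eventName, so the rule matches either API call. The rule name snapshot_attribute_changes is hypothetical, and pairing ResetSnapshotAttribute with ModifySnapshotAttribute is only an assumption chosen for the example; it is not part of the remediation workflow in this post. The pattern is written with jsonencode so that Terraform generates the JSON instead of it being typed by hand.

```hcl
# Hypothetical rule, for illustration only: matches when eventName is
# ModifySnapshotAttribute OR ResetSnapshotAttribute.
resource "aws_cloudwatch_event_rule" "snapshot_attribute_changes" {
  name        = "snapshot-attribute-changes"
  description = "Capture ModifySnapshotAttribute and ResetSnapshotAttribute calls"

  event_pattern = jsonencode({
    source        = ["aws.ec2"]
    "detail-type" = ["AWS API Call via CloudTrail"]
    detail = {
      eventSource = ["ec2.amazonaws.com"]
      # Values inside one array are alternatives; separate keys must all match.
      eventName = ["ModifySnapshotAttribute", "ResetSnapshotAttribute"]
    }
  })
}
```

Separate keys still combine on an “AND” basis, so an event must match source, detail-type, eventSource, and one of the two eventName values.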

See Create Event Patterns in the AWS Documentation website for more information on the logic used to match events.

Triggering a State Machine

Step Functions workflows can be invoked directly from EventBridge event rules. The full event object is sent as the payload.

The event rule requires an IAM role to invoke the state machine. The role needs a trust policy, or “assume role policy”, allowing events.amazonaws.com to assume it, and a permission policy allowing the states:StartExecution action on the state machine. The state machine is created later, but Terraform can already reference it here.

resource "aws_iam_role" "event_public_snapshot" {
  name = "public-snapshot-events-role"

  inline_policy {
    name   = "public-snapshot-events-policy"
    policy = data.aws_iam_policy_document.events_policy.json

  }

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          Service = "events.amazonaws.com"
        }
      },
    ]
  })
}

data "aws_iam_policy_document" "events_policy" {
  statement {
    effect  = "Allow"
    actions = ["states:StartExecution"]
    resources = [aws_sfn_state_machine.public_snapshot.arn]
  }
}

An event target in Terraform requires only four things: the rule it belongs to, a name for the target, the state machine ARN, and the IAM role created above.

resource "aws_cloudwatch_event_target" "sfn" {
  rule      = aws_cloudwatch_event_rule.public_snapshot.name
  target_id = "public-snapshot-to-sfn"
  arn       = aws_sfn_state_machine.public_snapshot.arn
  role_arn  = aws_iam_role.event_public_snapshot.arn
}

Remediating the Public Snapshot

Like the event target, the state machine will need an IAM role with the appropriate permissions to make the snapshot private.

The state machine's IAM role is similar to the Events role, this time with a states.amazonaws.com trust policy and the ec2:ModifySnapshotAttribute action. The resource is a wildcard here because snapshot IDs are generated dynamically and the automation should cover any snapshot created in the account.

resource "aws_iam_role" "sfn_public_snapshot" {
  name = "public-snapshot-sfn-role"

  inline_policy {
    name   = "public-snapshot-sfn-policy"
    policy = data.aws_iam_policy_document.sfn_policy.json

  }

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          Service = "states.amazonaws.com"
        }
      },
    ]
  })
}

data "aws_iam_policy_document" "sfn_policy" {
  statement {
    effect    = "Allow"
    actions   = ["ec2:ModifySnapshotAttribute"]
    resources = ["*"]
  }
}
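If the wildcard is broader than desired, one possible tightening is to scope the statement to snapshot ARNs only. The sketch below assumes that ec2:ModifySnapshotAttribute supports resource-level permissions on snapshot ARNs of the form arn:aws:ec2:REGION::snapshot/SNAPSHOT_ID; that assumption should be checked against the current IAM documentation for EC2 before relying on it. The data source name sfn_policy_scoped is hypothetical.

```hcl
# Hypothetical alternative to sfn_policy. Assumes the action accepts
# snapshot ARNs as resources and that the standard "aws" partition is in use.
data "aws_iam_policy_document" "sfn_policy_scoped" {
  statement {
    effect  = "Allow"
    actions = ["ec2:ModifySnapshotAttribute"]

    # Still covers every snapshot in every region, but no other resource type.
    resources = ["arn:aws:ec2:*::snapshot/*"]
  }
}
```

To try it, the role's inline_policy would reference data.aws_iam_policy_document.sfn_policy_scoped.json instead of the wildcard document.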

The state machine itself is fairly simple. After it begins, it runs a single state, which removes the public permission, and then exits.

The Step Function Workflow

Writing a state machine definition has a learning curve. Step Functions has a visual workflow designer in the AWS console that can make it less steep.

State machines can also chain tasks together, transform outputs into inputs, and branch their logic, but none of that is necessary for this example. In-depth information on the Amazon States Language is available at https://states-language.net/

resource "aws_sfn_state_machine" "public_snapshot" {
  name     = "public-snapshot"
  role_arn = aws_iam_role.sfn_public_snapshot.arn

  definition = <<EOF
{
  "Comment": "Removes group:all from snapshots",
  "StartAt": "RemoveAllPermission",
  "States": {
    "RemoveAllPermission": {
      "Type": "Task",
      "Resource": "arn:aws:states:::aws-sdk:ec2:modifySnapshotAttribute",
      "Parameters": {
        "SnapshotId.$": "$.detail.requestParameters.snapshotId",
        "CreateVolumePermission": {
            "Remove": [ { "Group": "all" } ]
        }
      },
      "End": true
    }
  }
}
EOF
}

Walking through the state machine definition, it begins with a comment about what the state machine’s purpose is.

StartAt refers to the starting point of the state machine and in this case, is the RemoveAllPermission task.

RemoveAllPermission uses an AWS SDK resource in the form of arn:aws:states:::aws-sdk:${SERVICE}:${API}. All of the currently available service integrations are listed in the AWS documentation.

The keys in Parameters match those required by the API call. The “.$” suffix on SnapshotId tells Step Functions to treat the value as a JSONPath expression, so the snapshot ID is read from the requestParameters of the incoming event.

Deploying

The full Terraform code is hosted in a GitHub gist and includes all of the code above. Paired with an AWS provider block, this example can be deployed for testing.

https://gist.github.com/dgwhited/ce2f3570f5f7e79b2477456a62b2db38

###########################
#### Eventbridge event ####
###########################

resource "aws_cloudwatch_event_rule" "public_snapshot" {
  name        = "public-snapshot"
  description = "Capture events when Snapshots are made public"

  event_pattern = <<EOF
{
  "source": ["aws.ec2"],
  "detail-type": ["AWS API Call via CloudTrail"],
  "detail": {
    "eventSource": ["ec2.amazonaws.com"],
    "eventName": [
        "ModifySnapshotAttribute"
    ],
    "requestParameters": {
      "createVolumePermission": {
        "add": {
          "items":
           {
            "group": ["all"]
           }
        }
      }
    }
  }
}
EOF
}

resource "aws_cloudwatch_event_target" "sfn" {
  rule      = aws_cloudwatch_event_rule.public_snapshot.name
  target_id = "public-snapshot-to-sfn"
  arn       = aws_sfn_state_machine.public_snapshot.arn
  role_arn  = aws_iam_role.event_public_snapshot.arn
}

###########################
##### Eventbridge IAM #####
###########################

resource "aws_iam_role" "event_public_snapshot" {
  name = "public-snapshot-events-role"

  inline_policy {
    name   = "public-snapshot-events-policy"
    policy = data.aws_iam_policy_document.events_policy.json

  }

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          Service = "events.amazonaws.com"
        }
      },
    ]
  })
}

data "aws_iam_policy_document" "events_policy" {
  statement {
    effect    = "Allow"
    actions   = ["states:StartExecution"]
    resources = [aws_sfn_state_machine.public_snapshot.arn]
  }
}

###########################
###### State Machine ######
###########################

resource "aws_sfn_state_machine" "public_snapshot" {
  name     = "public-snapshot"
  role_arn = aws_iam_role.sfn_public_snapshot.arn

  definition = <<EOF
{
  "Comment": "Removes group:all from snapshots",
  "StartAt": "RemoveAllPermission",
  "States": {
    "RemoveAllPermission": {
      "Type": "Task",
      "Resource": "arn:aws:states:::aws-sdk:ec2:modifySnapshotAttribute",
      "Parameters": {
        "SnapshotId.$": "$.detail.requestParameters.snapshotId",
        "CreateVolumePermission": {
            "Remove": [ { "Group": "all" } ]
        }
      },
      "End": true
    }
  }
}
EOF
}

resource "aws_iam_role" "sfn_public_snapshot" {
  name = "public-snapshot-sfn-role"

  inline_policy {
    name   = "public-snapshot-sfn-policy"
    policy = data.aws_iam_policy_document.sfn_policy.json

  }

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          Service = "states.amazonaws.com"
        }
      },
    ]
  })
}

data "aws_iam_policy_document" "sfn_policy" {
  statement {
    effect    = "Allow"
    actions   = ["ec2:ModifySnapshotAttribute"]
    resources = ["*"]
  }
}
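The provider block mentioned earlier could look like the minimal sketch below. The region is an assumption taken from the sample event, no provider version is pinned, and credentials are expected to come from the environment or a shared AWS profile.

```hcl
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
    }
  }
}

provider "aws" {
  # Assumption: change this to the region where the event rule and
  # state machine should be deployed.
  region = "us-east-1"
}
```

With this saved alongside the code above, terraform init followed by terraform plan and terraform apply deploys the example.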

Other Uses

This example showed a single task that took action on a qualifying event and then ended. Helpful extensions include notifying a Slack channel or user, or creating a ticket to record that action was taken. The Step Functions AWS SDK integration is a powerful tool for building reactive controls that maintain a secure baseline if a preventative control fails.
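One way such a notification could be sketched is a second state that publishes to an SNS topic after the remediation. Everything new here is an assumption made for illustration: the topic, the resource names snapshot_remediation, sfn_policy_notify, and public_snapshot_notify, and the message text. Setting ResultPath to null on the first state discards the API response and passes the original event through, so the snapshot ID is still available to the second state.

```hcl
# Hypothetical topic; a chat or ticketing integration could subscribe to it.
resource "aws_sns_topic" "snapshot_remediation" {
  name = "public-snapshot-remediation"
}

# Assumption: the state machine role's inline policy would point at this
# document so the role can publish to the topic as well.
data "aws_iam_policy_document" "sfn_policy_notify" {
  statement {
    effect    = "Allow"
    actions   = ["ec2:ModifySnapshotAttribute"]
    resources = ["*"]
  }

  statement {
    effect    = "Allow"
    actions   = ["sns:Publish"]
    resources = [aws_sns_topic.snapshot_remediation.arn]
  }
}

resource "aws_sfn_state_machine" "public_snapshot_notify" {
  name     = "public-snapshot-notify"
  role_arn = aws_iam_role.sfn_public_snapshot.arn

  definition = <<EOF
{
  "Comment": "Removes group:all from snapshots, then publishes a notification",
  "StartAt": "RemoveAllPermission",
  "States": {
    "RemoveAllPermission": {
      "Type": "Task",
      "Resource": "arn:aws:states:::aws-sdk:ec2:modifySnapshotAttribute",
      "Parameters": {
        "SnapshotId.$": "$.detail.requestParameters.snapshotId",
        "CreateVolumePermission": {
          "Remove": [ { "Group": "all" } ]
        }
      },
      "ResultPath": null,
      "Next": "Notify"
    },
    "Notify": {
      "Type": "Task",
      "Resource": "arn:aws:states:::sns:publish",
      "Parameters": {
        "TopicArn": "${aws_sns_topic.snapshot_remediation.arn}",
        "Message.$": "States.Format('Public access was removed from snapshot {}.', $.detail.requestParameters.snapshotId)"
      },
      "End": true
    }
  }
}
EOF
}
```

The event target would then point at this state machine instead of the original one, and the role's inline policy would reference sfn_policy_notify.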

The information presented in this article is accurate as of March 02, 2022.

If you have any questions, or would like to discuss this topic in more detail, feel free to contact us and we would be happy to schedule some time to chat about how Aquia can help you and your organization.
