Introduction
When large, highly regulated organisations shift workloads to the cloud, those workloads come laden with requirements for robust oversight. Stakeholders focus zealously on the governance models, security infrastructure, and risk controls that cloud adoption requires. At Sourced, we work closely with these organisations to deploy robust frameworks that meet these requirements and mitigate risk, all while using tools available on AWS.
In this blog, Sourced consultants walk you through how to build a tool that offers a bird’s-eye view of the organisation’s compliance posture and supports exemptions, along with event-driven notifications to operations and infrastructure teams that trigger remediation efforts, using the Compliance Suite and AWS Config.
AWS Public Sector Summit Online:
Forging organisational cloud compliance at scale with Aaron Lauer and Somnath Kapoor
Why are Governance and Compliance Rules Important?
The focus of nearly every organisation as they move to the cloud is to improve their experience and agility, and to increase the velocity of application release cycles so that they may be more adaptive and responsive to customer and internal business needs. Intuitively, this means that the application landscape will change at an increased pace, leading to an increased risk of sub-optimal configuration or misconfiguration – which in turn may result in additional potential vulnerabilities or open vectors of attack.
Ideally, an organisation would continually assess, audit, and evaluate its cloud resources against company policies and known good configurations, and detect any unexpected configuration change to its cloud footprint. Doing so mitigates much of the risk of undetected attack vectors, whilst maintaining a high level of application release velocity.
Through working with customers in financial services and other highly regulated industries over the past decade, Sourced has iterated and built upon various patterns and approaches to cloud adoption, and found that building a robust foundation for compliance is not mutually exclusive with providing agility to application teams. What we call the ‘Compliance Engine’, a tool that continually monitors the AWS cloud footprint and application workloads, is one of the many tools which we have found to be effective in addressing common concerns.
The Compliance Suite
The compliance suite is a collection of cloud native microservices, referred to in this post as engines. Looking at the functional boundaries of each service, three main areas emerge: Detection, Monitoring & Notification, and Chaos Engineering.
The Rules Engine: Detection
Config rules are essentially AWS Lambda functions that evaluate resources; functionally, each one maps to a detective control. These Lambda functions can be triggered on a schedule or by a resource configuration change event, the latter being the recommended configuration option.
Designed to automate the deployment and management of these config rules across an organisation’s landing zone, the Rules Engine can be broken down into three main segments: the deployment specification file, the config rule specification files, and the source code. Together, these are ultimately deployed and managed as AWS Lambda functions. The engine also scales with the size of cloud adoption within an organisation.
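To make the idea of a change-triggered config rule concrete, here is a minimal sketch of the evaluation logic inside such a Lambda. It assumes a hypothetical "required tag" rule; the function names and the `requiredTag` parameter are illustrative and not taken from the Sourced implementation. The event fields (`invokingEvent`, `ruleParameters`, `configurationItem`) follow the shape AWS Config passes to custom Lambda rules. A real handler would then submit the returned record through the AWS Config `PutEvaluations` API together with the event's `resultToken`; that call is left out so the sketch stays self-contained.

```python
import json


def evaluate_required_tag(configuration_item, required_tag):
    """Return a Config compliance type for a single configuration item."""
    # A deleted resource can no longer be evaluated against the rule.
    if configuration_item.get("configurationItemStatus") == "ResourceDeleted":
        return "NOT_APPLICABLE"
    tags = configuration_item.get("tags") or {}
    return "COMPLIANT" if required_tag in tags else "NON_COMPLIANT"


def build_evaluation(event):
    """Turn a change-triggered Config event into one evaluation record."""
    # Both fields arrive as JSON-encoded strings inside the Lambda event.
    invoking_event = json.loads(event["invokingEvent"])
    rule_parameters = json.loads(event.get("ruleParameters") or "{}")
    item = invoking_event["configurationItem"]
    # "requiredTag" is a hypothetical rule parameter used for illustration.
    required_tag = rule_parameters.get("requiredTag", "owner")
    return {
        "ComplianceResourceType": item["resourceType"],
        "ComplianceResourceId": item["resourceId"],
        "ComplianceType": evaluate_required_tag(item, required_tag),
        "OrderingTimestamp": item["configurationItemCaptureTime"],
    }


if __name__ == "__main__":
    sample_event = {
        "invokingEvent": json.dumps({
            "configurationItem": {
                "resourceType": "AWS::S3::Bucket",
                "resourceId": "example-bucket",
                "configurationItemStatus": "OK",
                "configurationItemCaptureTime": "2021-01-01T00:00:00.000Z",
                "tags": {"team": "payments"},
            }
        }),
        "ruleParameters": json.dumps({"requiredTag": "owner"}),
    }
    print(build_evaluation(sample_event))
```

Keeping the evaluation in a pure function, separate from any AWS API call, also makes each rule easy to unit test before it is deployed.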
Config rules parameters are entered into the config rule specification, while the deployment specification file is a declarative map of which config rules are to be deployed and to which accounts.
During compilation, the engine will dynamically generate CloudFormation templates for the config rule resources, package the source code for each config rule Lambda, and generate a list of deployment actions from the deployment specification file. These items will form the deployment package, to be consumed by the deployment pipeline (henceforth referred to as the pipeline).
During deployment, the pipeline reads the deployment actions and translates them into API calls, signalling the CloudFormation service to orchestrate the creation of config rule resources and config rule Lambdas across the specified accounts within the landing zone.
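The compilation step can be sketched roughly as follows. This is an illustration under stated assumptions, not the engine's actual code: the shape of the specification dictionaries, the action fields, the stack naming, and the Lambda ARN are all hypothetical. The generated template uses the standard `AWS::Config::ConfigRule` CloudFormation resource type with a custom Lambda source.

```python
def render_config_rule_template(rule_spec, lambda_arn):
    """Generate a CloudFormation template (as a dict) for one config rule."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "ConfigRule": {
                "Type": "AWS::Config::ConfigRule",
                "Properties": {
                    "ConfigRuleName": rule_spec["name"],
                    "InputParameters": rule_spec.get("parameters", {}),
                    "Scope": {
                        "ComplianceResourceTypes": rule_spec["resource_types"]
                    },
                    "Source": {
                        "Owner": "CUSTOM_LAMBDA",
                        "SourceIdentifier": lambda_arn,
                        # Trigger the rule on resource configuration changes.
                        "SourceDetails": [{
                            "EventSource": "aws.config",
                            "MessageType": "ConfigurationItemChangeNotification",
                        }],
                    },
                },
            }
        },
    }


def compile_deployment_actions(deployment_spec, rule_specs):
    """Expand the declarative rule-to-accounts map into a flat action list."""
    actions = []
    for rule_name in sorted(deployment_spec):
        if rule_name not in rule_specs:
            raise KeyError(f"no config rule specification for {rule_name!r}")
        for account_id in deployment_spec[rule_name]:
            actions.append({
                "action": "deploy_stack",  # hypothetical action name
                "account_id": account_id,
                "stack_name": f"config-rule-{rule_name}",
                "rule": rule_name,
            })
    return actions


if __name__ == "__main__":
    # Hypothetical specifications; account IDs and the ARN are placeholders.
    rule_specs = {
        "required-tag": {
            "name": "required-tag",
            "resource_types": ["AWS::S3::Bucket"],
            "parameters": {"requiredTag": "owner"},
        }
    }
    deployment_spec = {"required-tag": ["111111111111", "222222222222"]}
    arn = "arn:aws:lambda:ap-southeast-1:111111111111:function:required-tag"
    print(render_config_rule_template(rule_specs["required-tag"], arn))
    print(compile_deployment_actions(deployment_spec, rule_specs))
```

Producing a flat, ordered list of actions keeps the pipeline itself simple: it only needs to iterate over the list and issue the corresponding CloudFormation calls per account.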
The Compliance Engine: Monitoring & Notification
Designed specifically to handle and manage compliance exemptions, the Compliance Engine, once certified, routes compliance events from the config rules and presents the multi-faceted compliance posture of an enterprise. Still, deploying the Compliance Engine raises three main areas of enquiry.
They are:
- What happens if an application team needs a justifiable exemption for a failed compliance evaluation? How do we stop alerts from going downstream to the Security Operations Centre (SOC)?
- How do we verify that config rules are written correctly before allowing their results to flow downstream to the SOC?
- How do we present data during audit that demonstrates that our config rules and detective controls are enforced and adhered to?
These areas of concern, though valid, are easily remedied owing to the specific functions of the Compliance Engine, which are:
- Controls administration
- Exemption handling
- Monitoring and alerting
Controls administration is the process of mapping directive controls to config rules, i.e. detective controls. This relationship is maintained in a relational database, the compliance database, and the table that stores the lookup is called the controls table.
First, when the router is configured to receive consolidated compliance events from the landing zone, two main compliance events are expected for each resource change:
- The Resource Configuration Item Change Event, which represents the change – What has changed on the resource?
- The Compliance Change Notification, which represents the result – What is the compliance evaluation result of the resource configuration change?
The router stores each Resource Configuration Item Change Event in the resource table. When it then receives a Compliance Change Notification, it cross-references the config rule which performed the evaluation against the controls table and, should there be a match, tags the event with the corresponding directive control ID.
The router is designed to publish only those compliance evaluations which have been tagged with an associated directive control ID. This creates a staged release process for downstream alerting, whereby config rules can be evaluated before their results are published to the wider organisation.
Compliance exemption handling is arguably the primary value-added feature of the Compliance Engine, as exemptions are time-bound and can be scoped to any combination of metadata associated with the resource.
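The tagging and publish-gating behaviour of the router can be sketched as below. The sketch assumes the controls table is available as a simple mapping from config rule name to directive control ID; the field names and the control ID format are hypothetical.

```python
def route_compliance_event(event, controls_table):
    """Tag a compliance event and decide whether to publish it downstream.

    Returns (tagged_event, should_publish). Untagged events are still
    returned so they can be stored, but they are not published.
    """
    control_id = controls_table.get(event["config_rule_name"])
    tagged_event = dict(event, directive_control_id=control_id)
    return tagged_event, control_id is not None


if __name__ == "__main__":
    # Hypothetical controls table: config rule name -> directive control ID.
    controls_table = {"s3-encryption-enabled": "DC-017"}
    events = [
        {"config_rule_name": "s3-encryption-enabled",
         "resource_id": "bucket-a", "status": "NON_COMPLIANT"},
        # A new rule still under evaluation: not yet mapped to a control.
        {"config_rule_name": "s3-required-tag",
         "resource_id": "bucket-a", "status": "NON_COMPLIANT"},
    ]
    for event in events:
        tagged, publish = route_compliance_event(event, controls_table)
        print(tagged["config_rule_name"], "publish" if publish else "hold")
```

Under this design, promoting a rule from evaluation to live alerting is a data change (adding a row to the controls table) rather than a code deployment.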
Here is an example to illustrate the versatility of the exemption system. Suppose an application owner requires an exemption for an S3 bucket (S3 being a simple web service interface used to store and retrieve data) which has become non-compliant following the introduction of a new config rule. Instead of deactivating the entire system, they could file a temporary exemption for that one resource, identified by its bucket ID. In the same vein, portfolio owners could apply an exemption for all S3 buckets in their production account by filing an exemption using a combination of the account ID and resource type.
Exemptions are filed and stored in the compliance database under the exemptions table. Prior to publishing and storing compliance events, the router will first extract all related resource metadata, including that held in the resource table, and check it for matching exemptions. Should there be a valid exemption coupled with a ‘non-compliant’ compliance evaluation result, the evaluation will be deemed compliant with exemption and stored as such.
There are advantages to storing resource changes and compliance events, chiefly that organisations can construct a bird’s-eye view of their compliance posture simply by tapping into the compliance database. The following mock-ups are examples of how processed compliance data sets can be extracted and rearranged to form unique dashboards.
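The exemption check described above can be sketched as follows. It assumes each exemption record carries a validity window and a scope expressed as metadata key/value pairs that must all match the resource; the field names and the `COMPLIANT_WITH_EXEMPTION` status label are hypothetical.

```python
from datetime import datetime, timezone


def is_exempt(resource_metadata, exemption, now):
    """True if the exemption is in force and its whole scope matches."""
    if not (exemption["starts_at"] <= now < exemption["expires_at"]):
        return False  # exemptions are time-bound
    scope = exemption["scope"]
    # An empty scope is rejected so that it cannot exempt everything.
    return bool(scope) and all(
        resource_metadata.get(key) == value for key, value in scope.items()
    )


def resolve_compliance(result, resource_metadata, exemptions, now):
    """Downgrade a non-compliant result when a valid exemption applies."""
    if result == "NON_COMPLIANT" and any(
        is_exempt(resource_metadata, exemption, now) for exemption in exemptions
    ):
        return "COMPLIANT_WITH_EXEMPTION"
    return result


if __name__ == "__main__":
    now = datetime(2021, 6, 1, tzinfo=timezone.utc)
    # Portfolio-level exemption: all S3 buckets in one (placeholder) account.
    exemptions = [{
        "scope": {"account_id": "111111111111",
                  "resource_type": "AWS::S3::Bucket"},
        "starts_at": datetime(2021, 5, 1, tzinfo=timezone.utc),
        "expires_at": datetime(2021, 7, 1, tzinfo=timezone.utc),
    }]
    bucket = {"account_id": "111111111111",
              "resource_type": "AWS::S3::Bucket",
              "resource_id": "example-bucket"}
    print(resolve_compliance("NON_COMPLIANT", bucket, exemptions, now))
```

Scoping an exemption to a single bucket is then just a narrower scope, for example one that also includes the resource ID.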
This first diagram represents compliance by application, with this view enabling application owners to react and remediate non-compliant deployments.
This second diagram represents an alternative view of the same dataset mapping compliance by controls, useful for security and audit teams to assess the adherence to the implementation of the control guardrails of the entire organisation.
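Both views can be derived from the same stored compliance events by grouping on a different key. The following is a minimal sketch, assuming each stored event carries an application name, a directive control ID, and a status; the field names and sample values are hypothetical, and a real dashboard would query the compliance database rather than an in-memory list.

```python
from collections import Counter, defaultdict


def summarise_compliance(events, group_by):
    """Count compliance statuses per value of the chosen grouping field."""
    summary = defaultdict(Counter)
    for event in events:
        summary[event[group_by]][event["status"]] += 1
    return {group: dict(counts) for group, counts in summary.items()}


if __name__ == "__main__":
    events = [
        {"application": "payments", "directive_control_id": "DC-017",
         "status": "COMPLIANT"},
        {"application": "payments", "directive_control_id": "DC-021",
         "status": "NON_COMPLIANT"},
        {"application": "lending", "directive_control_id": "DC-017",
         "status": "NON_COMPLIANT"},
    ]
    # View for application owners.
    print(summarise_compliance(events, "application"))
    # View for security and audit teams.
    print(summarise_compliance(events, "directive_control_id"))
```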
The Canary Engine: Chaos Engineering
Perhaps the most interesting capability enabled by the exemption mechanism is the Canary Engine.
Just as Netflix sought to align their teams around the notion of infrastructure resilience by inducing failure, the Canary Engine is designed to ensure that the code of every moving piece within the compliance ecosystem is kept honest, through the introduction of rogue misconfigured resources into the system. The engine then assesses whether the resulting non-compliant events are handled as expected.
Running on a predetermined schedule, the Canary Engine applies for an exemption within the Compliance Engine for the deployment of canary resources: a composition of misconfigured resources represented by CloudFormation templates. These are deployed to target accounts according to a test specification file, which determines the canary coverage within each account.
Once the canaries are deployed, the config rules are expected to be invoked automatically, with the resultant compliance events routed to the Compliance Engine, to whose downstream events the Canary Engine subscribes. A canary run is deemed to have passed if the compliance event for each of its resources is evaluated as compliant with exemption.
As a final step, prior to the expiry of the exemption, the engine will tear down the canary resources and restore the environment to its normal operating state, ready for the next canary cycle.
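The pass criterion for a canary run can be sketched as a simple check over the downstream events the Canary Engine has observed. The event fields and the `COMPLIANT_WITH_EXEMPTION` status label are hypothetical, and events are assumed to arrive in chronological order.

```python
def canary_run_passed(canary_resource_ids, observed_events):
    """True only if every canary's latest event is compliant with exemption."""
    latest_status = {}
    for event in observed_events:  # assumed to be in chronological order
        latest_status[event["resource_id"]] = event["status"]
    # A canary with no observed event means a rule never fired: a failure.
    return bool(canary_resource_ids) and all(
        latest_status.get(resource_id) == "COMPLIANT_WITH_EXEMPTION"
        for resource_id in canary_resource_ids
    )


if __name__ == "__main__":
    canaries = ["canary-bucket-1", "canary-bucket-2"]
    events = [
        {"resource_id": "canary-bucket-1",
         "status": "COMPLIANT_WITH_EXEMPTION"},
        {"resource_id": "canary-bucket-2",
         "status": "COMPLIANT_WITH_EXEMPTION"},
    ]
    print(canary_run_passed(canaries, events))
```

Treating a missing event as a failure matters here: it is how the run detects a config rule that was never invoked, and a plain `NON_COMPLIANT` or `COMPLIANT` status would likewise indicate that the exemption or the rule did not behave as expected.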
Conclusion
Having initially been created for a major financial institution in ASEAN, this compliance suite is a trio of serverless applications which are ready to be rolled out to a client already equipped with a core automation pipeline from the Sourced Core Foundations Programme.
Rolling out the compliance suite delivers business value in several ways, primarily by enabling the client’s cloud programme to launch successfully within the compliance mandates of the organisation and the industry it operates within, leading to long-term cloud adoption and workload migration.
The compliance suite furnishes the client’s cloud platform with the necessary detective controls (Rules Engine), ensures their accuracy (Canary Engine), and manages coverage and exemptions to rules via exemption management (Compliance Engine). This ultimately provides Cloud Security, Operations, and Development teams, as well as Application leads, with a bespoke single pane of glass to assess the compliance posture of their entire cloud environment.