Self-Service GitOps for Confluent Cloud

With over 3 million downloads since its release in June of 2022, Confluent’s Terraform provider has become integral to continuous integration (CI) pipelines in enterprises everywhere. HashiCorp’s Terraform enables changes to be made to Confluent infrastructure via declarative merge requests to a central code repository, which can then be automatically deployed. This approach is often referred to as GitOps, and has many benefits, such as enhanced security, improved collaboration, and increased operational efficiency.

When automatically deploying infrastructure changes, it’s important to ensure that any changes follow your organization’s requirements. There may be compliance regulations that require clusters to be created only in certain regions or cloud providers. For auditing purposes, the names of resources may need to follow a particular naming convention. For security reasons, access should be restricted to service accounts, rather than user accounts. Policy-as-code can help protect against deploying changes that violate an organization's security and auditing requirements.

Policy-as-code is the practice of permitting or preventing actions based on rules and conditions defined in code. In the context of GitOps for Confluent, suitable policies may include restricting which cloud providers can host Confluent Kafka clusters, and enforcing naming standards for various resources. Every proposed change to Confluent resources is validated to ensure that data streaming infrastructure deployments meet the policy requirements.

This blog post describes how GitOps can work with policy-as-code systems to provide a true self-service model for managing Confluent resources.

GitOps

GitOps is a modern IT practice that emphasizes the use of Git for infrastructure and application management. Git, the version control system, provides a trail of changes to a system, describing who, what, and when (and with appropriate commentary, one could also answer why). Once changes are merged into a Git repository, a pipeline runs automatically to ensure the changes are realized in the actual platform.

There are two main phases when applying a change with Terraform: the plan phase and the apply phase. During the plan phase, the configuration in Git is compared with the actual environment, and a plan is automatically created to reconcile the differences between the two. Policy enforcement, provided by our new policy-as-code integration, runs as part of this plan phase. Finally, in the apply phase, a human operator can review the plan and either apply it or reject it.

Applying policy-as-code

The Terraform plan phase generates a plan that can be represented in JSON format. For example, the following JSON plan (abbreviated for clarity) is produced when you want to create a new cluster:

{
  "format_version": "0.2",
  "terraform_version": "1.0.9",
  "resource_changes": [
    {
      "address": "confluent_kafka_cluster.basic",
      "mode": "managed",
      "type": "confluent_kafka_cluster",
      "name": "basic",
      "change": {
        "actions": [
          "create"
        ],
        "after": {
          "availability": "SINGLE_ZONE",
          "cloud": "AWS",
          "basic": [{}],
          "dedicated": [],
          "standard": [],
          "display_name": "basic_kafka_cluster",
          "environment": [{}],
          "region": "ap-southeast-2",
          "timeouts": null
        }
      }
    }
  ]
}

This JSON snippet describes a new Kafka cluster to be created in the AWS cloud, in a single zone in the ap-southeast-2 region. The policy-as-code enforcement reads this JSON and validates the changes against established policies.

The following is an example of a policy (written in HashiCorp’s Sentinel) that permits new cluster creation only in the AWS cloud:

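# Import the Terraform plan data so that tfplan.resource_changes is available
import "tfplan/v2" as tfplan

# List of approved clouds
param approved_clouds default [ "AWS" ]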

# Get a list of all newly created clusters
all_new_clusters = filter tfplan.resource_changes as _, resource_changes {
  "create" in resource_changes.change.actions and
  resource_changes.type is "confluent_kafka_cluster" and
  resource_changes.mode is "managed"
}

# Filter this list to all clusters that match the approved clouds
valid_clusters = filter all_new_clusters as _, cluster {
  cluster.change.after.cloud in approved_clouds
}

# Ensure all new clusters are in valid cloud providers
main = rule {
  length(valid_clusters) == length(all_new_clusters)
}
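
For readers who want to experiment with the same rule outside Sentinel, the following is a minimal Python sketch that applies the approved-cloud check to a JSON plan such as the one shown earlier (for example, the output of `terraform show -json`). The function name `unapproved_clusters` and the inline sample plan are illustrative assumptions and are not part of the Confluent Policy Library.

```python
import json

# Assumption: same default as the Sentinel policy above.
APPROVED_CLOUDS = {"AWS"}


def unapproved_clusters(plan, approved_clouds=APPROVED_CLOUDS):
    """Return the addresses of newly created Kafka clusters in non-approved clouds."""
    violations = []
    for rc in plan.get("resource_changes", []):
        change = rc.get("change", {})
        is_new_cluster = (
            "create" in change.get("actions", [])
            and rc.get("type") == "confluent_kafka_cluster"
            and rc.get("mode") == "managed"
        )
        # "after" may be null in a plan, so fall back to an empty dict.
        cloud = (change.get("after") or {}).get("cloud")
        if is_new_cluster and cloud not in approved_clouds:
            violations.append(rc.get("address"))
    return violations


# Hypothetical, heavily abbreviated plan used only to exercise the check.
sample_plan = json.loads("""
{
  "resource_changes": [
    {
      "address": "confluent_kafka_cluster.basic",
      "mode": "managed",
      "type": "confluent_kafka_cluster",
      "change": {
        "actions": ["create"],
        "after": {"cloud": "AWS", "region": "ap-southeast-2"}
      }
    }
  ]
}
""")

# An empty list means the plan complies with the policy.
print(unapproved_clusters(sample_plan))
```

Like the Sentinel rule, the sketch passes only when every newly created cluster is in an approved cloud; it reports the offending addresses rather than a single boolean so that a reviewer can see what to fix.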

If the proposed plan complies with all of the policies, Terraform can move on to the apply phase and automatically create the cluster. Using any other cloud provider causes a policy violation, which can be handled in one of three ways, depending on how the policy is enforced:

  • The system can note the policy violation, but continue and apply the change. This approach is called “advisory” and is most often used when starting out with policy-as-code. It does not prevent or delay operations, but provides guidance that should be followed.

  • The system stops the change but allows an operator to override and apply it anyway. This approach is called “soft-mandatory” and is often used once some familiarity with policies is established.

  • The system stops the change and the operator cannot override it. This approach is called “hard-mandatory” and is often used in very mature environments that have well-tested policies.
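
The three enforcement levels above can be summarized as a small decision function. This is a Python sketch for illustration only: the names `EnforcementLevel` and `may_apply` are hypothetical, and in practice the decision is made by the policy system rather than by your own code.

```python
from enum import Enum


class EnforcementLevel(Enum):
    ADVISORY = "advisory"
    SOFT_MANDATORY = "soft-mandatory"
    HARD_MANDATORY = "hard-mandatory"


def may_apply(policy_passed, level, operator_override=False):
    """Decide whether a planned change may proceed to the apply phase."""
    if policy_passed:
        return True
    if level is EnforcementLevel.ADVISORY:
        # The violation is noted, but the change continues.
        return True
    if level is EnforcementLevel.SOFT_MANDATORY:
        # The change is stopped unless an operator explicitly overrides.
        return operator_override
    # Hard-mandatory: the change is stopped and cannot be overridden.
    return False
```

For example, `may_apply(False, EnforcementLevel.SOFT_MANDATORY, operator_override=True)` allows the change, while the same call with `EnforcementLevel.HARD_MANDATORY` does not.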

By utilizing GitOps and policy-as-code, users can propose changes to their Confluent environments with the confidence that they won’t accidentally violate important security, regulatory, or auditing requirements. 

Confluent Policy Library

To help you get started with policy-as-code for Confluent, we have put together a collection of policies that you can easily adapt to fit your needs. They are available from our Git repository, as well as the HashiCorp Registry, and are written in HashiCorp Sentinel, which is compatible with Terraform Cloud and Terraform Enterprise.

Some of the policies include:

  • Enforce cluster provisioning only in specified cloud providers and regions

  • Enforce naming standards for service accounts

  • Enforce and limit RBAC role assignments, preventing the creation of users with full administrative capabilities

  • Enforce resource creation and scaling limitations

  • Enforce topic naming standards

  • Prevent topics from being deleted
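
To give a feel for what two of these policies check, here is a Python sketch that enforces a topic naming standard and prevents topic deletion against a JSON plan. It is not the library's implementation (the library policies are written in Sentinel). The naming pattern is a made-up convention, and the sketch assumes that topics appear in the plan as `confluent_kafka_topic` resources with a `topic_name` attribute.

```python
import re

# Hypothetical convention: lowercase, dot-separated segments, e.g. "orders.created".
TOPIC_NAME_PATTERN = re.compile(r"[a-z0-9]+(\.[a-z0-9]+)+")


def topic_violations(plan):
    """Return human-readable violations for topic changes in a JSON plan."""
    violations = []
    for rc in plan.get("resource_changes", []):
        # Assumption: topics use the confluent_kafka_topic resource type.
        if rc.get("type") != "confluent_kafka_topic":
            continue
        change = rc.get("change", {})
        actions = change.get("actions", [])
        address = rc.get("address")
        if "delete" in actions:
            violations.append(f"{address}: topic deletion is not allowed")
        if "create" in actions:
            # Assumption: the topic name is stored in the topic_name attribute.
            name = (change.get("after") or {}).get("topic_name") or ""
            if not TOPIC_NAME_PATTERN.fullmatch(name):
                violations.append(f"{address}: topic name '{name}' does not match the naming standard")
    return violations
```

Returning a list of messages, rather than a single pass/fail result, makes it easier to show every problem in one review cycle.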

What’s next?

We’re going to continue to improve the Confluent Terraform provider with new features and resource management options. At the same time, we’re also going to continue to add new sample policies to the library. We also aim to support Open Policy Agent, another common policy-as-code engine, and we will develop policies in that format as well.

But most importantly, we are looking to hear your feedback on our policy library. Do these policies work well in your environment? Do you have ideas for other policies? Feel free to leave your ideas on our Git repository or talk to us in the #terraform channel on Confluent Community Slack.

Simon Duff is a Confluent Policy Library contributor and maintainer. 

  • Simon Duff is a creative and inquisitive IT professional, with a master’s degree in artificial intelligence. He has extensive experience in a variety of industries, having worked 7 years for Splunk Professional Services, and 2 years at Confluent, developing cutting-edge solutions for clients. Prior to this, Simon was an InfoSec professional and penetration tester.
