Overcoming Azure DevOps Pipeline Challenges with Terraform for AKS Deployments

Day two of developing KubeConductor is officially done.

Today, I focused on configuring an Azure DevOps CI/CD pipeline to deploy an Azure Kubernetes Service (AKS) cluster using Terraform. I’ll touch on the hurdles encountered while provisioning the infrastructure and managing existing resources, as well as the steps taken to resolve them.

The Code

My public repo can be found [here].

aks-cluster.tf: This is the Terraform configuration file that provisions an Azure Kubernetes Service (AKS) cluster using the AzureRM provider.

# aks-cluster.tf
provider "azurerm" {
  features {}

  client_id       = var.client_id
  client_secret   = var.client_secret
  tenant_id       = var.tenant_id
  subscription_id = var.subscription_id
}


resource "azurerm_resource_group" "default" {
  name     = "terraform-aks-rg"
  location = "West US"
}

resource "azurerm_kubernetes_cluster" "default" {
  name                = "terraform-aks-cluster"
  location            = azurerm_resource_group.default.location
  resource_group_name = azurerm_resource_group.default.name
  dns_prefix          = "terraform-aks"

  default_node_pool {
    name            = "default"
    node_count      = 2
    vm_size         = "Standard_DS2_v2"
    os_disk_size_gb = 30
  }

  identity {
    type = "SystemAssigned"
  }

  role_based_access_control_enabled = true

  tags = {
    environment = "Development"
  }
}

variables.tf: This Terraform configuration file defines the input variables that aks-cluster.tf uses to dynamically configure the Azure resources.

# variables.tf
variable "resource_group_name" {
  description = "Name of the Resource Group"
  default     = "terraform-aks-rg"
}

variable "client_id" {
  description = "The Client ID of the Service Principal"
}

variable "client_secret" {
  description = "The Client Secret of the Service Principal"
}

variable "tenant_id" {
  description = "The Tenant ID of the Azure Active Directory"
}

variable "subscription_id" {
  description = "The Subscription ID where the resources will be created"
}

versions.tf: This Terraform configuration file specifies the Terraform and provider versions required. It helps ensure compatibility and stability by enforcing versioning.

# versions.tf
terraform {
  required_version = ">= 0.14"
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
= 2.56"">
      version = ">= 3.0" # role_based_access_control_enabled requires azurerm 3.x
    }
  }
}

outputs.tf: This Terraform configuration file defines output values for the infrastructure. In my case, these outputs will display the cluster and resource group names after the resources have been created; the pipeline doesn't consume them yet, however.

# outputs.tf
output "kubernetes_cluster_name" {
  value = azurerm_kubernetes_cluster.default.name
}

output "resource_group_name" {
  value = azurerm_resource_group.default.name
}

Initial Pipeline Configuration

My goal for today was to automate the provisioning of an AKS cluster using Terraform in an Azure DevOps pipeline. The basic setup included a self-hosted Ubuntu agent and a multi-step pipeline with the following stages:

Terraform Init: Initializes the working directory containing the Terraform configuration and downloads the required providers.

Terraform Plan: Generates an execution plan to show what changes Terraform will make to your infrastructure.

Terraform Apply: Applies the changes required to reach the desired state of the configuration.

Below is a screenshot of the pipeline steps passing; the pipeline runs on a self-hosted Ubuntu machine.

Debugging Pipeline Environment Variables

From the beginning, I knew I wanted an application that mirrored real-world cloud infrastructure as closely as possible. This meant I needed to be careful with how I handled secret values such as IDs, passwords, etc. I opted to use Azure Pipeline variable groups to store the sensitive information relating to my cloud infrastructure. My CI/CD pipeline fetches these variables at runtime, eliminating the need to hard-code them.

Terraform relies on a consistent set of environment variables for Azure authentication. I encountered errors where variables like client_id and subscription_id were not being picked up. To fix this, I mapped each credential twice: once with the TF_VAR_ prefix (TF_VAR_client_id, TF_VAR_client_secret, TF_VAR_tenant_id, and TF_VAR_subscription_id), which Terraform uses to populate its input variables, and once with the ARM_ prefix (ARM_CLIENT_ID, ARM_CLIENT_SECRET, etc.), which the AzureRM provider reads for authentication.
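The same double mapping can be reproduced locally before running terraform by hand. A minimal sketch (the UUIDs below are placeholders; in the pipeline the real values come from the variable group, never literals):

```shell
#!/usr/bin/env bash
# Sketch: mirror the ARM_* credentials (read by the azurerm provider) into the
# TF_VAR_* names (read by Terraform to populate `variable` blocks).
set -euo pipefail

export ARM_CLIENT_ID="00000000-0000-0000-0000-000000000000"       # $(appId)
export ARM_CLIENT_SECRET="example-secret"                         # $(password)
export ARM_TENANT_ID="11111111-1111-1111-1111-111111111111"       # $(tenant)
export ARM_SUBSCRIPTION_ID="22222222-2222-2222-2222-222222222222" # $(AZURE_SUBSCRIPTION_ID)

for v in ARM_CLIENT_ID ARM_CLIENT_SECRET ARM_TENANT_ID ARM_SUBSCRIPTION_ID; do
  # ARM_CLIENT_ID -> TF_VAR_client_id, etc.
  tf_name="TF_VAR_$(echo "${v#ARM_}" | tr '[:upper:]' '[:lower:]')"
  export "$tf_name=${!v}"
done

echo "TF_VAR_client_id=$TF_VAR_client_id"
```

With all eight variables exported, `terraform plan` authenticates without any values appearing in the configuration files.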

Here is the updated YAML with the complete variable mapping:

# Updated azure-pipelines.yml with complete TF_VAR_ variable mapping

trigger:
  branches:
    include:
      - main

pool:
  name: "SelfHostedUbuntu"

variables:
  - group: Terraform-SP-Credentials

jobs:
  - job: "Deploy_AKS"
    displayName: "Provision AKS Cluster Using Terraform"
    steps:
      # Step 1: Checkout Code
      - checkout: self

      # Step 2: Verify Terraform Version
      - script: |
          terraform --version
        displayName: "Verify Installed Terraform Version"

      # Step 3: Terraform Init (Set Working Directory and Pass All Variables with TF_VAR_ Prefix)
      - script: |
          terraform init
        displayName: "Terraform Init"
        workingDirectory: $(System.DefaultWorkingDirectory)/terraform
        env:
          ARM_CLIENT_ID: $(appId)
          ARM_CLIENT_SECRET: $(password)
          ARM_TENANT_ID: $(tenant)
          ARM_SUBSCRIPTION_ID: $(AZURE_SUBSCRIPTION_ID)
          TF_VAR_client_id: $(appId) # Client ID
          TF_VAR_client_secret: $(password) # Client Secret
          TF_VAR_tenant_id: $(tenant) # Tenant ID
          TF_VAR_subscription_id: $(AZURE_SUBSCRIPTION_ID) # Subscription ID

      # Step 4: Terraform Plan (Pass All Variables with TF_VAR_ Prefix)
      - script: |
          terraform plan -out=tfplan
        displayName: "Terraform Plan"
        workingDirectory: $(System.DefaultWorkingDirectory)/terraform
        env:
          ARM_CLIENT_ID: $(appId)
          ARM_CLIENT_SECRET: $(password)
          ARM_TENANT_ID: $(tenant)
          ARM_SUBSCRIPTION_ID: $(AZURE_SUBSCRIPTION_ID)
          TF_VAR_client_id: $(appId)
          TF_VAR_client_secret: $(password)
          TF_VAR_tenant_id: $(tenant)
          TF_VAR_subscription_id: $(AZURE_SUBSCRIPTION_ID)

      # Step 5: Terraform Apply (Set Working Directory and Pass All Variables)
      - script: |
          terraform apply -auto-approve tfplan
        displayName: "Terraform Apply"
        workingDirectory: $(System.DefaultWorkingDirectory)/terraform
        env:
          ARM_CLIENT_ID: $(appId)
          ARM_CLIENT_SECRET: $(password)
          ARM_TENANT_ID: $(tenant)
          ARM_SUBSCRIPTION_ID: $(AZURE_SUBSCRIPTION_ID)
          TF_VAR_client_id: $(appId)
          TF_VAR_client_secret: $(password)
          TF_VAR_tenant_id: $(tenant)
          TF_VAR_subscription_id: $(AZURE_SUBSCRIPTION_ID)

Azure CLI Summary:

Here’s a list of the Azure CLI commands I used today, whether for debugging, sanity checks, or because Terraform needed them.

1. Login to Azure Using Service Principal:

Authenticates to Azure using Service Principal credentials (appId, password, and tenant).

az login --service-principal -u <appId> -p <password> --tenant <tenant>

2. Retrieve AKS Cluster Credentials:

Fetches the AKS cluster credentials and configures kubectl to use this cluster.

az aks get-credentials --resource-group <resource_group_name> --name <kubernetes_cluster_name>

3. Browse Kubernetes Dashboard:

Opens the Kubernetes dashboard for the specified AKS cluster in your local browser, via a proxy to the cluster.

az aks browse --resource-group <resource_group_name> --name <kubernetes_cluster_name>

4. Create Service Principal:

Creates an Azure Active Directory Service Principal for authentication in Terraform.

az ad sp create-for-rbac --skip-assignment
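The JSON this command prints carries the three fields the variable group stores. A rough sketch of that mapping, using a hard-coded example payload in place of the real command output (field names as documented; values are placeholders):

```shell
#!/usr/bin/env bash
# Sketch: map the `az ad sp create-for-rbac` JSON fields onto the variable
# names the pipeline reads. A stand-in payload replaces the real command here.
set -euo pipefail

sp_json='{"appId": "11111111-1111-1111-1111-111111111111",
          "password": "example-secret",
          "tenant": "22222222-2222-2222-2222-222222222222"}'

# Extract a top-level string field without requiring jq.
get_field() {
  echo "$sp_json" | tr -d ' \n' | sed -n "s/.*\"$1\":\"\([^\"]*\)\".*/\1/p"
}

export ARM_CLIENT_ID="$(get_field appId)"        # -> $(appId) in the variable group
export ARM_CLIENT_SECRET="$(get_field password)" # -> $(password)
export ARM_TENANT_ID="$(get_field tenant)"       # -> $(tenant)
echo "ARM_CLIENT_ID=$ARM_CLIENT_ID"
```

In practice the values go straight into the variable group and are marked secret, so they never touch a local file.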

Takeaways

State Management Needs Improvement: The pipeline deploys the AKS cluster and pods successfully if the resource group does not already exist. However, subsequent runs fail because state management is not set up yet: with no shared state, Terraform has no record of the resources it created and tries to create them again. Fixing this requires configuring a remote state backend.

Streamlined Pipeline Steps: The final pipeline configuration effectively provisions the AKS cluster but needs fine-tuning for state management and validation.

Next Steps:

  • Implement a remote state backend to centralize state management.
  • Reintroduce verification and post-deployment steps (kubectl commands) in the pipeline.
  • Optimize the CI/CD flow to handle both new and existing infrastructure.
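For the remote state backend in the first bullet, one likely shape is an Azure Storage backend. A sketch only; the storage account name is a placeholder I'd create beforehand with `az storage account create` and `az storage container create`:

```hcl
# backend.tf (hypothetical) — store state in an Azure Storage container so
# every pipeline run shares one view of the existing resources.
terraform {
  backend "azurerm" {
    resource_group_name  = "terraform-aks-rg"
    storage_account_name = "tfstatekubeconductor" # placeholder; must be globally unique
    container_name       = "tfstate"
    key                  = "aks.terraform.tfstate"
  }
}
```

After adding this block, `terraform init` migrates the local state into the container, and the "resource already exists" failures on re-runs should go away.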

Finally, I would like to thank my incredibly hard-working agent — SelfHostedUbuntu! Look at him just chugging along and handling errors like a champ :,-)

Bye for now!
