Migrating to Kubernetes: A Journey Through CI/CD with GitHub Actions, ArgoCD, and Helm

Introduction

For the past two and a half months, I've been deeply immersed in one of the most transformative infrastructure projects of my career: migrating our workloads from Amazon ECS to Kubernetes (EKS) and establishing a robust, modern CI/CD pipeline. This journey has been both challenging and rewarding, teaching me invaluable lessons about container orchestration, GitOps practices, and the power of declarative infrastructure.

In this article, I'll walk you through our complete CI/CD setup, covering everything from building container images with GitHub Actions to deploying applications using GitOps principles with ArgoCD and Helm Charts. Whether you're planning a similar migration or looking to modernize your deployment pipeline, I hope this serves as a practical guide.

The Motivation: Why Move from ECS to Kubernetes?

Before diving into the technical details, let me address the elephant in the room: why migrate from ECS to Kubernetes? While ECS is a solid container orchestration service, Kubernetes offers several advantages that made the migration worthwhile:

  • Portability: Kubernetes is cloud-agnostic, giving us flexibility to run workloads across different cloud providers
  • Ecosystem: A rich ecosystem of tools and operators (ArgoCD, KEDA, Prometheus, etc.)
  • Scalability: More granular control over resource allocation and scaling policies
  • GitOps: Native support for declarative deployments and GitOps workflows
  • Community: Extensive community support and a wealth of learning resources

The decision to migrate wasn't taken lightly, but the benefits have already started to materialize in terms of deployment velocity, observability, and operational flexibility.

Architecture Overview

Our CI/CD pipeline follows a modern GitOps approach with the following components:

  1. GitHub Actions: Builds and pushes container images to ECR
  2. Amazon ECR: Stores container images securely
  3. GitOps Repository: Contains ArgoCD application definitions and Helm values
  4. ArgoCD: Monitors the GitOps repository and syncs applications to Kubernetes
  5. Helm Charts: Templates for Kubernetes resources with environment-specific values
┌─────────────┐     ┌────────────────┐     ┌─────────────┐     ┌─────────────┐
│   GitHub    │────▶│ GitHub Actions │────▶│  Amazon ECR │     │   ArgoCD    │
│ Repository  │     │ (Build & Tag)  │     │ (Registry)  │     │  (GitOps)   │
└─────────────┘     └────────────────┘     └─────────────┘     └──────┬──────┘
                                                                      │
                                                               ┌──────▼──────┐
                                                               │ Kubernetes  │
                                                               │    (EKS)    │
                                                               └──────▲──────┘
                                                                      │
                                                               ┌──────┴──────┐
                                                               │ Helm Charts │
                                                               │ (Templates) │
                                                               └─────────────┘

Step 1: Building Container Images with GitHub Actions

The foundation of our CI/CD pipeline is GitHub Actions, which automates the build and push process for container images. Let me break down our workflow:

Build and Release Workflow

Our main workflow (build-push-release.yml) triggers on every push to the main branch and follows a sophisticated versioning strategy:

Key Features:

  • Semantic Versioning: Automatically determines version bumps (major, minor, patch) based on commit messages or PR labels
  • Smart Tagging: Pushes a SHA-based tag first, then adds version tags to the same image without rebuilding once the build succeeds
  • Release Automation: Creates Git tags and GitHub releases automatically

Here's a simplified version of how the workflow operates:

name: Build & Release

on:
  push:
    branches:
      - main

# OIDC federation with AWS requires the id-token permission
permissions:
  id-token: write
  contents: read

jobs:
  build-release:
    runs-on: ubuntu-latest
    steps:
      - name: Check out code
        uses: actions/checkout@v4

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v5
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
          aws-region: ${{ vars.AWS_REGION }}

      - name: Log in to Amazon ECR
        id: ecr-login
        uses: aws-actions/amazon-ecr-login@v2

      - name: Build and push with SHA tag
        uses: docker/build-push-action@v6
        with:
          context: .
          push: true
          tags: |
            ${{ steps.ecr-login.outputs.registry }}/matters-ai/service-a:sha-${{ github.sha }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

Version Detection Strategy:

The workflow intelligently determines version bumps:

  • Major: Breaking changes (commit messages with BREAKING CHANGE: or feat!:)
  • Minor: New features (feat: prefix)
  • Patch: Bug fixes and other changes
  • PR Labels: Can override with release:major, release:minor, or release:patch
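
Under the hood, the detection can be expressed as a small shell step. Here's a simplified sketch (our real workflow also honors the PR-label overrides, which are omitted here):

- name: Determine version bump
  id: bump
  run: |
    # Classify the latest commit message (PR-label overrides omitted)
    MSG="$(git log -1 --pretty=%B)"
    if echo "$MSG" | grep -qE '(^feat!:|BREAKING CHANGE:)'; then
      echo "bump=major" >> "$GITHUB_OUTPUT"
    elif echo "$MSG" | grep -qE '^feat(\(.*\))?:'; then
      echo "bump=minor" >> "$GITHUB_OUTPUT"
    else
      echo "bump=patch" >> "$GITHUB_OUTPUT"
    fi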

After building with a SHA tag, the workflow adds version tags to the same image without rebuilding, which is both efficient and ensures consistency.
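
Concretely, adding a tag is a manifest-only operation against the registry. A minimal sketch using docker buildx imagetools (REGISTRY and VERSION stand in for values computed earlier in the workflow):

- name: Add version tag without rebuilding
  run: |
    # Point the new version tag at the already-pushed SHA-tagged manifest
    docker buildx imagetools create \
      --tag "$REGISTRY/matters-ai/service-a:v$VERSION" \
      "$REGISTRY/matters-ai/service-a:sha-$GITHUB_SHA"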

Deployment Workflow

For deploying to development environments, we have a separate workflow (deploy-dev.yml) that supports two deployment strategies:

  1. Branch Deployments: Builds and deploys from feature/hotfix/bugfix branches
  2. Release Deployments: Deploys a specific released version

The branch deployment workflow:

  • Validates branch naming conventions
  • Checks whether the image already exists in ECR and skips the build if so (sketched after this list)
  • Updates the GitOps repository with the new image tag
  • Triggers ArgoCD sync automatically
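
The existence check boils down to a single AWS CLI call; describe-images fails when the tag is absent, which the step turns into an output (a sketch, with the repository name as a placeholder):

- name: Check if image already exists in ECR
  id: check-image
  run: |
    # describe-images exits non-zero if the tag does not exist
    if aws ecr describe-images \
         --repository-name matters-ai/service-a \
         --image-ids imageTag="sha-$GITHUB_SHA" >/dev/null 2>&1; then
      echo "exists=true" >> "$GITHUB_OUTPUT"
    else
      echo "exists=false" >> "$GITHUB_OUTPUT"
    fi

Later steps can then guard the build with if: steps.check-image.outputs.exists == 'false'.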

Step 2: Container Registry with Amazon ECR

Amazon ECR serves as our primary container registry, providing secure, scalable image storage. We leverage ECR's features:

  • Image Scanning: Automatic vulnerability scanning
  • Lifecycle Policies: Automatic cleanup of old images (see the example after this list)
  • IAM Integration: Fine-grained access control using IAM roles
  • Cross-Region Replication: For disaster recovery and latency optimization
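
Lifecycle policies, for instance, are plain JSON documents attached to a repository. An illustrative rule that expires older SHA-tagged build images (the count is arbitrary):

{
  "rules": [
    {
      "rulePriority": 1,
      "description": "Keep only the 50 most recent sha- tagged images",
      "selection": {
        "tagStatus": "tagged",
        "tagPrefixList": ["sha-"],
        "countType": "imageCountMoreThan",
        "countNumber": 50
      },
      "action": { "type": "expire" }
    }
  ]
}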

Our GitHub Actions workflows authenticate to ECR using OIDC (OpenID Connect), eliminating the need to store long-lived credentials. This is a security best practice that I highly recommend.

Step 3: GitOps with ArgoCD

ArgoCD is the heart of our GitOps implementation. It continuously monitors our GitOps repository and ensures that the Kubernetes cluster state matches the desired state defined in Git.

ArgoCD Application Structure

Our GitOps repository (gitops-argocd) follows a hierarchical structure:

gitops-argocd/
├── argocd/
│   ├── applications/
│   │   ├── dev/
│   │   │   ├── service-a.yaml
│   │   │   ├── service-b.yaml
│   │   │   └── ...
│   │   ├── staging/
│   │   ├── prod/
│   │   └── root-*-app.yaml
│   └── projects/
│       ├── dev.yaml
│       ├── staging.yaml
│       └── prod.yaml
└── apps/
    ├── dev/
    │   └── values/
    │       ├── service-a.yaml
    │       └── ...
    └── ...

Root Applications

We use ArgoCD's Application of Applications pattern, where root applications manage other applications:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root-apps-dev
  namespace: argocd
spec:
  project: dev
  source:
    repoURL: git@github.com:matters-ai/gitops-argocd.git
    targetRevision: main
    path: argocd/applications/dev
  destination:
    server: https://kubernetes.default.svc
    namespace: dev
  syncPolicy:
    automated:
      prune: true
      selfHeal: true

Key Features:

  • Automated Sync: Changes in Git are automatically synced to the cluster
  • Self-Healing: ArgoCD automatically corrects manual changes to the cluster
  • Pruning: Removes resources that are no longer in Git

Application Definitions

Each application references a Helm chart and environment-specific values:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: service-a
  namespace: argocd
spec:
  project: dev
  sources:
  - repoURL: 'git@github.com:matters-ai/helm-charts.git'
    path: service-a
    targetRevision: main
    helm:
      valueFiles:
      - $values/apps/dev/values/service-a.yaml
  - repoURL: 'git@github.com:matters-ai/gitops-argocd.git'
    targetRevision: main
    ref: values
  destination:
    server: https://kubernetes.default.svc
    namespace: service-a
  syncPolicy:
    automated:
      prune: true
      selfHeal: true

This setup allows us to:

  • Separate Helm chart templates from environment-specific values
  • Use ArgoCD's multi-source feature to reference values from a different repository
  • Maintain a clear separation of concerns

Step 4: Helm Charts for Kubernetes Deployments

Helm charts provide a templated approach to defining Kubernetes resources. Our Helm charts are stored in a separate repository (helm-charts) and follow best practices:

Chart Structure

service-a/
├── Chart.yaml
├── values.yaml
└── templates/
    ├── deployment-api.yaml
    ├── deployment-workers.yaml
    ├── service.yaml
    ├── ingress.yaml
    ├── hpa.yaml
    ├── pdb.yaml
    ├── serviceaccount.yaml
    └── _helpers.tpl

Key Features of Our Helm Charts

  1. Multi-Component Support: Our service charts can support both API servers and queue workers, each with independent configurations

  2. Environment-Specific Values: Values files in the GitOps repository override defaults:

# apps/dev/values/service-a.yaml
image:
  repository: "123456789012.dkr.ecr.ap-south-1.amazonaws.com/matters-ai/service-a"
  tag: "v1.2.3"
  pullPolicy: IfNotPresent

api:
  enabled: true
  replicaCount: 1
  autoscaling:
    enabled: true
    minReplicas: 1
    maxReplicas: 3
    targetCPUUtilizationPercentage: 80

  3. Flexible Configuration: Support for ConfigMaps, Secrets, environment variables, and volume mounts

  4. Resource Management: CPU and memory limits/requests for proper resource allocation

  5. Autoscaling: Built-in support for Horizontal Pod Autoscaling (HPA) and KEDA for event-driven scaling
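
For the event-driven side, KEDA scales the worker deployment on queue depth. A sketch of a ScaledObject for an SQS-backed worker (the queue URL and TriggerAuthentication name are placeholders):

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: service-a-workers
spec:
  scaleTargetRef:
    name: service-a-workers        # the worker Deployment
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.ap-south-1.amazonaws.com/123456789012/service-a-jobs
        queueLength: "5"           # target messages per replica
        awsRegion: ap-south-1
      authenticationRef:
        name: keda-aws-credentials # a TriggerAuthentication, e.g. IRSA-backed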

Deployment Template Example

Our deployment templates are comprehensive, supporting various Kubernetes features:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "service-a.fullname" . }}-api
spec:
  replicas: {{ .Values.api.replicaCount }}
  selector:
    matchLabels:
      {{- include "service-a.selectorLabels" . | nindent 6 }}
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    metadata:
      labels:
        {{- include "service-a.selectorLabels" . | nindent 8 }}
    spec:
      serviceAccountName: {{ include "service-a.serviceAccountName" . }}
      containers:
        - name: {{ .Chart.Name }}-api
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          resources:
            {{- toYaml .Values.api.resources | nindent 12 }}

The Complete Deployment Flow

Let me walk you through what happens when we deploy a new version:

  1. Developer pushes code to a feature branch or merges to main
  2. GitHub Actions triggers the build workflow
  3. Docker image is built and pushed to ECR with a SHA tag
  4. Version is determined based on commit messages or PR labels
  5. Version tag is added to the existing image (no rebuild)
  6. Git tag and GitHub release are created automatically
  7. Deployment workflow (manual or automated) updates the GitOps repository
  8. ArgoCD detects the change in the GitOps repository
  9. ArgoCD syncs the new image tag to the Kubernetes cluster
  10. Helm renders the Kubernetes manifests with the new image
  11. Kubernetes performs a rolling update of the deployment

This entire process is automated, repeatable, and auditable through Git history.

Challenges and Learnings

No migration of this scale is without challenges. Here are some key learnings:

1. Image Tagging Strategy

Initially, we struggled with image tagging. We learned that:

  • SHA-based tags are great for traceability
  • Version tags are essential for releases
  • Tagging without rebuilding (using manifests) saves time and ensures consistency

2. GitOps Repository Management

Managing the GitOps repository required discipline:

  • All changes must go through Git (no manual kubectl edits)
  • Clear commit messages help with debugging
  • Separate repositories for charts and values improve maintainability

3. Helm Chart Complexity

As our applications grew, Helm charts became complex:

  • Use _helpers.tpl for reusable template logic (see the sketch after this list)
  • Keep values.yaml well-documented
  • Version your charts properly
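
To make the first point concrete, a _helpers.tpl entry is just a named template. This sketch mirrors the standard Helm scaffold:

{{/* templates/_helpers.tpl */}}

{{/* Fully qualified app name, truncated to the 63-character Kubernetes limit */}}
{{- define "service-a.fullname" -}}
{{- printf "%s-%s" .Release.Name .Chart.Name | trunc 63 | trimSuffix "-" }}
{{- end }}

{{/* Selector labels shared by Deployments and Services */}}
{{- define "service-a.selectorLabels" -}}
app.kubernetes.io/name: {{ .Chart.Name }}
app.kubernetes.io/instance: {{ .Release.Name }}
{{- end }}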

4. ArgoCD Sync Policies

Finding the right balance for sync policies:

  • Automated sync is great for dev/staging
  • Manual sync might be preferred for production (see the sketch after this list)
  • Self-healing is powerful but can be surprising if not understood
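
In practice, the production difference is simply the absence of the automated block; nothing deploys until someone runs argocd app sync or clicks Sync in the UI:

# Production variant: omit `automated` so every sync is explicit
syncPolicy:
  syncOptions:
    - CreateNamespace=true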

5. Resource Management

Kubernetes resource management is more granular than ECS:

  • Proper resource requests/limits are crucial (example after this list)
  • HPA requires careful tuning
  • Node affinity and taints help with workload placement
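
As a reference point, requests and limits live in the per-environment values files; an illustrative (not prescriptive) excerpt:

# apps/dev/values/service-a.yaml (excerpt; numbers are illustrative)
api:
  resources:
    requests:
      cpu: 250m
      memory: 256Mi
    limits:
      cpu: 500m
      memory: 512Mi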

Best Practices We Follow

  1. Immutable Infrastructure: All changes go through Git, no manual edits
  2. Environment Parity: Same Helm charts, different values files
  3. Security: OIDC for authentication, least-privilege IAM roles
  4. Observability: Comprehensive logging and monitoring
  5. Documentation: Clear READMEs and inline comments
  6. Testing: Test Helm charts with different values before deploying
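
For the testing point, a lint-and-render pass catches most templating mistakes before ArgoCD ever sees a change. A sketch (the relative paths are illustrative):

- name: Lint and render chart against dev values
  run: |
    # Fails fast on chart errors; template rendering validates the output
    helm lint ./service-a -f ./gitops-argocd/apps/dev/values/service-a.yaml
    helm template service-a ./service-a \
      -f ./gitops-argocd/apps/dev/values/service-a.yaml > /dev/null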

What's Next?

Our CI/CD journey continues to evolve. Some areas we're exploring:

  • Progressive Delivery: Using Argo Rollouts for canary and blue-green deployments
  • Policy as Code: Implementing OPA (Open Policy Agent) for governance
  • Multi-Cluster Management: Managing multiple EKS clusters with ArgoCD
  • Cost Optimization: Better resource utilization and spot instance integration

Conclusion

Migrating from ECS to Kubernetes and establishing a modern CI/CD pipeline has been a transformative experience. The combination of GitHub Actions, ECR, ArgoCD, and Helm Charts has given us:

  • Faster deployments: Automated pipeline reduces manual steps
  • Better visibility: Git-based history of all changes
  • Improved reliability: Automated testing and rollback capabilities
  • Enhanced security: Immutable infrastructure and proper access controls
  • Greater flexibility: Easy to add new environments or services

While the initial setup required significant effort, the long-term benefits in terms of developer productivity, operational efficiency, and system reliability make it well worth it.

If you're considering a similar migration, I'd recommend:

  1. Start small with a non-critical service
  2. Invest time in understanding GitOps principles
  3. Document everything as you go
  4. Involve your team early in the process
  5. Be patient—migrations take time

I hope this article provides valuable insights for your own CI/CD journey. If you have recently migrated to Kubernetes or implemented GitOps, I'd love to hear about your experiences and learnings. Feel free to share your thoughts or reach out if you'd like to discuss any aspect of this setup in more detail.