[AEWS 2nd cohort] EKS AutoScaling

JUNE · 2024. 4. 1. 23:51 · DevOps Study/Kubernetes

This post is based on material written for AEWS 2nd cohort, the EKS study run by the CloudNet@ team.

Deploying the environment

https://ap-northeast-2.console.aws.amazon.com/cloudformation/home?region=ap-northeast-2#/stacks/create?stackName=myeks&templateURL=https:%2F%2Fs3.ap-northeast-2.amazonaws.com%2Fcloudformation.cloudneta.net%2FK8S%2Feks-oneclick4.yaml

Make sure the template file in the URL is eks-oneclick4.yaml.

You can check the created stacks in CloudFormation as shown below.

This time the worker nodes are t3.medium instances again.

Once deployment completes, you can see one bastion host instance and three t3.medium instances created by the node group.

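Environment setup

# Switch to the default namespace
kubectl ns default

By now this should feel very familiar.

SSH into the bastion host instance and set up the environment with the following commands.

Install AWS LB Controller, ExternalDNS, EBS, and kube-ops-view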

# ExternalDNS
MyDomain=crowsnest.click
echo "export MyDomain=crowsnest.click" >> /etc/profile

MyDnzHostedZoneId=$(aws route53 list-hosted-zones-by-name --dns-name "${MyDomain}." --query "HostedZones[0].Id" --output text)
echo $MyDomain, $MyDnzHostedZoneId
curl -s -O https://raw.githubusercontent.com/gasida/PKOS/main/aews/externaldns.yaml
MyDomain=$MyDomain MyDnzHostedZoneId=$MyDnzHostedZoneId envsubst < externaldns.yaml | kubectl apply -f -

# kube-ops-view
helm repo add geek-cookbook https://geek-cookbook.github.io/charts/
helm install kube-ops-view geek-cookbook/kube-ops-view --version 1.2.2 --set env.TZ="Asia/Seoul" --namespace kube-system
kubectl patch svc -n kube-system kube-ops-view -p '{"spec":{"type":"LoadBalancer"}}'
kubectl annotate service kube-ops-view -n kube-system "external-dns.alpha.kubernetes.io/hostname=kubeopsview.$MyDomain"
echo -e "Kube Ops View URL = http://kubeopsview.$MyDomain:8080/#scale=1.5"

# AWS LB Controller
helm repo add eks https://aws.github.io/eks-charts
helm repo update
helm install aws-load-balancer-controller eks/aws-load-balancer-controller -n kube-system --set clusterName=$CLUSTER_NAME \
  --set serviceAccount.create=false --set serviceAccount.name=aws-load-balancer-controller

You can confirm that kube-ops-view is hosted and reachable.

# Verify the EBS CSI driver installation
eksctl get addon --cluster ${CLUSTER_NAME}
kubectl get pod -n kube-system -l 'app in (ebs-csi-controller,ebs-csi-node)'
kubectl get csinodes

# Create the gp3 StorageClass
kubectl get sc
kubectl apply -f https://raw.githubusercontent.com/gasida/PKOS/main/aews/gp3-sc.yaml
kubectl get sc

We also verify the EBS CSI driver and create the gp3 StorageClass.

Install Prometheus & Grafana (admin / prom-operator): recommended dashboards 15757, 17900, 15172

# Look up the certificate ARN in the current region
CERT_ARN=`aws acm list-certificates --query 'CertificateSummaryList[].CertificateArn[]' --output text`
echo $CERT_ARN

# Add the repo
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts

# Create the parameter file: unlike the week-4 lab, PV/PVC (AWS EBS) is not used, since it is inconvenient to delete
cat <<EOT > monitor-values.yaml
prometheus:
  prometheusSpec:
    podMonitorSelectorNilUsesHelmValues: false
    serviceMonitorSelectorNilUsesHelmValues: false
    retention: 5d
    retentionSize: "10GiB"

  verticalPodAutoscaler:
    enabled: true

  ingress:
    enabled: true
    ingressClassName: alb
    hosts: 
      - prometheus.$MyDomain
    paths: 
      - /*
    annotations:
      alb.ingress.kubernetes.io/scheme: internet-facing
      alb.ingress.kubernetes.io/target-type: ip
      alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS":443}, {"HTTP":80}]'
      alb.ingress.kubernetes.io/certificate-arn: $CERT_ARN
      alb.ingress.kubernetes.io/success-codes: 200-399
      alb.ingress.kubernetes.io/load-balancer-name: myeks-ingress-alb
      alb.ingress.kubernetes.io/group.name: study
      alb.ingress.kubernetes.io/ssl-redirect: '443'

grafana:
  defaultDashboardsTimezone: Asia/Seoul
  adminPassword: prom-operator
  defaultDashboardsEnabled: false

  ingress:
    enabled: true
    ingressClassName: alb
    hosts: 
      - grafana.$MyDomain
    paths: 
      - /*
    annotations:
      alb.ingress.kubernetes.io/scheme: internet-facing
      alb.ingress.kubernetes.io/target-type: ip
      alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS":443}, {"HTTP":80}]'
      alb.ingress.kubernetes.io/certificate-arn: $CERT_ARN
      alb.ingress.kubernetes.io/success-codes: 200-399
      alb.ingress.kubernetes.io/load-balancer-name: myeks-ingress-alb
      alb.ingress.kubernetes.io/group.name: study
      alb.ingress.kubernetes.io/ssl-redirect: '443'

kube-state-metrics:
  rbac:
    extraRules:
      - apiGroups: ["autoscaling.k8s.io"]
        resources: ["verticalpodautoscalers"]
        verbs: ["list", "watch"]
  prometheus:
    monitor:
      enabled: true
  customResourceState:
    enabled: true
    config:
      kind: CustomResourceStateMetrics
      spec:
        resources:
          - groupVersionKind:
              group: autoscaling.k8s.io
              kind: "VerticalPodAutoscaler"
              version: "v1"
            labelsFromPath:
              verticalpodautoscaler: [metadata, name]
              namespace: [metadata, namespace]
              target_api_version: [apiVersion]
              target_kind: [spec, targetRef, kind]
              target_name: [spec, targetRef, name]
            metrics:
              - name: "vpa_containerrecommendations_target"
                help: "VPA container recommendations for memory."
                each:
                  type: Gauge
                  gauge:
                    path: [status, recommendation, containerRecommendations]
                    valueFrom: [target, memory]
                    labelsFromPath:
                      container: [containerName]
                commonLabels:
                  resource: "memory"
                  unit: "byte"
              - name: "vpa_containerrecommendations_target"
                help: "VPA container recommendations for cpu."
                each:
                  type: Gauge
                  gauge:
                    path: [status, recommendation, containerRecommendations]
                    valueFrom: [target, cpu]
                    labelsFromPath:
                      container: [containerName]
                commonLabels:
                  resource: "cpu"
                  unit: "core"
  selfMonitor:
    enabled: true

alertmanager:
  enabled: false
EOT
cat monitor-values.yaml | yh

# Deploy
kubectl create ns monitoring
helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack --version 57.2.0 \
--set prometheus.prometheusSpec.scrapeInterval='15s' --set prometheus.prometheusSpec.evaluationInterval='15s' \
-f monitor-values.yaml --namespace monitoring

# Deploy metrics-server
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Access Prometheus via its ingress domain
echo -e "Prometheus Web URL = https://prometheus.$MyDomain"

# Access Grafana: default account - admin / prom-operator
echo -e "Grafana Web URL = https://grafana.$MyDomain"

Prometheus

https://prometheus.crowsnest.click

Grafana

https://grafana.crowsnest.click

Both domains are reachable.

I built a dashboard from ID 15757 with Prometheus as the data source.

Install EKS Node Viewer: shows each node's allocatable capacity and the resources requested by pods (not actual pod resource usage)

# Install Go
wget https://go.dev/dl/go1.22.1.linux-amd64.tar.gz
tar -C /usr/local -xzf go1.22.1.linux-amd64.tar.gz
export PATH=$PATH:/usr/local/go/bin
go version
# go version go1.22.1 linux/amd64

# Install EKS Node Viewer: takes about 2 minutes or more
go install github.com/awslabs/eks-node-viewer/cmd/eks-node-viewer@latest

# [New terminal] Run EKS Node Viewer
cd ~/go/bin && ./eks-node-viewer
# or
cd ~/go/bin && ./eks-node-viewer --resources cpu,memory

Command samples
# Standard usage
./eks-node-viewer

# Display both CPU and Memory Usage
./eks-node-viewer --resources cpu,memory

# Karpenter nodes only
./eks-node-viewer --node-selector "karpenter.sh/provisioner-name"

# Display extra labels, i.e. AZ
./eks-node-viewer --extra-labels topology.kubernetes.io/zone

# Specify a particular AWS profile and region
AWS_PROFILE=myprofile AWS_REGION=us-west-2 ./eks-node-viewer

Default options (key=value lines in ~/.eks-node-viewer)
# select only Karpenter managed nodes
node-selector=karpenter.sh/provisioner-name

# display both CPU and memory
resources=cpu,memory

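After installing Go, install EKS Node Viewer.

cd ~/go/bin && ./eks-node-viewer

Once installed, running the command above shows each node's instance type, its CPU requests, and its on-demand price.

cd ~/go/bin && ./eks-node-viewer --resources cpu,memory

This variant shows each node's instance type, CPU and memory requests, and price.

cd ~/go/bin && ./eks-node-viewer --extra-labels topology.kubernetes.io/zone

This adds each node's availability zone to the display.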

Kubernetes autoscaling overview

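I referred to the blog of a first-cohort participant (https://kimalarm.tistory.com/). Thank you.

For pod autoscaling, Kubernetes provides HPA natively, and VPA is available as an add-on from the Kubernetes autoscaler project.
With a CSP's (Cloud Service Provider) managed Kubernetes service, you can also use CA (Cluster Autoscaler) for node autoscaling.

These three (HPA, VPA, and CA) are considered the most basic forms of Kubernetes autoscaling.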

HPA (Horizontal Pod Autoscaling)

Horizontal pod autoscaling.

You set a minimum and maximum pod count, and pods are added or removed based on observed metrics such as CPU utilization.

[Figure: Scale in / scale out]

Image source: https://aws.amazon.com/blogs/opensource/horizontal-pod-autoscaling-eks/

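In EKS, HPA is implemented through the HorizontalPodAutoscaler resource.
The HorizontalPodAutoscaler automatically adjusts the number of pods in a Deployment, ReplicationController, or ReplicaSet based on the observed CPU utilization of those pods.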
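The replica calculation the HPA controller performs can be sketched in shell. This is a simplified sketch of the formula in the Kubernetes HPA documentation, desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization), assuming a single CPU-utilization metric, whole-number percentages, and the default 0.1 tolerance. The function name hpa_desired_replicas is hypothetical, and the real controller additionally handles unready pods, missing metrics, and stabilization windows.

```shell
#!/usr/bin/env bash
# Simplified sketch of the HPA replica calculation (single CPU metric).
#   desired = ceil(current * currentUtilization / targetUtilization)
# Utilization values are whole percentages of the pods' CPU request.
hpa_desired_replicas() {
  local current=$1 util=$2 target=$3 min=$4 max=$5
  local desired
  # Ratio in thousandths so the 10% tolerance check stays in integer math.
  local ratio_pm=$(( util * 1000 / target ))
  if (( ratio_pm >= 900 && ratio_pm <= 1100 )); then
    desired=$current  # within the default 0.1 tolerance: keep the current count
  else
    desired=$(( (current * util + target - 1) / target ))  # integer ceiling
  fi
  (( desired < min )) && desired=$min  # never go below minReplicas
  (( desired > max )) && desired=$max  # never exceed maxReplicas
  echo "$desired"
}

hpa_desired_replicas 1 120 50 1 10   # prints 3
hpa_desired_replicas 2 48 50 1 10    # prints 2 (inside the tolerance band)
hpa_desired_replicas 4 250 50 1 10   # prints 10 (capped by maxReplicas)
```

The last call shows why the maximum matters: without the cap, 4 pods at 250% against a 50% target would ask for 20 replicas.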

Deploying Grafana dashboard 17125

{
  "__inputs": [],
  "__requires": [
    {
      "type": "grafana",
      "id": "grafana",
      "name": "Grafana",
      "version": "6.1.6"
    },
    {
      "type": "panel",
      "id": "graph",
      "name": "Graph",
      "version": ""
    },
    {
      "type": "datasource",
      "id": "prometheus",
      "name": "Prometheus",
      "version": "1.0.0"
    },
    {
      "type": "panel",
      "id": "singlestat",
      "name": "Singlestat",
      "version": ""
    }
  ],
  "annotations": {
    "list": [
      {
        "builtIn": 1,
        "datasource": "-- Grafana --",
        "enable": true,
        "hide": true,
        "iconColor": "rgba(0, 211, 255, 1)",
        "name": "Annotations & Alerts",
        "type": "dashboard"
      }
    ]
  },
  "editable": true,
  "gnetId": 17125,
  "graphTooltip": 0,
  "id": null,
  "iteration": 1558717029334,
  "links": [],
  "panels": [
    {
      "cacheTimeout": null,
      "colorBackground": false,
      "colorValue": false,
      "colors": [
        "#299c46",
        "rgba(237, 129, 40, 0.89)",
        "#d44a3a"
      ],
      "datasource": "$datasource",
      "format": "none",
      "gauge": {
        "maxValue": 100,
        "minValue": 0,
        "show": false,
        "thresholdLabels": false,
        "thresholdMarkers": true
      },
      "id": 5,
      "interval": null,
      "links": [],
      "mappingType": 1,
      "mappingTypes": [
        {
          "name": "value to text",
          "value": 1
        },
        {
          "name": "range to text",
          "value": 2
        }
      ],
      "maxDataPoints": 100,
      "nullPointMode": "connected",
      "nullText": null,
      "postfix": "",
      "postfixFontSize": "50%",
      "prefix": "",
      "prefixFontSize": "50%",
      "rangeMaps": [
        {
          "from": "null",
          "text": "N/A",
          "to": "null"
        }
      ],
      "sparkline": {
        "fillColor": "rgba(31, 118, 189, 0.18)",
        "full": false,
        "lineColor": "rgb(31, 120, 193)",
        "show": true
      },
      "tableColumn": "",
      "targets": [
        {
          "expr": "kube_horizontalpodautoscaler_status_desired_replicas{job=\"kube-state-metrics\", namespace=\"$namespace\"}",
          "format": "time_series",
          "intervalFactor": 2,
          "legendFormat": "",
          "refId": "A"
        }
      ],
      "thresholds": "",
      "title": "Desired Replicas",
      "type": "singlestat",
      "valueFontSize": "80%",
      "valueMaps": [
        {
          "op": "=",
          "text": "0",
          "value": "null"
        }
      ],
      "valueName": "current"
    },
    {
      "cacheTimeout": null,
      "colorBackground": false,
      "colorValue": false,
      "colors": [
        "#299c46",
        "rgba(237, 129, 40, 0.89)",
        "#d44a3a"
      ],
      "datasource": "$datasource",
      "format": "none",
      "gauge": {
        "maxValue": 100,
        "minValue": 0,
        "show": false,
        "thresholdLabels": false,
        "thresholdMarkers": true
      },
      "gridPos": {
        "h": 3,
        "w": 6,
        "x": 6,
        "y": 0
      },
      "id": 6,
      "interval": null,
      "links": [],
      "mappingType": 1,
      "mappingTypes": [
        {
          "name": "value to text",
          "value": 1
        },
        {
          "name": "range to text",
          "value": 2
        }
      ],
      "maxDataPoints": 100,
      "nullPointMode": "connected",
      "nullText": null,
      "postfix": "",
      "postfixFontSize": "50%",
      "prefix": "",
      "prefixFontSize": "50%",
      "rangeMaps": [
        {
          "from": "null",
          "text": "N/A",
          "to": "null"
        }
      ],
      "sparkline": {
        "fillColor": "rgba(31, 118, 189, 0.18)",
        "full": false,
        "lineColor": "rgb(31, 120, 193)",
        "show": true
      },
      "tableColumn": "",
      "targets": [
        {
          "expr": "kube_horizontalpodautoscaler_status_current_replicas{job=\"kube-state-metrics\", namespace=\"$namespace\"}",
          "format": "time_series",
          "intervalFactor": 2,
          "legendFormat": "",
          "refId": "A"
        }
      ],
      "thresholds": "",
      "title": "Current Replicas",
      "type": "singlestat",
      "valueFontSize": "80%",
      "valueMaps": [
        {
          "op": "=",
          "text": "0",
          "value": "null"
        }
      ],
      "valueName": "current"
    },
    {
      "cacheTimeout": null,
      "colorBackground": false,
      "colorValue": false,
      "colors": [
        "#299c46",
        "rgba(237, 129, 40, 0.89)",
        "#d44a3a"
      ],
      "datasource": "$datasource",
      "format": "none",
      "gauge": {
        "maxValue": 100,
        "minValue": 0,
        "show": false,
        "thresholdLabels": false,
        "thresholdMarkers": true
      },
      "gridPos": {
        "h": 3,
        "w": 6,
        "x": 12,
        "y": 0
      },
      "id": 7,
      "interval": null,
      "links": [],
      "mappingType": 1,
      "mappingTypes": [
        {
          "name": "value to text",
          "value": 1
        },
        {
          "name": "range to text",
          "value": 2
        }
      ],
      "maxDataPoints": 100,
      "nullPointMode": "connected",
      "nullText": null,
      "postfix": "",
      "postfixFontSize": "50%",
      "prefix": "",
      "prefixFontSize": "50%",
      "rangeMaps": [
        {
          "from": "null",
          "text": "N/A",
          "to": "null"
        }
      ],
      "sparkline": {
        "fillColor": "rgba(31, 118, 189, 0.18)",
        "full": false,
        "lineColor": "rgb(31, 120, 193)",
        "show": false
      },
      "tableColumn": "",
      "targets": [
        {
          "expr": "kube_horizontalpodautoscaler_spec_min_replicas{job=\"kube-state-metrics\",  namespace=\"$namespace\"}",
          "format": "time_series",
          "intervalFactor": 2,
          "legendFormat": "",
          "refId": "A"
        }
      ],
      "thresholds": "",
      "title": "Min Replicas",
      "type": "singlestat",
      "valueFontSize": "80%",
      "valueMaps": [
        {
          "op": "=",
          "text": "0",
          "value": "null"
        }
      ],
      "valueName": "current"
    },
    {
      "cacheTimeout": null,
      "colorBackground": false,
      "colorValue": false,
      "colors": [
        "#299c46",
        "rgba(237, 129, 40, 0.89)",
        "#d44a3a"
      ],
      "datasource": "$datasource",
      "format": "none",
      "gauge": {
        "maxValue": 100,
        "minValue": 0,
        "show": false,
        "thresholdLabels": false,
        "thresholdMarkers": true
      },
      "gridPos": {
        "h": 3,
        "w": 6,
        "x": 18,
        "y": 0
      },
      "id": 8,
      "interval": null,
      "links": [],
      "mappingType": 1,
      "mappingTypes": [
        {
          "name": "value to text",
          "value": 1
        },
        {
          "name": "range to text",
          "value": 2
        }
      ],
      "maxDataPoints": 100,
      "nullPointMode": "connected",
      "nullText": null,
      "postfix": "",
      "postfixFontSize": "50%",
      "prefix": "",
      "prefixFontSize": "50%",
      "rangeMaps": [
        {
          "from": "null",
          "text": "N/A",
          "to": "null"
        }
      ],
      "sparkline": {
        "fillColor": "rgba(31, 118, 189, 0.18)",
        "full": false,
        "lineColor": "rgb(31, 120, 193)",
        "show": false
      },
      "tableColumn": "",
      "targets": [
        {
          "expr": "kube_horizontalpodautoscaler_spec_max_replicas{job=\"kube-state-metrics\", namespace=\"$namespace\"}",
          "format": "time_series",
          "intervalFactor": 2,
          "legendFormat": "",
          "refId": "A"
        }
      ],
      "thresholds": "",
      "title": "Max Replicas",
      "type": "singlestat",
      "valueFontSize": "80%",
      "valueMaps": [
        {
          "op": "=",
          "text": "0",
          "value": "null"
        }
      ],
      "valueName": "current"
    },
    {
      "aliasColors": {},
      "bars": false,
      "dashLength": 10,
      "dashes": false,
      "datasource": "$datasource",
      "fill": 0,
      "gridPos": {
        "h": 12,
        "w": 24,
        "x": 0,
        "y": 3
      },
      "id": 9,
      "legend": {
        "alignAsTable": false,
        "avg": false,
        "current": false,
        "max": false,
        "min": false,
        "rightSide": false,
        "show": true,
        "total": false,
        "values": false
      },
      "lines": true,
      "linewidth": 1,
      "links": [],
      "nullPointMode": "null",
      "paceLength": 10,
      "percentage": false,
      "pointradius": 5,
      "points": false,
      "renderer": "flot",
      "repeat": null,
      "seriesOverrides": [
        {
          "alias": "Max",
          "color": "#C4162A"
        },
        {
          "alias": "Min",
          "color": "#1F60C4"
        }
      ],
      "spaceLength": 10,
      "stack": false,
      "steppedLine": false,
      "targets": [
        {
          "expr": "kube_horizontalpodautoscaler_status_desired_replicas{job=\"kube-state-metrics\",namespace=\"$namespace\"}",
          "format": "time_series",
          "intervalFactor": 2,
          "legendFormat": "Desired",
          "refId": "B"
        },
        {
          "expr": "kube_horizontalpodautoscaler_status_current_replicas{job=\"kube-state-metrics\",namespace=\"$namespace\"}",
          "format": "time_series",
          "intervalFactor": 2,
          "legendFormat": "Running",
          "refId": "C"
        },
        {
          "expr": "kube_horizontalpodautoscaler_spec_max_replicas{job=\"kube-state-metrics\",namespace=\"$namespace\"}",
          "format": "time_series",
          "instant": false,
          "intervalFactor": 2,
          "legendFormat": "Max",
          "refId": "A"
        },
        {
          "expr": "kube_horizontalpodautoscaler_spec_min_replicas{job=\"kube-state-metrics\",namespace=\"$namespace\"}",
          "format": "time_series",
          "instant": false,
          "intervalFactor": 2,
          "legendFormat": "Min",
          "refId": "D"
        }
      ],
      "thresholds": [],
      "timeFrom": null,
      "timeRegions": [],
      "timeShift": null,
      "title": "Replicas",
      "tooltip": {
        "shared": true,
        "sort": 0,
        "value_type": "individual"
      },
      "type": "graph",
      "xaxis": {
        "buckets": null,
        "mode": "time",
        "name": null,
        "show": true,
        "values": []
      },
      "yaxes": [
        {
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        },
        {
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        }
      ],
      "yaxis": {
        "align": false,
        "alignLevel": null
      }
    }
  ],
  "refresh": "10s",
  "schemaVersion": 18,
  "style": "dark",
  "tags": [],
  "templating": {
    "list": [
      {
        "current": {
          "text": "Prometheus",
          "value": "Prometheus"
        },
        "hide": 0,
        "includeAll": false,
        "label": null,
        "multi": false,
        "name": "datasource",
        "options": [],
        "query": "prometheus",
        "refresh": 1,
        "regex": "",
        "skipUrlSync": false,
        "type": "datasource"
      },
      {
        "allValue": null,
        "current": {},
        "datasource": "$datasource",
        "definition": "label_values(kube_horizontalpodautoscaler_metadata_generation{job=\"kube-state-metrics\"}, namespace)",
        "hide": 0,
        "includeAll": false,
        "label": "Namespace",
        "multi": false,
        "name": "namespace",
        "options": [],
        "query": "label_values(kube_horizontalpodautoscaler_metadata_generation{job=\"kube-state-metrics\"}, namespace)",
        "refresh": 2,
        "regex": "",
        "skipUrlSync": false,
        "sort": 0,
        "tagValuesQuery": "",
        "tags": [],
        "tagsQuery": "",
        "type": "query",
        "useTags": false
      },
      {
        "allValue": null,
        "current": {},
        "datasource": "$datasource",
        "definition": "label_values(kube_horizontalpodautoscaler_labels{job=\"kube-state-metrics\", namespace=\"$namespace\"}, horizontalpodautoscaler)",
        "hide": 0,
        "includeAll": false,
        "label": "Name",
        "multi": false,
        "name": "horizontalpodautoscaler",
        "options": [],
        "query": "label_values(kube_horizontalpodautoscaler_labels{job=\"kube-state-metrics\", namespace=\"$namespace\"}, horizontalpodautoscaler)",
        "refresh": 2,
        "regex": "",
        "skipUrlSync": false,
        "sort": 0,
        "tagValuesQuery": "",
        "tags": [],
        "tagsQuery": "",
        "type": "query",
        "useTags": false
      }
    ]
  },
  "time": {
    "from": "now-1h",
    "to": "now"
  },
  "timepicker": {
    "refresh_intervals": [
      "5s",
      "10s",
      "30s",
      "1m",
      "5m",
      "15m",
      "30m",
      "1h",
      "2h",
      "1d"
    ],
    "time_options": [
      "5m",
      "15m",
      "1h",
      "6h",
      "12h",
      "24h",
      "2d",
      "7d",
      "30d"
    ]
  },
  "timezone": "",
  "title": "Kubernetes / Horizontal Pod Autoscaler",
  "uid": "alJY6yWZz",
  "version": 10,
  "description": "A quick and simple dashboard for viewing how your horizontal pod autoscaler is doing."
}

Create a new dashboard in Grafana by importing this JSON.

Right after deployment, it looks like this.

# Run and expose php-apache server
curl -s -O https://raw.githubusercontent.com/kubernetes/website/main/content/en/examples/application/php-apache.yaml
cat php-apache.yaml | yh
kubectl apply -f php-apache.yaml

# Check
kubectl exec -it deploy/php-apache -- cat /var/www/html/index.php
...

# Monitoring: use two terminals
watch -d 'kubectl get hpa,pod;echo;kubectl top pod;echo;kubectl top node'
kubectl exec -it deploy/php-apache -- top

# Access
PODIP=$(kubectl get pod -l run=php-apache -o jsonpath='{.items[0].status.podIP}')
curl -s $PODIP; echo

The app is deployed,

and a second terminal is used for monitoring.

Access to the pod works.

kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10
kubectl describe hpa

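This command creates a new HPA for the php-apache Deployment. It targets an average CPU utilization of 50% and lets the Deployment scale automatically between 'Min replicas' (the minimum number of pods to keep) and 'Max replicas' (the maximum number of pods it can scale out to).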

kubectl get hpa php-apache -o yaml | kubectl neat | yh

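This command shows the autoscaling configuration of the php-apache HPA.

Here, 'maxReplicas: 10' is the maximum number of replicas (pods), 'minReplicas: 1' is the minimum number to keep, and 'averageUtilization: 50' sets the target average CPU utilization to 50%.

With this configuration, when the average CPU utilization across the pods exceeds 50%, the HPA adds pods, up to 10. When CPU usage drops, the number of pods shrinks again. Note that utilization is measured relative to each container's CPU request, not the node's total CPU.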
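For comparison, the same HPA can be declared as a manifest instead of being created with kubectl autoscale. This is a sketch assuming the autoscaling/v2 API; the file name php-apache-hpa.yaml is hypothetical, and the kubectl apply line is commented out so the snippet runs without a cluster.

```shell
#!/usr/bin/env bash
# Declarative sketch of the HPA created by:
#   kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10
# The quoted 'EOF' keeps the shell from expanding anything inside the manifest.
cat <<'EOF' > php-apache-hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
EOF

# Apply on the cluster (commented out so the sketch runs without one):
# kubectl apply -f php-apache-hpa.yaml
```

A manifest like this can be kept in version control and extended later, for example with a spec.behavior section to tune how fast the HPA scales up or down.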

Now send requests to the first pod in a loop, directly via its IP.

while true;do curl -s $PODIP; sleep 0.5; done

In Grafana, Min Replicas is set to 1 and Max Replicas to 10.

If we keep sending requests to generate load,

we can see the replica count grow from 1 to 2.

CPU utilization crossed 50% once, the Deployment scaled to two replicas, and utilization has stayed below 50% since.

# Repeated access 2 (via the Service name) >> watch the pods increase (how many, and why?) then stop >> confirm the pod count drops about 5 minutes after stopping

kubectl run -i --tty load-generator --rm --image=busybox:1.28 --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://php-apache; done"

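CPU utilization quickly hit 70%.

It even went past 120%. This is possible because utilization is measured against the container's CPU request (200m in php-apache.yaml) while its limit is 500m, so a single pod can reach up to 250%.

After a while, the replica count rose to about 6 and then seemed to stabilize.

Now stop the load script.

After about 5 minutes, it scaled back down to 1 replica. This delay matches the HPA's default 5-minute (300-second) scale-down stabilization window.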
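The roughly 5-minute delay before scale-down can be sketched as follows. This is a simplified model of the HPA's scale-down stabilization window (default 300 seconds, configurable via spec.behavior.scaleDown.stabilizationWindowSeconds in autoscaling/v2): the controller keeps recent replica recommendations and, when scaling down, applies the highest one still inside the window. The function name stabilized_replicas, the "timestamp:replicas" input format, and the timeline below are all hypothetical.

```shell
#!/usr/bin/env bash
# Simplified sketch of the HPA scale-down stabilization window.
# Arguments: now, window (seconds), then "timestamp:replicas" recommendations.
# Prints the highest recommendation younger than the window.
stabilized_replicas() {
  local now=$1 window=$2
  shift 2
  local best=0 entry ts reps
  for entry in "$@"; do
    ts=${entry%%:*}    # part before the colon
    reps=${entry##*:}  # part after the colon
    # Only recommendations still inside the window count.
    if (( now - ts < window && reps > best )); then
      best=$reps
    fi
  done
  echo "$best"
}

# Hypothetical timeline: load stops at t=0 while 6 replicas are recommended,
# and every later recommendation is 1.
stabilized_replicas 200 300 0:6 60:1 120:1 200:1   # prints 6: t=0 is still in the window
stabilized_replicas 360 300 0:6 60:1 120:1 360:1   # prints 1: the peak has aged out
```

Taking the maximum over the window makes scale-down deliberately conservative, so short dips in load do not immediately remove pods that may be needed again.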

KEDA - Kubernetes based Event Driven Autoscaler

I referred to https://cumulus.tistory.com/140. Thank you.

While the standard HPA (Horizontal Pod Autoscaler) decides whether to scale based on resource metrics (CPU, memory), KEDA can decide based on specific events.

For example, Airflow can tell from its metadata DB how many tasks are currently running or waiting.

If worker scaling is driven by such events, workers can scale out sooner, as soon as many tasks are added to the queue.
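To make the Airflow example concrete, here is a simplified sketch of queue-driven scaling. It mirrors how an HPA with an AverageValue target (KEDA creates such an HPA for each ScaledObject) computes ceil(total / target), while the zero case stands in for KEDA's scale-to-zero activation. The function name queue_desired_workers, the tasks-per-worker target, and all numbers are hypothetical, and real KEDA scalers also apply activation thresholds.

```shell
#!/usr/bin/env bash
# Simplified sketch of queue-driven scaling (hypothetical names and numbers).
#   pending:    queued/running tasks reported by the event source
#   per_worker: target number of tasks each worker should handle
#   max:        upper bound on workers (like maxReplicaCount)
queue_desired_workers() {
  local pending=$1 per_worker=$2 max=$3
  # No pending work: the workload can be scaled all the way to zero.
  if (( pending == 0 )); then
    echo 0
    return
  fi
  # AverageValue-style target: ceil(pending / per_worker)
  local desired=$(( (pending + per_worker - 1) / per_worker ))
  (( desired > max )) && desired=$max
  echo "$desired"
}

queue_desired_workers 0 16 10     # prints 0
queue_desired_workers 40 16 10    # prints 3
queue_desired_workers 500 16 10   # prints 10
```

Because the input is the queue length rather than CPU, workers are added as soon as tasks pile up, before the existing workers become CPU-bound.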

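KEDA features

KEDA uses Kubernetes custom resources to define and manage event-driven scaling.
It supports horizontal pod autoscaling (HPA) driven by event sources and metrics, automatically scaling pods out and in.
It tracks the number of messages or jobs in an event source and scales pods out or in based on configured thresholds.
It supports many event sources and queue systems, so services such as AWS SQS, Azure Storage Queue, and Apache Kafka can be integrated easily.
It helps manage serverless workloads in Kubernetes and can be used alongside other autoscaling paradigms such as Knative Serving.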

KEDA with Helm: pod autoscaling based on specific events (cron, etc.)
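As a concrete illustration of cron-based scaling, here is a sketch of a KEDA ScaledObject with a cron trigger targeting the php-apache Deployment. It assumes KEDA is already installed (for example via its Helm chart); the object name, file name, schedule, and replica counts are hypothetical, and the kubectl apply line is commented out so the snippet runs without a cluster.

```shell
#!/usr/bin/env bash
# Sketch of a KEDA ScaledObject with a cron trigger (hypothetical values).
cat <<'EOF' > keda-cron-scaledobject.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: php-apache-cron-scaled
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicaCount: 0    # KEDA may scale the Deployment down to zero
  maxReplicaCount: 2
  pollingInterval: 30   # seconds between trigger checks
  cooldownPeriod: 300   # seconds after the trigger goes inactive before scaling to zero
  triggers:
    - type: cron
      metadata:
        timezone: Asia/Seoul
        start: "00,15,30,45 * * * *"   # become active at these minutes
        end: "05,20,35,50 * * * *"     # and inactive five minutes later
        desiredReplicas: "1"
EOF

# kubectl apply -f keda-cron-scaledobject.yaml
```

While the cron window is active, KEDA keeps the Deployment at desiredReplicas; outside it, the Deployment returns to minReplicaCount after the cooldown period.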

keda dashboard.json

{
  "annotations": {
    "list": [
      {
        "builtIn": 1,
        "datasource": {
          "type": "grafana",
          "uid": "-- Grafana --"
        },
        "enable": true,
        "hide": true,
        "iconColor": "rgba(0, 211, 255, 1)",
        "name": "Annotations & Alerts",
        "target": {
          "limit": 100,
          "matchAny": false,
          "tags": [],
          "type": "dashboard"
        },
        "type": "dashboard"
      }
    ]
  },
  "description": "Visualize metrics provided by KEDA",
  "editable": true,
  "fiscalYearStartMonth": 0,
  "graphTooltip": 0,
  "id": 1653,
  "links": [],
  "liveNow": false,
  "panels": [
    {
      "collapsed": false,
      "gridPos": {
        "h": 1,
        "w": 24,
        "x": 0,
        "y": 0
      },
      "id": 8,
      "panels": [],
      "title": "Metric Server",
      "type": "row"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${datasource}"
      },
      "description": "The total number of errors encountered for all scalers.",
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "line",
            "fillOpacity": 25,
            "gradientMode": "opacity",
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            },
            "lineInterpolation": "linear",
            "lineWidth": 2,
            "pointSize": 5,
            "scaleDistribution": {
              "type": "linear"
            },
            "showPoints": "never",
            "spanNulls": true,
            "stacking": {
              "group": "A",
              "mode": "none"
            },
            "thresholdsStyle": {
              "mode": "off"
            }
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "red",
                "value": 80
              }
            ]
          },
          "unit": "Errors/sec"
        },
        "overrides": [
          {
            "matcher": {
              "id": "byName",
              "options": "http-demo"
            },
            "properties": [
              {
                "id": "color",
                "value": {
                  "fixedColor": "red",
                  "mode": "fixed"
                }
              }
            ]
          },
          {
            "matcher": {
              "id": "byName",
              "options": "scaledObject"
            },
            "properties": [
              {
                "id": "color",
                "value": {
                  "fixedColor": "red",
                  "mode": "fixed"
                }
              }
            ]
          },
          {
            "matcher": {
              "id": "byName",
              "options": "keda-system/keda-operator-metrics-apiserver"
            },
            "properties": [
              {
                "id": "color",
                "value": {
                  "fixedColor": "red",
                  "mode": "fixed"
                }
              }
            ]
          }
        ]
      },
      "gridPos": {
        "h": 9,
        "w": 8,
        "x": 0,
        "y": 1
      },
      "id": 4,
      "options": {
        "legend": {
          "calcs": [],
          "displayMode": "list",
          "placement": "bottom",
          "showLegend": true
        },
        "tooltip": {
          "mode": "single",
          "sort": "none"
        }
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${datasource}"
          },
          "editorMode": "code",
          "expr": "sum by(job) (rate(keda_scaler_errors{}[5m]))",
          "legendFormat": "{{ job }}",
          "range": true,
          "refId": "A"
        }
      ],
      "title": "Scaler Total Errors",
      "type": "timeseries"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${datasource}"
      },
      "description": "The number of errors that have occurred for each scaler.",
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "line",
            "fillOpacity": 25,
            "gradientMode": "opacity",
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            },
            "lineInterpolation": "linear",
            "lineWidth": 2,
            "pointSize": 5,
            "scaleDistribution": {
              "type": "linear"
            },
            "showPoints": "never",
            "spanNulls": true,
            "stacking": {
              "group": "A",
              "mode": "none"
            },
            "thresholdsStyle": {
              "mode": "off"
            }
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "red",
                "value": 80
              }
            ]
          },
          "unit": "Errors/sec"
        },
        "overrides": [
          {
            "matcher": {
              "id": "byName",
              "options": "http-demo"
            },
            "properties": [
              {
                "id": "color",
                "value": {
                  "fixedColor": "red",
                  "mode": "fixed"
                }
              }
            ]
          },
          {
            "matcher": {
              "id": "byName",
              "options": "scaler"
            },
            "properties": [
              {
                "id": "color",
                "value": {
                  "fixedColor": "red",
                  "mode": "fixed"
                }
              }
            ]
          },
          {
            "matcher": {
              "id": "byName",
              "options": "prometheusScaler"
            },
            "properties": [
              {
                "id": "color",
                "value": {
                  "fixedColor": "red",
                  "mode": "fixed"
                }
              }
            ]
          }
        ]
      },
      "gridPos": {
        "h": 9,
        "w": 8,
        "x": 8,
        "y": 1
      },
      "id": 3,
      "options": {
        "legend": {
          "calcs": [],
          "displayMode": "list",
          "placement": "bottom",
          "showLegend": true
        },
        "tooltip": {
          "mode": "single",
          "sort": "none"
        }
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${datasource}"
          },
          "editorMode": "code",
          "expr": "sum by(scaler) (rate(keda_scaler_errors{exported_namespace=~\"$namespace\", scaledObject=~\"$scaledObject\", scaler=~\"$scaler\"}[5m]))",
          "legendFormat": "{{ scaler }}",
          "range": true,
          "refId": "A"
        }
      ],
      "title": "Scaler Errors",
      "type": "timeseries"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${datasource}"
      },
      "description": "The number of errors that have occurred for each scaled object.",
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "line",
            "fillOpacity": 25,
            "gradientMode": "opacity",
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            },
            "lineInterpolation": "linear",
            "lineWidth": 2,
            "pointSize": 5,
            "scaleDistribution": {
              "type": "linear"
            },
            "showPoints": "never",
            "spanNulls": true,
            "stacking": {
              "group": "A",
              "mode": "none"
            },
            "thresholdsStyle": {
              "mode": "off"
            }
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "red",
                "value": 80
              }
            ]
          },
          "unit": "Errors/sec"
        },
        "overrides": [
          {
            "matcher": {
              "id": "byName",
              "options": "http-demo"
            },
            "properties": [
              {
                "id": "color",
                "value": {
                  "fixedColor": "red",
                  "mode": "fixed"
                }
              }
            ]
          }
        ]
      },
      "gridPos": {
        "h": 9,
        "w": 8,
        "x": 16,
        "y": 1
      },
      "id": 2,
      "options": {
        "legend": {
          "calcs": [],
          "displayMode": "list",
          "placement": "bottom",
          "showLegend": true
        },
        "tooltip": {
          "mode": "single",
          "sort": "none"
        }
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${datasource}"
          },
          "editorMode": "code",
          "expr": "sum by(scaledObject) (rate(keda_scaled_object_errors{exported_namespace=~\"$namespace\", scaledObject=~\"$scaledObject\"}[5m]))",
          "legendFormat": "{{ scaledObject }}",
          "range": true,
          "refId": "A"
        }
      ],
      "title": "Scaled Object Errors",
      "type": "timeseries"
    },
    {
      "collapsed": false,
      "gridPos": {
        "h": 1,
        "w": 24,
        "x": 0,
        "y": 10
      },
      "id": 10,
      "panels": [],
      "title": "Scale Target",
      "type": "row"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${datasource}"
      },
      "description": "The current value for each scaler’s metric that would be used by the HPA in computing the target average.",
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "line",
            "fillOpacity": 25,
            "gradientMode": "opacity",
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            },
            "lineInterpolation": "linear",
            "lineWidth": 2,
            "pointSize": 5,
            "scaleDistribution": {
              "type": "linear"
            },
            "showPoints": "never",
            "spanNulls": true,
            "stacking": {
              "group": "A",
              "mode": "none"
            },
            "thresholdsStyle": {
              "mode": "off"
            }
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "red",
                "value": 80
              }
            ]
          },
          "unit": "none"
        },
        "overrides": [
          {
            "matcher": {
              "id": "byName",
              "options": "http-demo"
            },
            "properties": [
              {
                "id": "color",
                "value": {
                  "fixedColor": "blue",
                  "mode": "fixed"
                }
              }
            ]
          }
        ]
      },
      "gridPos": {
        "h": 9,
        "w": 24,
        "x": 0,
        "y": 11
      },
      "id": 5,
      "options": {
        "legend": {
          "calcs": [],
          "displayMode": "list",
          "placement": "bottom",
          "showLegend": true
        },
        "tooltip": {
          "mode": "single",
          "sort": "none"
        }
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${datasource}"
          },
          "editorMode": "code",
          "expr": "sum by(metric) (keda_scaler_metrics_value{exported_namespace=~\"$namespace\", metric=~\"$metric\", scaledObject=\"$scaledObject\"})",
          "legendFormat": "{{ metric }}",
          "range": true,
          "refId": "A"
        }
      ],
      "title": "Scaler Metric Value",
      "type": "timeseries"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${datasource}"
      },
      "description": "shows current replicas against max ones based on time difference",
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "line",
            "fillOpacity": 21,
            "gradientMode": "opacity",
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            },
            "lineInterpolation": "linear",
            "lineStyle": {
              "fill": "solid"
            },
            "lineWidth": 1,
            "pointSize": 5,
            "scaleDistribution": {
              "type": "linear"
            },
            "showPoints": "auto",
            "spanNulls": false,
            "stacking": {
              "group": "A",
              "mode": "none"
            },
            "thresholdsStyle": {
              "mode": "off"
            }
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              }
            ]
          },
          "unit": "short"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 8,
        "w": 24,
        "x": 0,
        "y": 20
      },
      "id": 13,
      "options": {
        "legend": {
          "calcs": [],
          "displayMode": "list",
          "placement": "bottom",
          "showLegend": true
        },
        "tooltip": {
          "mode": "single",
          "sort": "none"
        }
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${datasource}"
          },
          "editorMode": "code",
          "exemplar": false,
          "expr": "kube_horizontalpodautoscaler_status_current_replicas{namespace=\"$namespace\",horizontalpodautoscaler=\"keda-hpa-$scaledObject\"}",
          "format": "time_series",
          "instant": false,
          "interval": "",
          "legendFormat": "current_replicas",
          "range": true,
          "refId": "A"
        },
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${datasource}"
          },
          "editorMode": "code",
          "exemplar": false,
          "expr": "kube_horizontalpodautoscaler_spec_max_replicas{namespace=\"$namespace\",horizontalpodautoscaler=\"keda-hpa-$scaledObject\"}",
          "format": "time_series",
          "hide": false,
          "instant": false,
          "legendFormat": "max_replicas",
          "range": true,
          "refId": "B"
        }
      ],
      "title": "Current/max replicas (time based)",
      "type": "timeseries"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${datasource}"
      },
      "description": "shows current replicas against max ones based on time difference",
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "continuous-GrYlRd"
          },
          "custom": {
            "fillOpacity": 70,
            "lineWidth": 0,
            "spanNulls": false
          },
          "mappings": [
            {
              "options": {
                "0": {
                  "color": "green",
                  "index": 0,
                  "text": "No scaling"
                }
              },
              "type": "value"
            },
            {
              "options": {
                "from": -200,
                "result": {
                  "color": "light-red",
                  "index": 1,
                  "text": "Scaling down"
                },
                "to": 0
              },
              "type": "range"
            },
            {
              "options": {
                "from": 0,
                "result": {
                  "color": "semi-dark-red",
                  "index": 2,
                  "text": "Scaling up"
                },
                "to": 200
              },
              "type": "range"
            }
          ],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              }
            ]
          },
          "unit": "none"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 8,
        "w": 24,
        "x": 0,
        "y": 28
      },
      "id": 16,
      "options": {
        "alignValue": "left",
        "legend": {
          "displayMode": "list",
          "placement": "bottom",
          "showLegend": false,
          "width": 0
        },
        "mergeValues": true,
        "rowHeight": 1,
        "showValue": "never",
        "tooltip": {
          "mode": "single",
          "sort": "none"
        }
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${datasource}"
          },
          "editorMode": "code",
          "exemplar": false,
          "expr": "delta(kube_horizontalpodautoscaler_status_current_replicas{namespace=\"$namespace\",horizontalpodautoscaler=\"keda-hpa-$scaledObject\"}[1m])",
          "format": "time_series",
          "instant": false,
          "interval": "",
          "legendFormat": ".",
          "range": true,
          "refId": "A"
        }
      ],
      "title": "Changes in replicas",
      "type": "state-timeline"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${datasource}"
      },
      "description": "shows current replicas against max ones",
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "thresholds"
          },
          "mappings": [],
          "min": 0,
          "thresholds": {
            "mode": "percentage",
            "steps": [
              {
                "color": "green"
              },
              {
                "color": "red",
                "value": 80
              }
            ]
          },
          "unit": "short"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 8,
        "w": 12,
        "x": 0,
        "y": 36
      },
      "id": 15,
      "options": {
        "orientation": "auto",
        "reduceOptions": {
          "calcs": [
            "lastNotNull"
          ],
          "fields": "/^current_replicas$/",
          "values": false
        },
        "showThresholdLabels": false,
        "showThresholdMarkers": true
      },
      "pluginVersion": "9.5.2",
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${datasource}"
          },
          "editorMode": "code",
          "exemplar": false,
          "expr": "kube_horizontalpodautoscaler_status_current_replicas{namespace=\"$namespace\",horizontalpodautoscaler=\"keda-hpa-$scaledObject\"}",
          "instant": true,
          "legendFormat": "current_replicas",
          "range": false,
          "refId": "A"
        },
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${datasource}"
          },
          "editorMode": "code",
          "exemplar": false,
          "expr": "kube_horizontalpodautoscaler_spec_max_replicas{namespace=\"$namespace\",horizontalpodautoscaler=\"keda-hpa-$scaledObject\"}",
          "hide": false,
          "instant": true,
          "legendFormat": "max_replicas",
          "range": false,
          "refId": "B"
        }
      ],
      "title": "Current/max replicas",
      "type": "gauge"
    }
  ],
  "refresh": "1m",
  "schemaVersion": 38,
  "style": "dark",
  "tags": [],
  "templating": {
    "list": [
      {
        "current": {
          "selected": false,
          "text": "Prometheus",
          "value": "Prometheus"
        },
        "hide": 0,
        "includeAll": false,
        "multi": false,
        "name": "datasource",
        "options": [],
        "query": "prometheus",
        "queryValue": "",
        "refresh": 1,
        "regex": "",
        "skipUrlSync": false,
        "type": "datasource"
      },
      {
        "current": {
          "selected": false,
          "text": "bhe-test",
          "value": "bhe-test"
        },
        "datasource": {
          "type": "prometheus",
          "uid": "${datasource}"
        },
        "definition": "label_values(keda_scaler_active,exported_namespace)",
        "hide": 0,
        "includeAll": false,
        "multi": false,
        "name": "namespace",
        "options": [],
        "query": {
          "query": "label_values(keda_scaler_active,exported_namespace)",
          "refId": "PrometheusVariableQueryEditor-VariableQuery"
        },
        "refresh": 1,
        "regex": "",
        "skipUrlSync": false,
        "sort": 1,
        "type": "query"
      },
      {
        "current": {
          "selected": false,
          "text": "All",
          "value": "$__all"
        },
        "datasource": {
          "type": "prometheus",
          "uid": "${datasource}"
        },
        "definition": "label_values(keda_scaler_active{exported_namespace=\"$namespace\"},scaledObject)",
        "hide": 0,
        "includeAll": true,
        "multi": true,
        "name": "scaledObject",
        "options": [],
        "query": {
          "query": "label_values(keda_scaler_active{exported_namespace=\"$namespace\"},scaledObject)",
          "refId": "PrometheusVariableQueryEditor-VariableQuery"
        },
        "refresh": 2,
        "regex": "",
        "skipUrlSync": false,
        "sort": 0,
        "type": "query"
      },
      {
        "current": {
          "selected": false,
          "text": "cronScaler",
          "value": "cronScaler"
        },
        "datasource": {
          "type": "prometheus",
          "uid": "${datasource}"
        },
        "definition": "label_values(keda_scaler_active{exported_namespace=\"$namespace\"},scaler)",
        "hide": 0,
        "includeAll": false,
        "multi": false,
        "name": "scaler",
        "options": [],
        "query": {
          "query": "label_values(keda_scaler_active{exported_namespace=\"$namespace\"},scaler)",
          "refId": "PrometheusVariableQueryEditor-VariableQuery"
        },
        "refresh": 2,
        "regex": "",
        "skipUrlSync": false,
        "sort": 0,
        "type": "query"
      },
      {
        "current": {
          "selected": false,
          "text": "s0-cron-Etc-UTC-40xxxx-55xxxx",
          "value": "s0-cron-Etc-UTC-40xxxx-55xxxx"
        },
        "datasource": {
          "type": "prometheus",
          "uid": "${datasource}"
        },
        "definition": "label_values(keda_scaler_active{exported_namespace=\"$namespace\"},metric)",
        "hide": 0,
        "includeAll": false,
        "multi": false,
        "name": "metric",
        "options": [],
        "query": {
          "query": "label_values(keda_scaler_active{exported_namespace=\"$namespace\"},metric)",
          "refId": "PrometheusVariableQueryEditor-VariableQuery"
        },
        "refresh": 2,
        "regex": "",
        "skipUrlSync": false,
        "sort": 0,
        "type": "query"
      }
    ]
  },
  "time": {
    "from": "now-24h",
    "to": "now"
  },
  "timepicker": {},
  "timezone": "",
  "title": "KEDA",
  "uid": "asdasd8rvmMxdVk",
  "version": 8,
  "weekStart": ""
}

Import the JSON above into Grafana as a dashboard.

Installing KEDA

# Install KEDA
cat <<EOT > keda-values.yaml
metricsServer:
  useHostNetwork: true

prometheus:
  metricServer:
    enabled: true
    port: 9022
    portName: metrics
    path: /metrics
    serviceMonitor:
      # Enables ServiceMonitor creation for the Prometheus Operator
      enabled: true
    podMonitor:
      # Enables PodMonitor creation for the Prometheus Operator
      enabled: true
  operator:
    enabled: true
    port: 8080
    serviceMonitor:
      # Enables ServiceMonitor creation for the Prometheus Operator
      enabled: true
    podMonitor:
      # Enables PodMonitor creation for the Prometheus Operator
      enabled: true

  webhooks:
    enabled: true
    port: 8080
    serviceMonitor:
      # Enables ServiceMonitor creation for the Prometheus webhooks
      enabled: true
EOT

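Creating the YAML file (keda-values.yaml):

This file defines the configuration values for installing KEDA. It sets several options, including the metrics server and Prometheus settings. For example, metricsServer.useHostNetwork: true makes the metrics server use the host network. The prometheus.metricServer, prometheus.operator and prometheus.webhooks blocks expose KEDA's own metrics and create ServiceMonitor/PodMonitor resources so that the Prometheus Operator scrapes them; these are the keda_* metrics that the dashboard above queries.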

kubectl create namespace keda
helm repo add kedacore https://kedacore.github.io/charts
helm install keda kedacore/keda --version 2.13.0 --namespace keda -f keda-values.yaml

Creating the namespace (kubectl create namespace keda):

Creates a namespace named keda so that all KEDA-related resources are managed inside it.

Adding the Helm repository (helm repo add kedacore https://kedacore.github.io/charts):

Adds the Helm repository that contains the KEDA chart.

Installing KEDA (helm install keda kedacore/keda --version 2.13.0 --namespace keda -f keda-values.yaml):

This command installs KEDA into the keda namespace using the KEDA chart from the kedacore repository added above. The -f keda-values.yaml option tells Helm to use the settings in keda-values.yaml for the installation.

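A flashy KEDA banner appears in the output. Now let's verify the installation.

Once installation is complete, you can start autoscaling an application by deploying ScaledObjects, which define the scaling rules KEDA follows.

A ScaledObject tells KEDA to scale Pods automatically in response to specific events (for example, messages arriving in a message queue).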

# Verify the KEDA installation
kubectl get all -n keda
kubectl get validatingwebhookconfigurations keda-admission
kubectl get validatingwebhookconfigurations keda-admission | kubectl neat | yh
kubectl get crd | grep keda

# Create a deployment in the keda namespace
kubectl apply -f php-apache.yaml -n keda
kubectl get pod -n keda

# Create a ScaledObject policy: cron
cat <<EOT > keda-cron.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: php-apache-cron-scaled
spec:
  minReplicaCount: 0
  maxReplicaCount: 2
  pollingInterval: 30
  cooldownPeriod: 300
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  triggers:
  - type: cron
    metadata:
      timezone: Asia/Seoul
      start: 00,15,30,45 * * * *
      end: 05,20,35,50 * * * *
      desiredReplicas: "1"
EOT
kubectl apply -f keda-cron.yaml -n keda

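Checking the application status

Running kubectl get pod -n keda shows the status of every Pod in the keda namespace. This includes not only the KEDA Pods but also the php-apache application Pod we just deployed.

Configuring event-driven autoscaling (keda-cron.yaml)

keda-cron.yaml creates a ScaledObject for the php-apache Deployment, which sets up event-driven autoscaling.

This configuration uses a cron trigger to adjust the number of Pods by time of day. In the Asia/Seoul time zone, it keeps 1 replica for 5 minutes starting every 15 minutes (from :00, :15, :30 and :45 until :05, :20, :35 and :50). Because minReplicaCount is 0, KEDA scales the Deployment back to zero once the window has ended and the cooldownPeriod (300 seconds) has passed.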
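To make the schedule concrete, here is a small Python model of the trigger. It is a simplified sketch, not KEDA's implementation: it only looks at the minute of the hour, assumes scale-to-zero happens exactly cooldownPeriod (5 minutes) after the window closes, and ignores the 30-second pollingInterval. The function names are hypothetical.

```python
# Simplified model of the keda-cron.yaml trigger (minute-of-hour only).
# Assumptions: scale-to-zero happens cooldown_min minutes after the window
# ends, and pollingInterval granularity (30 s) is ignored.

START_MINUTES = [0, 15, 30, 45]  # start: 00,15,30,45 * * * *
END_MINUTES = [5, 20, 35, 50]    # end:   05,20,35,50 * * * *


def minutes_since_last(minute: int, marks: list[int]) -> int:
    """Minutes elapsed since the most recent mark, wrapping around the hour."""
    return min((minute - m) % 60 for m in marks)


def cron_active(minute: int) -> bool:
    """Active when the last start is more recent than the last end."""
    return minutes_since_last(minute, START_MINUTES) < minutes_since_last(minute, END_MINUTES)


def expected_replicas(minute: int, desired: int = 1, min_replicas: int = 0,
                      cooldown_min: int = 5) -> int:
    if cron_active(minute):
        return desired
    # After the window closes, wait for the cooldown before scaling to zero.
    if minutes_since_last(minute, END_MINUTES) < cooldown_min:
        return desired
    return min_replicas


if __name__ == "__main__":
    for minute in (0, 3, 7, 11, 15, 52):
        print(minute, cron_active(minute), expected_replicas(minute))
```

Under these assumptions the Pod exists from about :00 to :10 of each 15-minute slot (the 5-minute window plus the 5-minute cooldown), which is the create/delete rhythm to expect when watching the Pods.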

The Pod at the bottom of the list is the newly created one.

# Add the Grafana dashboard
# Monitoring
watch -d 'kubectl get ScaledObject,hpa,pod -n keda'
kubectl get ScaledObject -w -n keda

# Check
kubectl get ScaledObject,hpa,pod -n keda
kubectl get hpa -o jsonpath='{.items[0].spec}' -n keda | jq


The KEDA dashboard in Grafana shows metrics like the following. Seeing the cron-driven cycle of Pods being created and removed on a dashboard makes it very clear.

You can see the php-apache-598b474864-9td2b Pod disappear and then come back.

Cleanup

# Delete KEDA, the deployment, and related resources
kubectl delete -f keda-cron.yaml -n keda && kubectl delete deploy php-apache -n keda && helm uninstall keda -n keda
kubectl delete namespace keda

VPA (Vertical Pod Autoscaler)

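This section references 악분's blog post: https://malwareanalysis.tistory.com/603

VPA is the vertical Pod autoscaling feature.

Based on a Pod's current usage metrics, it scales the Pod's resource sizes up or down to an appropriate level.

It rewrites the Pod's resources.requests to values as close to optimal as possible.

It is called "vertical" because the adjusted request can land above or below the existing value: the size of each Pod changes rather than the number of Pods.

Configuring VPA frees up node CPU and memory that over-sized requests would otherwise reserve, which improves resource efficiency.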

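[Scale Up / Down]

It should not be used together with an HPA that scales on the same CPU or memory metrics.

When VPA applies a change, it deploys a new Pod and deletes the existing one. (You can turn this off by setting updateMode to Off, in which case VPA only produces recommendations.)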

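The architecture is as follows.

How VPA rewrites a running Pod's resources.requests to the optimal value

VPA uses an admission controller to modify a Pod's resources.requests.

When the existing Pod is terminated (the VPA Updater evicts it), a Kubernetes controller such as the Deployment controller recreates the Pod.

At that point, the VPA admission controller, acting as a mutating webhook, rewrites the new Pod's requests to the optimal values.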

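How the optimal value is calculated

VPA takes the Pod's recent resource usage as a baseline and adds a small margin on top of it to compute the optimal resource request.

Metrics Server is used here to observe each Pod's resource usage. Based on that information, the Horizontal Pod Autoscaler (HPA) decides how many Pods are needed, and the Vertical Pod Autoscaler (VPA) decides how much CPU and memory each Pod needs.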
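The "recent usage plus a margin" idea can be sketched in Python. This is a simplified illustration under stated assumptions: it uses a plain nearest-rank percentile over raw samples, whereas the real VPA recommender works with decaying histograms. The 90th-percentile target and 15% margin reflect what I understand the recommender's defaults to be, but treat them as assumptions. The clamp step mirrors minAllowed/maxAllowed in the VPA resourcePolicy.

```python
import math

# Simplified VPA-style recommendation (not the VPA source code).
# Assumptions: nearest-rank percentile over raw samples, 90th-percentile
# target, 15% safety margin, clamped to the policy's min/max.


def percentile(samples: list[float], q: float) -> float:
    """Nearest-rank percentile for q in (0, 1]."""
    if not samples:
        raise ValueError("need at least one usage sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def recommend(samples, q=0.9, margin=0.15, min_allowed=None, max_allowed=None):
    """Percentile of observed usage plus a margin, clamped to the policy bounds."""
    target = percentile(samples, q) * (1 + margin)
    if min_allowed is not None:
        target = max(target, min_allowed)
    if max_allowed is not None:
        target = min(target, max_allowed)
    return target


if __name__ == "__main__":
    # Hypothetical CPU usage samples in millicores.
    cpu_m = [480, 490, 495, 500, 500, 500, 505, 510, 515, 520]
    print(round(recommend(cpu_m, min_allowed=100, max_allowed=1000), 2))  # 592.25
```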

VPA CRD

The VPA CRD specifies which Pods VPA applies to, the calculation logic, and related settings.

Create the Grafana dashboard by importing dashboard ID 14588.

# Download the code
git clone https://github.com/kubernetes/autoscaler.git
cd ~/autoscaler/vertical-pod-autoscaler/
tree hack

# Check the openssl version
openssl version
# OpenSSL 1.0.2k-fips  26 Jan 2017

# Install openssl 1.1.1 or later and check its version
yum install openssl11 -y
openssl11 version
# OpenSSL 1.1.1g FIPS  21 Apr 2020

# Replace openssl with openssl11 in the certificate script
sed -i 's/openssl/openssl11/g' ~/autoscaler/vertical-pod-autoscaler/pkg/admission-controller/gencerts.sh

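Cloning the autoscaler GitHub repository

git clone https://github.com/kubernetes/autoscaler.git copies the Kubernetes Autoscaler project code to the local system. The repository contains the VPA code and installation scripts.

Upgrading OpenSSL

We checked the OpenSSL version installed on the system (openssl version) and installed a newer OpenSSL (openssl11). This step is needed because the VPA installation script requires OpenSSL 1.1.1 or later, while the bastion host ships with 1.0.2k.

Modifying the installation script

The sed command replaces openssl with the newly installed openssl11 in the VPA certificate script (gencerts.sh), so the script runs with the correct OpenSSL version. Run it only once: running it again would turn openssl11 into openssl1111.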

# Deploy the Vertical Pod Autoscaler to your cluster with the following command.
watch -d kubectl get pod -n kube-system
cat hack/vpa-up.sh
./hack/vpa-up.sh
kubectl get crd | grep autoscaling
kubectl get mutatingwebhookconfigurations vpa-webhook-config
kubectl get mutatingwebhookconfigurations vpa-webhook-config -o json | jq

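The three vpa Pods at the bottom (admission controller, recommender and updater) are the VPA components.

Running the VPA installation

The ./hack/vpa-up.sh script deploys the VPA components to the Kubernetes cluster. It creates all required Custom Resource Definitions (CRDs), cluster roles, role bindings, service accounts, and so on, and sets up the VPA Deployments and Service.

Generating and uploading certificates:

The script generates a TLS certificate for the VPA webhook and uploads it to the cluster as a Secret. This certificate lets the VPA webhook communicate securely.

With VPA installed, it can now monitor the resource usage of applications running in the cluster and automatically adjust Pods' CPU and memory allocations as needed. Using VPA improves application performance and stability while reducing wasted resources.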

Creating Pods for the load test

---
apiVersion: "autoscaling.k8s.io/v1"
kind: VerticalPodAutoscaler
metadata:
  name: hamster-vpa
spec:
  # recommenders
  #   - name 'alternative'
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: hamster
  resourcePolicy:
    containerPolicies:
      - containerName: '*'
        minAllowed:
          cpu: 100m
          memory: 50Mi
        maxAllowed:
          cpu: 1
          memory: 500Mi
        controlledResources: ["cpu", "memory"]
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hamster
spec:
  selector:
    matchLabels:
      app: hamster
  replicas: 2
  template:
    metadata:
      labels:
        app: hamster
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 65534 # nobody
      containers:
        - name: hamster
          image: registry.k8s.io/ubuntu-slim:0.1
          resources:
            requests:
              cpu: 100m
              memory: 50Mi
          command: ["/bin/sh"]
          args:
            - "-c"
            - "while true; do timeout 0.5s yes >/dev/null; sleep 0.5s; done"

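Deploy this YAML file (examples/hamster.yaml).

VerticalPodAutoscaler (VPA) resource: defines a VPA named hamster-vpa. For every container in the hamster Deployment (containerName: '*'), it sets the minimum and maximum allowed CPU and memory: CPU from 100m (millicores) to 1 core, and memory from 50Mi to 500Mi.

Deployment resource: defines a Deployment named hamster with 2 replicas, each running a single container from the registry.k8s.io/ubuntu-slim:0.1 image. The container starts with CPU and memory requests of 100m and 50Mi and runs a while loop that repeatedly burns CPU: it runs yes for 0.5 seconds and then sleeps for 0.5 seconds, so it uses far more CPU than its 100m request.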
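A back-of-the-envelope estimate of why VPA reacts to hamster, as a Python sketch. It assumes `timeout 0.5s yes` keeps one core fully busy for 0.5 s and `sleep 0.5s` uses essentially no CPU, ignoring process start-up cost; the helper names are hypothetical.

```python
# Rough CPU-demand estimate for the hamster loop (assumption: `yes` saturates
# one core while it runs; `sleep` uses ~0 CPU).


def duty_cycle_millicores(busy_s: float, idle_s: float, cores: float = 1.0) -> float:
    """Average CPU usage in millicores for a busy/idle loop."""
    return 1000 * cores * busy_s / (busy_s + idle_s)


def clamp(value: float, low: float, high: float) -> float:
    """Keep a value inside [low, high], like minAllowed/maxAllowed."""
    return max(low, min(value, high))


if __name__ == "__main__":
    demand = duty_cycle_millicores(0.5, 0.5)  # about 500m
    request = 100                              # requests.cpu: 100m
    print(demand, demand / request)            # 500.0 5.0
    # hamster-vpa allows 100m..1 core (1000m).
    print(clamp(demand, 100, 1000))            # 500.0
```

So each Pod wants roughly five times its initial request, which is comfortably inside the 100m to 1 core range the VPA policy allows.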

kubectl apply -f examples/hamster.yaml && kubectl get vpa -w

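This deploys a simple Vertical Pod Autoscaler (VPA) example: a Deployment named hamster and a VPA that targets it. VPA monitors the Pods' resource usage and automatically adjusts their CPU and memory requests as needed to optimize application performance.

In the terminal monitoring output, because the Pods need more resources than their current requests, VPA restarts them to apply the new allocation: each existing Pod is terminated and a new Pod with the updated allocation is created.

The Grafana dashboard also shows the CPU load and the new Pods being created.

The Vertical Pod Autoscaler (VPA) optimizes application performance by automatically adjusting a Pod's resource allocations (requests and limits).

Monitoring: VPA continuously monitors the CPU and memory usage of the target Pods. This data is used to decide whether the resources allocated to the Pods are appropriate.

Recommending allocation changes: Based on the monitored usage, VPA decides whether the Pods' resource allocations (requests and limits) need to change, and recommends values within the configured minimum and maximum.

Applying the allocation: When a Pod's current allocation needs to change, VPA restarts the Pod with the new allocation. The running Pod is terminated and a new Pod with the new settings is created. This process is managed through a controller such as a Deployment, which is what keeps the service available while individual Pods are replaced.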
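The "apply" step can be pictured as a simple decision rule, sketched in Python below. This is a simplified illustration, assuming the core rule is "evict when the current request falls outside the recommendation's [lowerBound, upperBound] range"; the real Updater also applies other safeguards (for example, eviction rate limits). The Recommendation type and the numbers are hypothetical.

```python
from dataclasses import dataclass

# Simplified sketch of when VPA would replace a Pod (not the Updater source).
# Assumption: evict when the current request is outside [lower_bound, upper_bound].


@dataclass
class Recommendation:
    lower_bound: float
    target: float
    upper_bound: float


def should_evict(current_request: float, rec: Recommendation,
                 update_mode: str = "Auto") -> bool:
    # Off: recommendations only. Initial: applied only when a Pod is created.
    if update_mode in ("Off", "Initial"):
        return False
    return not (rec.lower_bound <= current_request <= rec.upper_bound)


if __name__ == "__main__":
    rec = Recommendation(lower_bound=450, target=590, upper_bound=700)  # hypothetical, millicores
    print(should_evict(100, rec))         # True: 100m is far below the range
    print(should_evict(590, rec))         # False: already within range
    print(should_evict(100, rec, "Off"))  # False: recommendation only
```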

클린업

kubectl delete -f examples/hamster.yaml && cd ~/autoscaler/vertical-pod-autoscaler/ && ./hack/vpa-down.sh

CA (Cluster Autoscaler)

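Cluster Autoscaler (CA) is a tool that automatically adjusts the number of nodes (worker nodes) in a Kubernetes cluster based on workload and resource requirements, adding nodes when the nodes that Pods would be scheduled on run out of resources.

CA scales nodes out when Pods are stuck in the Pending state; on AWS it does this through Auto Scaling Groups.

How Cluster Autoscaler works

Scale Out

Pending Pods: CA detects new Pods that are waiting to be scheduled but cannot be placed because no node has enough free resources.

Adding nodes: CA automatically adds new nodes (worker nodes) to provide the required resources.

Scale In

Excess resources: CA detects nodes with low utilization, meaning the Pods on them request only a small share of the node's allocatable capacity, and removes nodes that are not being used efficiently.

Removing nodes: Removing unneeded nodes saves cost and improves efficiency.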
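The scale-in check can be sketched in Python. This is a simplified illustration under assumptions: utilization is taken as requested / allocatable, using the larger of the CPU and memory ratios, and compared with a 0.5 threshold, which I understand to be the default of --scale-down-utilization-threshold. The real CA also checks that the node's Pods fit elsewhere and that the node stays unneeded for a while; the node data below is hypothetical.

```python
# Simplified sketch of picking scale-down candidates (not CA source code).


def node_utilization(req_cpu_m: float, req_mem_mi: float,
                     alloc_cpu_m: float, alloc_mem_mi: float) -> float:
    """Share of the node's allocatable capacity claimed by Pod requests."""
    return max(req_cpu_m / alloc_cpu_m, req_mem_mi / alloc_mem_mi)


def scale_down_candidates(nodes: dict[str, tuple[float, float, float, float]],
                          threshold: float = 0.5) -> list[str]:
    """nodes maps name -> (req CPU m, req memory Mi, alloc CPU m, alloc memory Mi)."""
    return sorted(name for name, usage in nodes.items()
                  if node_utilization(*usage) < threshold)


if __name__ == "__main__":
    # Hypothetical request totals and allocatable capacity for three nodes.
    nodes = {
        "node-a": (1500, 2048, 1930, 3300),
        "node-b": (400, 512, 1930, 3300),
        "node-c": (900, 1800, 1930, 3300),
    }
    print(scale_down_candidates(nodes))  # ['node-b']
```

node-c stays because its memory ratio (about 0.55) is above the threshold even though its CPU ratio is below it.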

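Cluster Autoscaler constraints and best practices

Autoscaling strategy: Scaling of node groups is mainly managed with EC2 Auto Scaling Groups.

Instance type consistency: CA generally assumes that all instances in a node group are the same instance type.

Mixed instance types: When you do use mixed instance types, choose types with the same amount of CPU and memory.

Supporting different instance types: Use multiple node groups to support several instance types.

Node group per Availability Zone (AZ): Keeping a separate node group for each Availability Zone is a best practice.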

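Implementation

To implement Cluster Autoscaler in a cluster, first deploy the cluster-autoscaler application inside the cluster. It performs the following roles:

Monitoring: It continuously monitors the cluster's resource usage and the state of the Pods.

Automatic adjustment: It adds or removes nodes automatically as needed to optimize the cluster's resources.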

CA hands-on

aws ec2 describe-instances  --filters Name=tag:Name,Values=$CLUSTER_NAME-ng1-Node --query "Reservations[*].Instances[*].Tags[*]" --output yaml | yh

The cluster-autoscaler/enabled key is set to true.

The worker node tags in the AWS EC2 console also show k8s.io/cluster-autoscaler/enabled set to true.

k8s.io/cluster-autoscaler/enabled: This tag with the value true indicates that the EC2 instance (worker node) can be managed by Cluster Autoscaler. In other words, the node can be scaled in or out automatically according to the cluster's resource requirements.

You can also see k8s.io/cluster-autoscaler/myeks set to owned.

k8s.io/cluster-autoscaler/<cluster-name>: This tag with the value owned indicates that the EC2 instance belongs to a specific Kubernetes cluster (here, myeks) and is managed by Cluster Autoscaler.
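The two tags line up with the --node-group-auto-discovery flag checked later. A minimal Python sketch of that matching, assuming that with key-only entries an ASG matches when it carries every listed tag key and the tag values ("true", "owned") are not compared; the helper names are hypothetical and only the key-only form is handled.

```python
# Sketch of asg:tag=... auto-discovery matching (assumption: keys only).


def parse_asg_tag_spec(spec: str) -> list[str]:
    """Split 'asg:tag=key1,key2' into its tag keys."""
    prefix = "asg:tag="
    if not spec.startswith(prefix):
        raise ValueError(f"unsupported discovery spec: {spec}")
    return [key.strip() for key in spec[len(prefix):].split(",") if key.strip()]


def matches(asg_tags: dict[str, str], required_keys: list[str]) -> bool:
    """True when the ASG carries every required tag key."""
    return all(key in asg_tags for key in required_keys)


if __name__ == "__main__":
    keys = parse_asg_tag_spec(
        "asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/myeks")
    myeks_asg = {"k8s.io/cluster-autoscaler/enabled": "true",
                 "k8s.io/cluster-autoscaler/myeks": "owned"}
    print(matches(myeks_asg, keys))  # True
```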

ASG configuration

# Check the current Auto Scaling Group (ASG) information
# aws autoscaling describe-auto-scaling-groups --query "AutoScalingGroups[? Tags[? (Key=='eks:cluster-name') && Value=='<cluster-name>']].[AutoScalingGroupName, MinSize, MaxSize,DesiredCapacity]" --output table
aws autoscaling describe-auto-scaling-groups \
    --query "AutoScalingGroups[? Tags[? (Key=='eks:cluster-name') && Value=='myeks']].[AutoScalingGroupName, MinSize, MaxSize,DesiredCapacity]" \
    --output table


# Change MaxSize to 6
export ASG_NAME=$(aws autoscaling describe-auto-scaling-groups --query "AutoScalingGroups[? Tags[? (Key=='eks:cluster-name') && Value=='myeks']].AutoScalingGroupName" --output text)
aws autoscaling update-auto-scaling-group --auto-scaling-group-name ${ASG_NAME} --min-size 3 --desired-capacity 3 --max-size 6

# Verify
aws autoscaling describe-auto-scaling-groups --query "AutoScalingGroups[? Tags[? (Key=='eks:cluster-name') && Value=='myeks']].[AutoScalingGroupName, MinSize, MaxSize,DesiredCapacity]" --output table

After raising the ASG's maximum size from 3 to 6, verify the change.

# Deploy the Cluster Autoscaler (CA)
curl -s -O https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-autodiscover.yaml
sed -i "s/<YOUR CLUSTER NAME>/$CLUSTER_NAME/g" cluster-autoscaler-autodiscover.yaml
kubectl apply -f cluster-autoscaler-autodiscover.yaml

# Verify
kubectl get pod -n kube-system | grep cluster-autoscaler
kubectl describe deployments.apps -n kube-system cluster-autoscaler
kubectl describe deployments.apps -n kube-system cluster-autoscaler | grep node-group-auto-discovery
#      --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/myeks

# (옵션) cluster-autoscaler 파드가 동작하는 워커 노드가 퇴출(evict) 되지 않게 설정
kubectl -n kube-system annotate deployment.apps/cluster-autoscaler cluster-autoscaler.kubernetes.io/safe-to-evict="false"

The Cluster Autoscaler is now deployed.

Setting the cluster-autoscaler.kubernetes.io/safe-to-evict annotation to "false" for the cluster-autoscaler pod keeps the worker node it runs on from being evicted (drained and removed) during scale-in.

As a result, when the cluster scales in and relocates or deletes pods to save resources, the pod of this Deployment is left untouched.

Load test

cat <<EoF> nginx.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-to-scaleout
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        service: nginx
        app: nginx
    spec:
      containers:
      - image: nginx
        name: nginx-to-scaleout
        resources:
          limits:
            cpu: 500m
            memory: 512Mi
          requests:
            cpu: 500m
            memory: 512Mi
EoF

kubectl apply -f nginx.yaml
kubectl get deployment/nginx-to-scaleout

Create the Deployment

The nginx-to-scaleout Deployment is created with a single pod.

kubectl scale --replicas=15 deployment/nginx-to-scaleout && date

Run the scaling command: kubectl scale raises the Deployment's pod count to 15.

Scaling nginx-to-scaleout from 1 to 15 pods shows several important Cluster Autoscaler (CA) behaviors.

CA automatically adjusts the number of worker nodes in a Kubernetes cluster to match application demand and resource usage.

Key points of how CA works

Pod scheduling and resource requests

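When the Deployment's replica count grows, the new pods must be scheduled somewhere in the cluster. Each pod requests a certain amount of CPU and memory. If the existing nodes do not have enough unrequested capacity to fit the extra pods, those pods stay Pending.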

Scale-out by CA

When there are Pending pods that cannot be placed for lack of resources, CA adds nodes to the cluster so they can be scheduled. On AWS it does this through the Auto Scaling Group, by raising its desired capacity. Once a new node joins, the Pending pods are scheduled onto it and move to Running.
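To make the scale-out arithmetic concrete, here is a minimal Python sketch that estimates how many new nodes a set of Pending pods needs, assuming every new node has the same allocatable capacity. The node size (1900m CPU, 3000Mi memory per new node) and the helper names are hypothetical, and this greedy first-fit estimate is a simplification: CA's real estimator simulates scheduling with the full set of scheduler checks.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Pod:
    cpu_m: int   # CPU request in millicores
    mem_mi: int  # memory request in MiB


@dataclass(frozen=True)
class NodeShape:
    cpu_m: int   # allocatable CPU per new node in millicores (assumed value)
    mem_mi: int  # allocatable memory per new node in MiB (assumed value)


def nodes_needed(pending: List[Pod], shape: NodeShape) -> int:
    """Estimate how many new nodes of one shape the pending pods need (first-fit)."""
    free: List[List[int]] = []  # remaining [cpu_m, mem_mi] on each simulated new node
    for pod in pending:
        if pod.cpu_m > shape.cpu_m or pod.mem_mi > shape.mem_mi:
            raise ValueError("pod can never fit on this node shape")
        for slot in free:
            if slot[0] >= pod.cpu_m and slot[1] >= pod.mem_mi:
                slot[0] -= pod.cpu_m
                slot[1] -= pod.mem_mi
                break
        else:
            # No simulated node has room left: open another one
            free.append([shape.cpu_m - pod.cpu_m, shape.mem_mi - pod.mem_mi])
    return len(free)


def scale_out_target(current: int, max_size: int, pending: List[Pod], shape: NodeShape) -> int:
    """New desired capacity, never above the ASG's MaxSize."""
    return min(current + nodes_needed(pending, shape), max_size)


# Hypothetical: 10 nginx pods (500m / 512Mi each) still Pending
pending = [Pod(cpu_m=500, mem_mi=512)] * 10
shape = NodeShape(cpu_m=1900, mem_mi=3000)
print(nodes_needed(pending, shape))
print(scale_out_target(3, 6, pending, shape))
```

With these assumed numbers three pods fit per node, so 10 Pending pods call for 4 more nodes, but an ASG with 3 nodes and MaxSize 6 can only grow to 6. CA never scales a node group past its maximum, which is why MaxSize was raised before the test.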

Resource utilization and scale-in

When the pod count drops or resource usage falls so that some nodes are underused, CA can remove those nodes from the cluster (scale-in) to optimize resource usage.
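A rough Python sketch of the scale-in test, assuming CA's default flags --scale-down-utilization-threshold=0.5 and --scale-down-unneeded-time=10m. CA measures a node's utilization from pod requests, taking the larger of the CPU and memory ratios. The real check also requires that every pod on the node can be moved elsewhere, which this sketch leaves out, and the node numbers in the example are made up.

```python
def node_utilization(req_cpu_m: int, alloc_cpu_m: int, req_mem_mi: int, alloc_mem_mi: int) -> float:
    """Utilization as the larger of the CPU and memory request ratios."""
    return max(req_cpu_m / alloc_cpu_m, req_mem_mi / alloc_mem_mi)


def is_scale_down_candidate(util: float, unneeded_for_s: int,
                            threshold: float = 0.5, unneeded_time_s: int = 600) -> bool:
    """True once a node has stayed below the threshold for the whole unneeded time."""
    return util < threshold and unneeded_for_s >= unneeded_time_s


# Hypothetical node: 500m of 1930m CPU and 512Mi of 7000Mi memory requested
util = node_utilization(500, 1930, 512, 7000)
print(round(util, 2), is_scale_down_candidate(util, unneeded_for_s=700))
```

The time condition is why scale-in in CA lags behind the drop in load: a node must stay underused for the full unneeded period before it is removed.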

In the log above, all pods became Ready some time after the Deployment was scaled to 15. This indicates that CA added nodes to meet the pods' resource requests so every pod could be scheduled and run. Scale-out works this way in general: CA monitors the cluster state and adds or removes nodes when needed, keeping resource usage efficient.

Limitations of CA

Dependence on the Auto Scaling Group (ASG)

CA manages nodes through the ASG. It does not create or delete EC2 instances itself; it adjusts the ASG, and the ASG launches or terminates the instances.

EC2 instance deletion mismatch

Deleting a node in EKS does not always terminate the corresponding EC2 instance, which can cause confusion when cleaning up resources.

Hard-to-control scale-in

When CA scales in, its criteria for picking a node (for example, one with few pods or one that is already drained) are not transparent, so it is hard to make it remove a specific node.

Scaling speed

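CA reacts slowly and struggles to keep up with real-time traffic changes. Under a sudden spike, pods can stay Pending for a long time, because a new node has to launch through the ASG and join the cluster first.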

API limits

CA works by polling (by default it rescans the cluster every 10 seconds, set by --scan-interval), so running it frequently can hit AWS API rate limits.

Karpenter: an alternative to CA

To address these problems, AWS developed a new autoscaler called Karpenter, which makes node management in a Kubernetes cluster more flexible and efficient.

Direct node management

Karpenter watches the Kubernetes API and creates and deletes EC2 instances directly as needed, without going through an ASG.

Fast scaling

Karpenter scales nodes quickly based on the pods' scheduling requirements, which minimizes the time pods spend Pending.

Automatic choice of instance types and sizes

Karpenter automatically picks from a wide range of instance types and sizes, weighing the application's requirements against cost.

Less concern about API limits

Karpenter is event-driven rather than polling-based, which reduces the chance of hitting AWS API limits.

CPA (Cluster Proportional Autoscaler)

Cluster Proportional Autoscaler (CPA) automatically adjusts the number of replicas of an application in proportion to the size of the Kubernetes cluster (its number of schedulable nodes and CPU cores).

Its goal is to keep a workload's replica count in step with the cluster's overall size and demands.

How CPA works

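CPA sets the replica count from cluster-size metrics: the number of schedulable nodes and the total number of CPU cores. For example, as the cluster gains nodes, CPA raises the replica count of the target application so it keeps pace with the larger cluster; when nodes are removed, it lowers the replica count accordingly.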

CPA use cases

CPA is especially useful in the following cases:

- Backend services whose demand grows with the cluster: when a backend's load changes with the cluster's size, CPA can adjust the number of backend pods automatically.

- Centralized logging or monitoring: scale the number of log collector or monitoring agent instances in proportion to the number of nodes.

- Infrastructure components such as DNS servers or authentication services: requests to these services grow along with the cluster, so CPA can scale them automatically.

Configuring CPA

CPA offers two control modes:

1. Linear mode: the replica count grows linearly with the cluster size, for example one replica per N nodes or per M CPU cores.

2. Ladder mode: the replica count changes in steps. Each time the node count reaches a configured threshold, the replica count jumps to the value set for that step.
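A minimal Python sketch of linear mode, following the formula given in the CPA README: replicas = max(ceil(cores / coresPerReplica), ceil(nodes / nodesPerReplica)), then clamped to the configured min and max. The parameter values in the example are hypothetical, and options such as preventSinglePointFailure are left out.

```python
import math
from typing import Optional


def linear_replicas(nodes: int, cores: int,
                    cores_per_replica: Optional[float] = None,
                    nodes_per_replica: Optional[float] = None,
                    min_replicas: int = 1,
                    max_replicas: Optional[int] = None) -> int:
    """Replica count for CPA-style linear mode."""
    candidates = []
    if cores_per_replica:
        candidates.append(math.ceil(cores / cores_per_replica))
    if nodes_per_replica:
        candidates.append(math.ceil(nodes / nodes_per_replica))
    # Whichever ratio asks for more replicas wins
    replicas = max(candidates, default=min_replicas)
    if max_replicas is not None:
        replicas = min(replicas, max_replicas)
    return max(replicas, min_replicas)


# Hypothetical rule: one replica per 256 cores or per 16 nodes, whichever gives more
print(linear_replicas(nodes=4, cores=8, cores_per_replica=256, nodes_per_replica=16))
print(linear_replicas(nodes=40, cores=80, cores_per_replica=256, nodes_per_replica=16))
```

On small clusters the node ratio usually dominates; coresPerReplica matters more when nodes are large.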

Benefits of CPA

- Flexible response to cluster changes: scales the application automatically as the cluster changes, making resource use more efficient.

- Avoids waste: does not run more pods than needed.

- Maintains performance: keeps the application's capacity in line with the size of the cluster.

CPA is a powerful tool for managing Kubernetes cluster resources more efficiently. By scaling applications automatically to match the cluster's size and usage patterns, it balances performance and resource efficiency.

Installing CPA

helm repo add cluster-proportional-autoscaler https://kubernetes-sigs.github.io/cluster-proportional-autoscaler

# The release needs CPA rules in its values; without them this install fails
helm upgrade --install cluster-proportional-autoscaler cluster-proportional-autoscaler/cluster-proportional-autoscaler

This returns an error message.

The chart's default values contain no CPA rules, so the helm release fails unless you provide rules.

In the example that follows, we define CPA rules and then release the helm chart.

Deploy an Nginx Deployment

cat <<EOT > cpa-nginx.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        resources:
          limits:
            cpu: "100m"
            memory: "64Mi"
          requests:
            cpu: "100m"
            memory: "64Mi"
        ports:
        - containerPort: 80
EOT
kubectl apply -f cpa-nginx.yaml

This example sets the number of nginx pods based on the node count. nginx is deployed as a Deployment.

The initial replica count is 1.

CPA rules

cat <<EOF > cpa-values.yaml
config:
  ladder:
    nodesToReplicas:
      - [1, 1]
      - [2, 2]
      - [3, 3]
      - [4, 3]
      - [5, 5]
options:
  namespace: default
  target: "deployment/nginx-deployment"
EOF
# Inspect the ConfigMap holding the rules (it exists once the helm release below succeeds)
kubectl describe cm cluster-proportional-autoscaler

The rules are set through the config and options fields of the helm chart values. options specifies the Kubernetes resource CPA acts on; config sets the CPA control mode and the number of pods for each node count.

Rules

1 node -> run 1 nginx pod

2 nodes -> run 2 nginx pods

3 nodes -> run 3 nginx pods

4 nodes -> run 3 nginx pods

5 nodes -> run 5 nginx pods
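These ladder rules can be checked with a small Python sketch. It mirrors how the CPA README describes ladder mode: take the last step whose threshold is at or below the current node count, and (an assumption of this sketch) map a count below the first threshold to the first step. CPA also supports coresToReplicas and uses the larger of the two results; only the node ladder is modeled here.

```python
import bisect
from typing import List, Tuple

# Same steps as nodesToReplicas in cpa-values.yaml: (node count, replicas)
NODES_TO_REPLICAS: List[Tuple[int, int]] = [(1, 1), (2, 2), (3, 3), (4, 3), (5, 5)]


def ladder_replicas(count: int, steps: List[Tuple[int, int]]) -> int:
    """Replicas for the last step whose threshold is <= count."""
    if not steps:
        return 1
    thresholds = [threshold for threshold, _ in steps]  # sorted ascending
    pos = bisect.bisect_right(thresholds, count)        # number of thresholds <= count
    return steps[max(pos - 1, 0)][1]


for nodes in range(1, 7):
    print(nodes, "nodes ->", ladder_replicas(nodes, NODES_TO_REPLICAS), "replicas")
```

The 5-node and 4-node cases give 5 and 3 replicas, matching the results observed in the lab; beyond 5 nodes the last step keeps applying.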

helm upgrade --install cluster-proportional-autoscaler \
    -f cpa-values.yaml \
    cluster-proportional-autoscaler/cluster-proportional-autoscaler

Release the helm chart to install CPA, overriding the default values with the CPA rules created in the previous step.

Scale out to 5 nodes

export ASG_NAME=$(aws autoscaling describe-auto-scaling-groups --query "AutoScalingGroups[? Tags[? (Key=='eks:cluster-name') && Value=='myeks']].AutoScalingGroupName" --output text)
aws autoscaling update-auto-scaling-group --auto-scaling-group-name ${ASG_NAME} --min-size 5 --desired-capacity 5 --max-size 5
aws autoscaling describe-auto-scaling-groups --query "AutoScalingGroups[? Tags[? (Key=='eks:cluster-name') && Value=='myeks']].[AutoScalingGroupName, MinSize, MaxSize,DesiredCapacity]" --output table

The commands above raise the ASG's worker node count to 5, and kube-ops-view shows five nodes.

Accordingly, there are now five nginx pods.

Scale in to 4 nodes

aws autoscaling update-auto-scaling-group --auto-scaling-group-name ${ASG_NAME} --min-size 4 --desired-capacity 4 --max-size 4
aws autoscaling describe-auto-scaling-groups --query "AutoScalingGroups[? Tags[? (Key=='eks:cluster-name') && Value=='myeks']].[AutoScalingGroupName, MinSize, MaxSize,DesiredCapacity]" --output table

The ASG's desired capacity is now 4.

The node count drops to 4, and per cpa-values.yaml (4 nodes -> 3 pods) the pod count drops to 3.

Cleanup

 helm uninstall cluster-proportional-autoscaler && kubectl delete -f cpa-nginx.yaml

Karpenter

Karpenter is an open-source project that efficiently provisions and manages compute resources for Kubernetes clusters, automatically adding or removing nodes according to the cluster's workload.

Its main functions and how it works are outlined below.

Main functions

1. Watch pods: watches for pods that cannot be scheduled.

2. Evaluate scheduling constraints: analyzes the scheduling constraints those pods require (resource requests, node selectors, affinity, tolerations, topology spread, and so on).

3. Node provisioning: provisions nodes that satisfy the pods' requirements.

4. Scheduling: the pods are scheduled to run on the provisioned nodes.

5. Node deprovisioning: removes nodes that are no longer needed.
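Part of evaluating constraints is filtering instance types against a NodePool's requirements. The Python sketch below shows how the In, NotIn, Exists, DoesNotExist, Gt, and Lt operators narrow a catalog, using the requirements of the first NodePool created later in this post. The catalog is a small hypothetical subset, and capacity type (spot/on-demand) is left out because it belongs to a purchase offering rather than to the instance type itself.

```python
from typing import Dict, List

# Hypothetical subset of instance types and their well-known labels
CATALOG: Dict[str, Dict[str, str]] = {
    "m5.large":  {"kubernetes.io/arch": "amd64", "karpenter.k8s.aws/instance-category": "m", "karpenter.k8s.aws/instance-generation": "5"},
    "c5.large":  {"kubernetes.io/arch": "amd64", "karpenter.k8s.aws/instance-category": "c", "karpenter.k8s.aws/instance-generation": "5"},
    "t3.medium": {"kubernetes.io/arch": "amd64", "karpenter.k8s.aws/instance-category": "t", "karpenter.k8s.aws/instance-generation": "3"},
    "m1.small":  {"kubernetes.io/arch": "amd64", "karpenter.k8s.aws/instance-category": "m", "karpenter.k8s.aws/instance-generation": "1"},
    "r6g.large": {"kubernetes.io/arch": "arm64", "karpenter.k8s.aws/instance-category": "r", "karpenter.k8s.aws/instance-generation": "6"},
}

# From the first NodePool in this post (os and capacity-type omitted)
REQUIREMENTS = [
    {"key": "kubernetes.io/arch", "operator": "In", "values": ["amd64"]},
    {"key": "karpenter.k8s.aws/instance-category", "operator": "In", "values": ["c", "m", "r"]},
    {"key": "karpenter.k8s.aws/instance-generation", "operator": "Gt", "values": ["2"]},
]


def matches(labels: Dict[str, str], req: dict) -> bool:
    """Evaluate one requirement against an instance type's labels."""
    value = labels.get(req["key"])
    op, values = req["operator"], req.get("values", [])
    if op == "In":
        return value in values
    if op == "NotIn":
        return value not in values
    if op == "Exists":
        return value is not None
    if op == "DoesNotExist":
        return value is None
    if op in ("Gt", "Lt"):
        if value is None:
            return False
        return int(value) > int(values[0]) if op == "Gt" else int(value) < int(values[0])
    raise ValueError(f"unsupported operator: {op}")


def allowed_types(catalog: Dict[str, Dict[str, str]], reqs: List[dict]) -> List[str]:
    """Instance types whose labels satisfy every requirement (requirements are ANDed)."""
    return [name for name, labels in catalog.items() if all(matches(labels, r) for r in reqs)]


print(allowed_types(CATALOG, REQUIREMENTS))
```

t3.medium fails the category check, m1.small fails the generation check, and r6g.large fails the architecture check.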

How it works

- Resource management: when Karpenter finds unscheduled pods, it evaluates their requirements and provisions suitable nodes. It also removes unused nodes automatically.

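- Custom resources instead of ASGs: Cluster Autoscaler (CA) works through Auto Scaling Groups (ASG), whereas Karpenter is driven by Kubernetes custom resources (Provisioner in older releases; NodePool and EC2NodeClass since v0.32, as used later in this post). Because they are Kubernetes resources, they can be deployed with tools such as ArgoCD or Spinnaker.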

- Simpler configuration: no launch template is needed. Security groups and subnets are required, but you can select them simply by tag or by resource ID.

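- Guardrail-style instance settings: you declaratively constrain instance types and capacity types (Spot, On-Demand, and so on), and Karpenter chooses within those bounds.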

- Cost-efficient instance selection: provisions the cheapest instance that fits the pods.

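- Resource optimization: when nodes can be reduced and other nodes have enough room, Karpenter moves pods onto them automatically, and to cut costs it can consolidate several nodes into a single larger one.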
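Cost-efficient selection boils down to finding the cheapest instance type that can hold the pending requests. Here is a minimal Python sketch of that choice, assuming a flat price list and treating each instance's full size as allocatable. The type names and prices are hypothetical placeholders, and Karpenter itself hands several candidate types to EC2 Fleet rather than picking exactly one, so this only illustrates the idea.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class InstanceType:
    name: str          # placeholder name, not a real EC2 type
    cpu_m: int         # CPU in millicores
    mem_mi: int        # memory in MiB
    hourly_usd: float  # hypothetical price


CATALOG: List[InstanceType] = [
    InstanceType("general-2c8g", 2000, 8192, 0.040),
    InstanceType("general-4c16g", 4000, 16384, 0.080),
    InstanceType("compute-4c8g", 4000, 8192, 0.070),
    InstanceType("general-8c32g", 8000, 32768, 0.160),
]


def cheapest_fit(types: List[InstanceType], cpu_m: int, mem_mi: int) -> Optional[InstanceType]:
    """Cheapest type with enough CPU and memory for the summed requests, or None."""
    fitting = [t for t in types if t.cpu_m >= cpu_m and t.mem_mi >= mem_mi]
    return min(fitting, key=lambda t: t.hourly_usd, default=None)


# Three pending pods requesting 1 CPU and 1.5Gi each
choice = cheapest_fit(CATALOG, cpu_m=3 * 1000, mem_mi=3 * 1536)
print(choice.name if choice else "nothing fits")
```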

Advantages

- Flexibility compared with ASGs: the ability to use EC2 instances directly, without an ASG, is one of Karpenter's biggest advantages. It makes node management more flexible and improves cluster efficiency.

Karpenter focuses on automating and optimizing cluster resource management to reduce operational overhead and improve cost efficiency. This is especially valuable in environments with large or highly variable workloads.

Karpenter hands-on

We follow the official guide at https://karpenter.sh/docs/getting-started/getting-started-with-karpenter/.

aws sts get-caller-identity

This command confirms that the CLI is authenticated as a user with enough permissions to create an EKS cluster.

export KARPENTER_NAMESPACE="kube-system"
export KARPENTER_VERSION="0.35.4"
export K8S_VERSION="1.29"

Set the Karpenter and Kubernetes versions.

export AWS_PARTITION="aws" # if you are not using standard partitions, you may need to configure to aws-cn / aws-us-gov
export CLUSTER_NAME="${USER}-karpenter-demo"
export AWS_DEFAULT_REGION="us-west-2"
export AWS_ACCOUNT_ID="$(aws sts get-caller-identity --query Account --output text)"
export TEMPOUT="$(mktemp)"
export ARM_AMI_ID="$(aws ssm get-parameter --name /aws/service/eks/optimized-ami/${K8S_VERSION}/amazon-linux-2-arm64/recommended/image_id --query Parameter.Value --output text)"
export AMD_AMI_ID="$(aws ssm get-parameter --name /aws/service/eks/optimized-ami/${K8S_VERSION}/amazon-linux-2/recommended/image_id --query Parameter.Value --output text)"
export GPU_AMI_ID="$(aws ssm get-parameter --name /aws/service/eks/optimized-ami/${K8S_VERSION}/amazon-linux-2-gpu/recommended/image_id --query Parameter.Value --output text)"

Set the remaining environment variables.

echo "${KARPENTER_NAMESPACE}" "${KARPENTER_VERSION}" "${K8S_VERSION}" "${CLUSTER_NAME}" "${AWS_DEFAULT_REGION}" "${AWS_ACCOUNT_ID}" "${TEMPOUT}" "${ARM_AMI_ID}" "${AMD_AMI_ID}" "${GPU_AMI_ID}"

Check that every environment variable has a value.

curl -fsSL https://raw.githubusercontent.com/aws/karpenter-provider-aws/v"${KARPENTER_VERSION}"/website/content/en/preview/getting-started/getting-started-with-karpenter/cloudformation.yaml  > "${TEMPOUT}" \
&& aws cloudformation deploy \
  --stack-name "Karpenter-${CLUSTER_NAME}" \
  --template-file "${TEMPOUT}" \
  --capabilities CAPABILITY_NAMED_IAM \
  --parameter-overrides "ClusterName=${CLUSTER_NAME}"

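This CloudFormation stack creates the IAM resources Karpenter needs, including KarpenterNodeRole-${CLUSTER_NAME} and KarpenterControllerPolicy-${CLUSTER_NAME} referenced below, along with the SQS interruption queue.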

eksctl create cluster -f - <<EOF
---
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: ${CLUSTER_NAME}
  region: ${AWS_DEFAULT_REGION}
  version: "${K8S_VERSION}"
  tags:
    karpenter.sh/discovery: ${CLUSTER_NAME}

iam:
  withOIDC: true
  podIdentityAssociations:
  - namespace: "${KARPENTER_NAMESPACE}"
    serviceAccountName: karpenter
    roleName: ${CLUSTER_NAME}-karpenter
    permissionPolicyARNs:
    - arn:${AWS_PARTITION}:iam::${AWS_ACCOUNT_ID}:policy/KarpenterControllerPolicy-${CLUSTER_NAME}

## Optionally run on fargate or on k8s 1.23
# Pod Identity is not available on fargate
# https://docs.aws.amazon.com/eks/latest/userguide/pod-identities.html
# iam:
#   withOIDC: true
#   serviceAccounts:
#   - metadata:
#       name: karpenter
#       namespace: "${KARPENTER_NAMESPACE}"
#     roleName: ${CLUSTER_NAME}-karpenter
#     attachPolicyARNs:
#     - arn:${AWS_PARTITION}:iam::${AWS_ACCOUNT_ID}:policy/KarpenterControllerPolicy-${CLUSTER_NAME}
#     roleOnly: true

iamIdentityMappings:
- arn: "arn:${AWS_PARTITION}:iam::${AWS_ACCOUNT_ID}:role/KarpenterNodeRole-${CLUSTER_NAME}"
  username: system:node:{{EC2PrivateDNSName}}
  groups:
  - system:bootstrappers
  - system:nodes
  ## If you intend to run Windows workloads, the kube-proxy group should be specified.
  # For more information, see https://github.com/aws/karpenter/issues/5099.
  # - eks:kube-proxy-windows

managedNodeGroups:
- instanceType: m5.large
  amiFamily: AmazonLinux2
  name: ${CLUSTER_NAME}-ng
  desiredCapacity: 2
  minSize: 1
  maxSize: 10

addons:
- name: eks-pod-identity-agent

## Optionally run on fargate
# fargateProfiles:
# - name: karpenter
#  selectors:
#  - namespace: "${KARPENTER_NAMESPACE}"
EOF

Deploy the EKS cluster.

tags:
    karpenter.sh/discovery: ${CLUSTER_NAME}

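The karpenter.sh/discovery: ${CLUSTER_NAME} tag is the key mechanism Karpenter uses to discover the AWS resources that belong to this cluster: eksctl applies it to the resources it creates, and the EC2NodeClass later selects the subnets and security groups that carry it.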

1. Cluster identification: several EKS clusters can run in the same AWS account. The karpenter.sh/discovery tag lets Karpenter identify the right cluster and gives it the information it needs to provision resources for that cluster.

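2. Resource selection: because the relevant resources are tagged, Karpenter can add or remove nodes dynamically as the cluster's needs change, without hard-coded resource IDs. For example, when more workloads run in the cluster, Karpenter decides that more capacity is needed and adds nodes automatically.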

3. Efficient scaling: the tag tells Karpenter where new nodes may be placed, which is the context it needs to scale nodes efficiently for the cluster's current state.

You can see the stack created under that name in CloudFormation.

It started at :34 and finished at :50, so cluster creation took about 16 minutes.

Installing Karpenter with helm

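# IAM role created by eksctl above (roleName: ${CLUSTER_NAME}-karpenter); with Pod Identity this annotation is optional
export KARPENTER_IAM_ROLE_ARN="arn:${AWS_PARTITION}:iam::${AWS_ACCOUNT_ID}:role/${CLUSTER_NAME}-karpenter"
helm install karpenter oci://public.ecr.aws/karpenter/karpenter --version "${KARPENTER_VERSION}" --namespace "${KARPENTER_NAMESPACE}" --create-namespace \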
  --set "serviceAccount.annotations.eks\.amazonaws\.com/role-arn=${KARPENTER_IAM_ROLE_ARN}" \
  --set "settings.clusterName=${CLUSTER_NAME}" \
  --set "settings.interruptionQueue=${CLUSTER_NAME}" \
  --set controller.resources.requests.cpu=1 \
  --set controller.resources.requests.memory=1Gi \
  --set controller.resources.limits.cpu=1 \
  --set controller.resources.limits.memory=1Gi \
  --wait

This installs the Karpenter controller with helm; --wait blocks until it is ready.

Create a NodePool

cat <<EOF | envsubst | kubectl apply -f -
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["2"]
      nodeClassRef:
        apiVersion: karpenter.k8s.aws/v1beta1
        kind: EC2NodeClass
        name: default
  limits:
    cpu: 1000
  disruption:
    consolidationPolicy: WhenUnderutilized
    expireAfter: 720h # 30 * 24h = 720h
---
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: AL2 # Amazon Linux 2
  role: "KarpenterNodeRole-${CLUSTER_NAME}" # replace with your cluster name
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}" # replace with your cluster name
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}" # replace with your cluster name
  amiSelectorTerms:
    - id: "${ARM_AMI_ID}"
    - id: "${AMD_AMI_ID}"
#   - id: "${GPU_AMI_ID}" # <- GPU Optimized AMD AMI 
#   - name: "amazon-eks-node-${K8S_VERSION}-*" # <- automatically upgrade when a new AL2 EKS Optimized AMI is released. This is unsafe for production workloads. Validate AMIs in lower environments before deploying them to production.
EOF

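The spec.disruption settings control node replacement: with expireAfter: 720h, Karpenter replaces each node after 30 days so that instances stay up to date, and consolidationPolicy: WhenUnderutilized lets it remove or replace underused nodes.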

The EC2NodeClass picks up the subnets and security groups tagged with karpenter.sh/discovery and launches EC2 instances from the AMI IDs set in the environment variables.
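The tag-based discovery can be pictured with a short Python sketch. It assumes the selector semantics described in the Karpenter docs, where separate selector terms are ORed and all tags inside one term must match; the subnet IDs and tags are hypothetical, with CLUSTER_NAME taken to be demo.

```python
from typing import Dict, List

# Hypothetical subnets and their tags
SUBNETS: List[Dict] = [
    {"id": "subnet-0001", "tags": {"karpenter.sh/discovery": "demo", "Name": "private-a"}},
    {"id": "subnet-0002", "tags": {"karpenter.sh/discovery": "demo", "Name": "private-b"}},
    {"id": "subnet-0003", "tags": {"karpenter.sh/discovery": "other-cluster"}},
    {"id": "subnet-0004", "tags": {"Name": "untagged"}},
]


def select_by_tags(resources: List[Dict], selector_terms: List[Dict[str, str]]) -> List[str]:
    """IDs of resources matching any term; every tag within a term must match exactly."""
    selected = []
    for res in resources:
        tags = res["tags"]
        if any(all(tags.get(k) == v for k, v in term.items()) for term in selector_terms):
            selected.append(res["id"])
    return selected


# Like subnetSelectorTerms: [{tags: {karpenter.sh/discovery: demo}}]
print(select_by_tags(SUBNETS, [{"karpenter.sh/discovery": "demo"}]))
```

Subnets tagged for another cluster are ignored, which is what keeps multiple clusters in one account from launching nodes into each other's networks.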

kubectl get nodepool,ec2nodeclass

Confirm that both were created.

Scale up deployment

cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
spec:
  replicas: 0
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      terminationGracePeriodSeconds: 0
      containers:
        - name: inflate
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.7
          resources:
            requests:
              cpu: 1
EOF

Deploy pods that run the pause image, each with a guaranteed request of 1 CPU. The Deployment starts with 0 replicas.

The existing nodes are m5.large instances with 2 vCPUs each.

kubectl scale deployment inflate --replicas 5

Now scale the Deployment to 5 pods.

Each pod requests 1 CPU core, so the existing nodes run out of capacity; Karpenter launches a new node and places the pending pods on it.

The EC2 console shows one newly launched instance.

On closer inspection, its lifecycle is spot.

cd ~/go/bin && ./eks-node-viewer --resources cpu,memory

The spot node became schedulable, and all pods were placed on it with none left Pending.

kubectl delete deployment inflate && date

Less than 20 seconds after this command, the newly provisioned node was removed. That is really fast.

In Karpenter, disruption refers to the ways it removes or replaces existing nodes, with a focus on managing cloud resources and optimizing cost.

There are three main types, Expiration, Drift, and Consolidation, each handling nodes in its own way.

Expiration

- Purpose: automatically expire and remove an instance after it has been in use for a set period (for example, 30 days).

- Benefit: nodes in the cluster stay current, because they are replaced by new instances that include security patches and important system updates.

Drift

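- Purpose: automatically detect configuration changes (for example, a change to a NodePool or EC2NodeClass) and apply them by replacing affected nodes, keeping the cluster in line with the current configuration.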

- Benefit: when changes or upgrades happen, the cluster reflects them automatically and keeps its intended configuration.

Consolidation

- Purpose: reduce unused capacity by removing nodes or combining several small instances into a larger, more cost-efficient one.

- Benefit: running fewer nodes during quiet periods and using resources more cost-efficiently lowers overall cloud costs.
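A simplified Python sketch of the consolidation decision for one node: if its pods fit into the free capacity of the other nodes, the node can be deleted; otherwise, if a cheaper offering could hold them, the node can be replaced. The capacities and prices are hypothetical, and the real controller also honors PodDisruptionBudgets and the karpenter.sh/do-not-disrupt annotation, which this sketch ignores.

```python
from typing import List, Tuple

Res = Tuple[int, int]           # (CPU millicores, memory MiB)
Offer = Tuple[int, int, float]  # (CPU millicores, memory MiB, hypothetical hourly price)


def fits_elsewhere(pods: List[Res], free_on_other_nodes: List[Res]) -> bool:
    """First-fit-decreasing check that every pod can move to another node."""
    free = [list(f) for f in free_on_other_nodes]
    for cpu, mem in sorted(pods, reverse=True):
        for slot in free:
            if slot[0] >= cpu and slot[1] >= mem:
                slot[0] -= cpu
                slot[1] -= mem
                break
        else:
            return False
    return True


def consolidation_action(pods: List[Res], free_on_other_nodes: List[Res],
                         current_price: float, offers: List[Offer]) -> str:
    """Return "delete", "replace", or "keep" for one candidate node."""
    if fits_elsewhere(pods, free_on_other_nodes):
        return "delete"
    need_cpu = sum(cpu for cpu, _ in pods)
    need_mem = sum(mem for _, mem in pods)
    if any(cpu >= need_cpu and mem >= need_mem and price < current_price
           for cpu, mem, price in offers):
        return "replace"
    return "keep"


OFFERS: List[Offer] = [(2000, 4096, 0.04), (8000, 32768, 0.16)]
# One remaining 1 CPU / 1.5Gi pod on a large node, with little room elsewhere
print(consolidation_action([(1000, 1536)], [(500, 1024)], 0.16, OFFERS))
```

This is the kind of decision behind the end of the lab, where scaling inflate down to one replica led Karpenter to run the remaining pod on a different instance.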

Karpenter and Spot instances

- Using Spot instances: Karpenter makes heavy use of Spot instances for cost optimization. Spot instances offer spare EC2 capacity at a steep discount compared with On-Demand, which can cut costs substantially.

- Automated instance selection: Karpenter uses the EC2 Fleet API (CreateFleet) to automatically choose and provision the best instance types for the cluster's requirements and the NodePool configuration.

- Cost efficiency and flexibility: Karpenter improves the cost efficiency and flexibility of cloud resources, and automated provisioning reduces the complexity of managing infrastructure.

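Together, these disruption strategies and Karpenter's features are very useful to organizations that want to automate and optimize cloud resource management: they help control cloud costs, keep infrastructure current, and adapt flexibly to system changes.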

helm upgrade karpenter -n kube-system oci://public.ecr.aws/karpenter/karpenter --reuse-values --set settings.featureGates.spotToSpotConsolidation=true

Since v0.34.0, spot-to-spot consolidation can be enabled with the spotToSpotConsolidation feature gate.

Deploying a NodePool and EC2NodeClass

kubectl delete nodepool,ec2nodeclass default

Delete the existing NodePool and EC2NodeClass.

cat <<EOF > nodepool.yaml
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  template:
    metadata:
      labels:
        intent: apps
    spec:
      nodeClassRef:
        name: default
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c","m","r"]
        - key: karpenter.k8s.aws/instance-size
          operator: NotIn
          values: ["nano","micro","small","medium"]
        - key: karpenter.k8s.aws/instance-hypervisor
          operator: In
          values: ["nitro"]
  limits:
    cpu: 100
    memory: 100Gi
  disruption:
    consolidationPolicy: WhenUnderutilized
---
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: Bottlerocket
  subnetSelectorTerms:          
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}" # replace with your cluster name
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}" # replace with your cluster name
  role: "KarpenterNodeRole-${CLUSTER_NAME}" # replace with your cluster name
  tags:
    Name: karpenter.sh/nodepool/default
    IntentLabel: "apps"
EOF
kubectl apply -f nodepool.yaml

Deploy the NodePool and EC2NodeClass.

Deploying pods

cat <<EOF > inflate.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
spec:
  replicas: 5
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      nodeSelector:
        intent: apps
      containers:
        - name: inflate
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.2
          resources:
            requests:
              cpu: 1
              memory: 1.5Gi
EOF
kubectl apply -f inflate.yaml

Deploy the inflate pods, which set both CPU and memory requests.

Once again, a new node appears at the bottom of the list.

kubectl scale --replicas=1 deployment/inflate

Reduce the replica count from five to one.

Karpenter consolidates the workload, which now runs on an m6g.large instance.
