[Elasticsearch] Elasticsearch 7.16.1 (Kubernetes readiness failed)

by guru_k 2021. 12. 13.

Following CVE-2021-44228 (the Apache Log4j2 remote code execution vulnerability), a vulnerability was also reported for Elasticsearch. Elastic stated that Elasticsearch 6 and 7 are not directly vulnerable because they run with the Java Security Manager, but on Dec 13 it released the patched versions 6.8.21 and 7.16.1. (Reference: https://discuss.elastic.co/t/apache-log4j2-remote-code-execution-rce-vulnerability-cve-2021-44228-esa-2021-31/291476)

Accordingly, we upgraded the Elasticsearch 7.6.x cluster we were running to 7.16.1.

We kept the 7.6.x Helm chart from the Helm repo as-is and changed only the image tag to 7.16.1. After the upgrade the Elasticsearch service came up, but the Kubernetes readiness probe failed.

As the log below shows, the cluster status changed to green, yet the readiness probe kept failing:

{"type": "server", "timestamp": "2021-12-13T14:27:16,001Z", "level": "INFO", "component": "o.e.c.r.a.AllocationService", "cluster.name": "elasticsearch", "node.name": "elasticsearch-master-0", "message": "Cluster health status changed from [RED] to [GREEN] (reason: [shards started [[.ds-.logs-deprecation.elasticsearch-default-2021.12.13-000001][0], [.ds-ilm-history-5-2021.12.13-000001][0], [cattle-monitoring-system-2021.12.13][0]]]).", "cluster.uuid": "-VlDd9dRQs2vT6QPjJG-2g", "node.id": "_zIcXZgzQg2ZjmrSGFj9Qw"  }

Checking the pod's Events with kubectl describe pod showed the following shell script errors:

Events:
  Type     Reason     Age               From                         Message
  ----     ------     ----              ----                         -------
  Normal   Scheduled  71s               default-scheduler            Successfully assigned kube-system/elasticsearch-master-0 to apseo-centraltech1
  Normal   Pulled     71s               kubelet, apseo-centraltech1  Container image "elastic.co/elasticsearch/elasticsearch:7.16.1" already present on machine
  Normal   Created    70s               kubelet, apseo-centraltech1  Created container
  Normal   Started    70s               kubelet, apseo-centraltech1  Started container
  Normal   Pulling    70s               kubelet, apseo-centraltech1  pulling image "busybox"
  Normal   Pulled     67s               kubelet, apseo-centraltech1  Successfully pulled image "busybox"
  Normal   Created    67s               kubelet, apseo-centraltech1  Created container
  Normal   Started    66s               kubelet, apseo-centraltech1  Started container
  Normal   Pulled     66s               kubelet, apseo-centraltech1  Container image "elastic.co/elasticsearch/elasticsearch:7.16.1" already present on machine
  Normal   Created    66s               kubelet, apseo-centraltech1  Created container
  Normal   Started    66s               kubelet, apseo-centraltech1  Started container
  Warning  Unhealthy  9s (x2 over 19s)  kubelet, apseo-centraltech1  Readiness probe failed: Elasticsearch is already running, lets check the node is healthy
curl -XGET -s -k ${BASIC_AUTH} -o /dev/null -w '%{http_code}' http://127.0.0.1:9200/ failed with HTTP code 200
sh: 17: [[: not found
sh: 22: [[: not found
sh: 24: [[: not found

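The log shows that the HTTP code was 200, so the node itself was responding normally; the readiness check failed only because of the sh script errors. [[ ... ]] is a bash keyword and is not part of POSIX sh, so the sh that runs the probe does not recognize it ("[[: not found"). The three errors correspond to the three [[ ... ]] tests in the probe script. A command that is not found returns a non-zero status, so every if/elif condition evaluates as false and the script falls through to the else branch, which prints the misleading "failed with HTTP code 200" message and exits with 1.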
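To confirm whether the sh in a given image understands [[, a one-line test is enough. The sketch below wraps it in a hypothetical helper, supports_double_bracket. The error format "sh: 17: [[: not found" is the one dash prints, and the 7.16 images are, as far as I know, based on Ubuntu (where /bin/sh is dash) rather than CentOS (where /bin/sh is bash); treat that as an assumption and verify it with the check.

```shell
#!/bin/sh
# supports_double_bracket is a hypothetical helper: it asks the shell named in $1
# (default: sh) to evaluate the bash-only [[ ... ]] test and reports the result.
supports_double_bracket() {
  shell="${1:-sh}"
  # A shell without [[ reports "[[: not found" on stderr and exits non-zero,
  # the same failure that made every if/elif in the probe evaluate as false.
  if "$shell" -c '[[ 1 -eq 1 ]]' 2>/dev/null; then
    echo "$shell: [[ supported"
  else
    echo "$shell: [[ NOT supported"
  fi
}

supports_double_bracket sh
supports_double_bracket bash
```

Inside the cluster the same one-liner can be run in the container, e.g. kubectl exec -n kube-system elasticsearch-master-0 -- sh -c '[[ 1 -eq 1 ]]' (pod name and namespace taken from the Events above).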

We fixed it by changing sh to bash in the chart's readinessProbe command and redeploying.
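The chart's probe script already starts with #!/usr/bin/env bash -e, so why was it not run by bash? A shebang is honored only when a file is executed; when the script is passed as a string to "<shell> -c", that line is just a comment, and the shell named in command decides how the script is parsed. A minimal demonstration, using a hypothetical helper run_inline that mimics the probe's invocation:

```shell
#!/bin/sh
# run_inline is a hypothetical helper that runs a script string the same way the
# readinessProbe does: "<shell> -c <script>".
run_inline() {
  "$1" -c "$2"
}

# The first line names an interpreter that does not exist. A real shebang would make
# execution fail, but under -c the line is only a comment, so the echo still runs.
script='#!/nonexistent/interpreter -e
echo ok'

run_inline sh "$script"
```

This is why the fix has to change the command entry itself rather than rely on the #! line.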

# elasticsearch/templates/statefulset.yaml

...
        imagePullPolicy: "{{ .Values.imagePullPolicy }}"
        readinessProbe:
          exec:
            command:
              - bash           # changed from sh to bash
              - -c
              - |
                #!/usr/bin/env bash -e
                # If the node is starting up wait for the cluster to be ready (request params: '{{ .Values.clusterHealthCheckParams }}' )
                # Once it has started only check that the node itself is responding
                START_FILE=/tmp/.es_start_file

                if [ -n "${ELASTIC_USERNAME}" ] && [ -n "${ELASTIC_PASSWORD}" ]; then
                  BASIC_AUTH="-u ${ELASTIC_USERNAME}:${ELASTIC_PASSWORD}"
                else
                  BASIC_AUTH=''
                fi

                if [ -f "${START_FILE}" ]; then
                  echo 'Elasticsearch is already running, lets check the node is healthy'
                  HTTP_CODE=$(curl -XGET -s -k ${BASIC_AUTH} -o /dev/null -w '%{http_code}' {{ .Values.protocol }}://127.0.0.1:{{ .Values.httpPort }}/)
                  RC=$?
                  if [[ ${RC} -ne 0 ]]; then
                    echo "curl -XGET -s -k \${BASIC_AUTH} -o /dev/null -w '%{http_code}' {{ .Values.protocol }}://127.0.0.1:{{ .Values.httpPort }}/ failed with RC ${RC}"
                    exit ${RC}
                  fi
                  # ready if HTTP code 200, 503 is tolerable if ES version is 6.x
                  if [[ ${HTTP_CODE} == "200" ]]; then
                    exit 0
                  elif [[ ${HTTP_CODE} == "503" && "{{ include "elasticsearch.esMajorVersion" . }}" == "6" ]]; then
                    exit 0
                  else
                    echo "curl -XGET -s -k \${BASIC_AUTH} -o /dev/null -w '%{http_code}' {{ .Values.protocol }}://127.0.0.1:{{ .Values.httpPort }}/ failed with HTTP code ${HTTP_CODE}"
                    exit 1
                  fi

                else
                  echo 'Waiting for elasticsearch cluster to become ready (request params: "{{ .Values.clusterHealthCheckParams }}" )'
                  if curl -XGET -s -k --fail ${BASIC_AUTH} {{ .Values.protocol }}://127.0.0.1:{{ .Values.httpPort }}/_cluster/health?{{ .Values.clusterHealthCheckParams }} ; then
                    touch ${START_FILE}
                    exit 0
                  else
                    echo 'Cluster is not yet ready (request params: "{{ .Values.clusterHealthCheckParams }}" )'
                    exit 1
                  fi
                fi
{{ toYaml .Values.readinessProbe | indent 10 }}
        ports:
        - name: http
...
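Switching to bash is the smallest change. An alternative, sketched here and not what was applied above, is to make the checks themselves POSIX-compatible by replacing [[ ... ]] with [ ... ] and == with =, so the probe works whichever shell sh turns out to be. The function name http_code_ready is hypothetical, and the Elasticsearch major version is passed as an argument in place of the chart's elasticsearch.esMajorVersion helper.

```shell
#!/bin/sh
# Hypothetical POSIX-sh version of the readiness decision in the probe above.
# $1: HTTP status code returned by curl, $2: Elasticsearch major version.
# Returns 0 (ready) for 200, or for 503 when the major version is 6; 1 otherwise.
http_code_ready() {
  http_code="$1"
  es_major="$2"
  # [ ... ] and = are POSIX, so this works under dash as well as bash.
  if [ "$http_code" = "200" ]; then
    return 0
  elif [ "$http_code" = "503" ] && [ "$es_major" = "6" ]; then
    return 0
  else
    return 1
  fi
}

# The curl exit-code check can be rewritten the same way:
#   if [ "${RC}" -ne 0 ]; then ... fi

if http_code_ready 200 7; then echo "ready"; else echo "not ready"; fi
```

Quoting the variables inside [ ... ] matters here: if curl produced an empty string, an unquoted [ $http_code = 200 ] would fail with a syntax error instead of simply evaluating to false.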