[Kubernetes] Pod-to-Pod Communication and Handling Errors in FQDN/DNS Requests

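In a Kubernetes environment, a Pod can communicate with another Pod by using the name or the Cluster-IP of the Service bound to that Pod.

This post covers how Pods communicate with each other, along with an error I ran into while using these methods and how I troubleshot it.

1. Communication between Pods

As an example, suppose there is a Pod named nginx,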

[Pod]

# nginx-pod.yaml

apiVersion: v1
kind: Pod
metadata:
  labels:
    app: nginx
  name: nginx
spec:
  containers:
  - image: nginx:latest
    name: nginx
    resources: {}
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}

and a Pod named my-spring-boot with a Service named my-spring-boot-service, defined as follows.

[Pod]

# my-spring-boot-pod.yaml

apiVersion: v1
kind: Pod
metadata:
  labels:
    app: my-spring-boot
  name: my-spring-boot
spec:
  containers:
  - image: [docker-hub-id]/my-spring-boot:latest
    name: my-spring-boot
    resources: {}
  dnsPolicy: ClusterFirst
  restartPolicy: Always
  # user-defined Secret
  imagePullSecrets:
  - name: docker-secret
status: {}

[Service]

# my-spring-boot-service.yaml

apiVersion: v1
kind: Service
metadata:
  name: my-spring-boot-service
spec:
  selector:
    app: my-spring-boot
  ports:
    - protocol: TCP
      port: 8000
      targetPort: 8000

Once the Service for my-spring-boot has created its endpoints,

the nginx Pod can send requests to that Pod through my-spring-boot-service.
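
Whether the Service has actually picked up endpoints can be checked before sending any request. The following is a minimal sketch, assuming the Service lives in the current namespace; has_endpoints is a hypothetical helper name for illustration, not part of kubectl.

```shell
# Succeeds when the named Service has at least one ready endpoint address.
# Usage: has_endpoints <service-name>
has_endpoints() {
  # jsonpath prints the Pod IPs behind the Service, or nothing if the
  # selector matches no ready Pod.
  ips="$(kubectl get endpoints "$1" -o jsonpath='{.subsets[*].addresses[*].ip}')"
  [ -n "$ips" ]
}

# Example (needs a running cluster):
# has_endpoints my-spring-boot-service && echo "endpoints ready"
```

An empty result usually means the Service selector (app: my-spring-boot here) does not match the labels of any ready Pod.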

There are two main ways to do this.

1-1. Request using the Service Cluster IP

# List Services

kubectl get services

# or, if k is aliased to kubectl
k get svc

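Running the command above produces output like the following. In this captured output the Service is listed as my-spring-boot; a Service created from the manifest above appears as my-spring-boot-service.

# Output
NAME             TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
my-spring-boot   ClusterIP   10.104.180.63   <none>        8000/TCP   3h5m
kubernetes       ClusterIP   10.96.0.1       <none>        443/TCP    10d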

To send a request from the nginx Pod to the my-spring-boot Pod, you need the Cluster-IP value printed above,

which the nginx Pod can then use as the target of the request.

# curl request from the nginx Pod to my-spring-boot

kubectl exec nginx -it -- curl 10.104.180.63:8000/api

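1-2. Request using the Service name

Requests that rely on an IP address, as above, can feel fragile.

A ClusterIP stays fixed only for the lifetime of its Service and changes if the Service is deleted and recreated, and typing addresses by hand becomes more wasted effort as the number of Services grows.

For that reason, Kubernetes also lets you send the request by Service name.

Kubernetes cluster DNS resolves both a PQDN (Partially Qualified Domain Name) and an FQDN (Fully Qualified Domain Name) for a Service, so either form can be used in the request.

The commands are as follows.

# curl request from the nginx Pod to my-spring-boot-service (same namespace)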

kubectl exec nginx -it -- curl my-spring-boot-service:8000/api



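# curl request from the nginx Pod to my-spring-boot-service (different namespace)

kubectl exec nginx -it -- curl my-spring-boot-service.[namespace].svc.cluster.local:8000/api

With the commands above, the same request can be made by the Service name my-spring-boot-service, without typing an IP address.

+ Of course, calling the Service by its IP still works as well.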
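
The long name in the cross-namespace command follows the pattern <service>.<namespace>.svc.<cluster-domain>. Here is a small sketch that assembles it, assuming the default cluster domain cluster.local (a cluster can be configured with a different one); svc_fqdn is a hypothetical helper written for this post.

```shell
# Build the fully qualified DNS name of a Service.
# Usage: svc_fqdn <service> [namespace] [cluster-domain]
svc_fqdn() {
  fqdn_service="$1"
  fqdn_namespace="${2:-default}"        # assumption: "default" when omitted
  fqdn_domain="${3:-cluster.local}"     # assumption: default cluster domain
  printf '%s.%s.svc.%s\n' "$fqdn_service" "$fqdn_namespace" "$fqdn_domain"
}

svc_fqdn my-spring-boot-service default
# prints my-spring-boot-service.default.svc.cluster.local
```

Inside the same namespace the short name is enough, because the Pod's DNS search list appends the namespace and cluster domain automatically.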

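2. Error and troubleshooting

At first the requests appeared to go through normally, but the error below occurred intermittently.

The same pattern kept repeating: two or three requests succeeded, and then requests timed out for about 10 seconds.

# Error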

curl: (6) Could not resolve host: my-spring-boot
command terminated with exit code 6
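
An intermittent failure like this is easier to reason about once it is counted. The sketch below repeats a command and reports how many attempts failed; count_failures is a hypothetical helper, and the attempt count and curl timeout in the usage comment are arbitrary assumptions.

```shell
# Run a command N times and report how many attempts failed.
# Usage: count_failures <attempts> <command> [args...]
count_failures() {
  attempts="$1"
  shift
  failures=0
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # Any non-zero exit status (for curl, 6 means the host could not be
    # resolved) counts as a failure.
    "$@" >/dev/null 2>&1 || failures=$((failures + 1))
    i=$((i + 1))
  done
  echo "$failures/$attempts attempts failed"
}

# Example against the Service from this post (needs a running cluster):
# count_failures 20 kubectl exec nginx -- curl -s --max-time 5 my-spring-boot-service:8000/api
```

Running the same loop before and after each attempted fix gives a rough way to tell whether the fix changed anything.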

I tried three approaches to troubleshoot this, and the last one resolved it.

2-1. Editing /etc/resolv.conf

As the error shows, the host name could not be resolved, and I found many posts saying that resolv.conf needs to be edited.

# bash

sudo vi /etc/resolv.conf



# ------------------------------------------------------------
# Add the following lines to the conf file
nameserver 8.8.8.8
nameserver 8.8.4.4

This did not help much in my case.
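
A possible reason, offered as a hedged aside and assuming the file edited above is the node's: with dnsPolicy: ClusterFirst (as in the Pod manifests above), a Pod resolves names through the /etc/resolv.conf inside the container, which normally points at the cluster DNS Service (kube-dns in kube-system). Public resolvers such as 8.8.8.8 hold no records for cluster-internal Service names. The sketch below extracts the nameserver entries so the two can be compared; list_nameservers is a hypothetical helper.

```shell
# Print the nameserver addresses found in resolv.conf-style text on stdin.
list_nameservers() {
  awk '$1 == "nameserver" { print $2 }'
}

# Inside the cluster (needs a running Pod named nginx):
# kubectl exec nginx -- cat /etc/resolv.conf | list_nameservers
#
# The address printed there would normally match the cluster DNS Service IP:
# kubectl get svc kube-dns -n kube-system -o jsonpath='{.spec.clusterIP}'
```

If the Pod's nameserver does match the kube-dns Service IP, the Pod side is configured as expected and the cluster DNS itself becomes the next suspect.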

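2-2. Flushing the resolver cache

I also found posts suggesting that the problem could come from cached DNS entries, so I tried flushing the cache, but it did not help much either.

There are two ways to flush the cache.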

# bash

# Option 1
sudo resolvectl flush-caches

# Option 2 (older systemd releases, where the tool is still named systemd-resolve)
sudo systemd-resolve --flush-caches

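2-3. Deleting and reinstalling CoreDNS

============== If something goes wrong during the steps below, you risk having to reinstall the entire cluster ==============

After reading that CoreDNS itself can fail in unexpected ways, I deleted and reinstalled CoreDNS, confirmed that requests then succeeded consistently, and considered the problem solved.

First, look up the Deployment and delete it.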

# bash

# Look up CoreDNS resources
kubectl get all -A | grep core

# Delete the CoreDNS Deployment
kubectl delete deploy coredns -n kube-system
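
Given the risk noted above, it may be worth saving the existing objects before deleting anything. The following sketch is not what I ran at the time; it assumes the Deployment and ConfigMap are both named coredns in kube-system (as in a kubeadm cluster), and backup_coredns and restart_coredns are hypothetical helper names.

```shell
# Save the current CoreDNS Deployment and ConfigMap so they can be restored.
# Usage: backup_coredns [output-directory]
backup_coredns() {
  backup_dir="${1:-.}"
  kubectl get deployment coredns -n kube-system -o yaml \
    > "$backup_dir/coredns-deployment.backup.yaml"
  kubectl get configmap coredns -n kube-system -o yaml \
    > "$backup_dir/coredns-configmap.backup.yaml"
}

# A less destructive alternative to deletion: recreate the CoreDNS Pods
# while keeping the Deployment object itself.
restart_coredns() {
  kubectl rollout restart deployment coredns -n kube-system
  kubectl rollout status deployment coredns -n kube-system --timeout=120s
}

# Example (needs a running cluster):
# backup_coredns /tmp && restart_coredns
```

Whether a restart alone would have fixed the intermittent failures here is unknown; the backup simply keeps a way back if the reinstall goes wrong.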

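I reinstalled CoreDNS using the coredns.yaml.sed file from the CoreDNS deployment repository on GitHub.

https://github.com/coredns/deployment/blob/master/kubernetes/coredns.yaml.sed

Below is the CoreDNS YAML as of 2023.08.21. It is a template: CLUSTER_DOMAIN, REVERSE_CIDRS, UPSTREAMNAMESERVER, STUBDOMAINS, and CLUSTER_DNS_IP are placeholders that must be replaced with values for your cluster before the file is applied.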

# coredns.yaml

apiVersion: v1
kind: ServiceAccount
metadata:
  name: coredns
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    kubernetes.io/bootstrapping: rbac-defaults
  name: system:coredns
rules:
  - apiGroups:
    - ""
    resources:
    - endpoints
    - services
    - pods
    - namespaces
    verbs:
    - list
    - watch
  - apiGroups:
    - discovery.k8s.io
    resources:
    - endpointslices
    verbs:
    - list
    - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: "true"
  labels:
    kubernetes.io/bootstrapping: rbac-defaults
  name: system:coredns
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:coredns
subjects:
- kind: ServiceAccount
  name: coredns
  namespace: kube-system
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health {
          lameduck 5s
        }
        ready
        kubernetes CLUSTER_DOMAIN REVERSE_CIDRS {
          fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        forward . UPSTREAMNAMESERVER {
          max_concurrent 1000
        }
        cache 30
        loop
        reload
        loadbalance
    }STUBDOMAINS
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: coredns
  namespace: kube-system
  labels:
    k8s-app: kube-dns
    kubernetes.io/name: "CoreDNS"
    app.kubernetes.io/name: coredns
spec:
  # replicas: not specified here:
  # 1. Default is 1.
  # 2. Will be tuned in real time if DNS horizontal auto-scaling is turned on.
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
  selector:
    matchLabels:
      k8s-app: kube-dns
      app.kubernetes.io/name: coredns
  template:
    metadata:
      labels:
        k8s-app: kube-dns
        app.kubernetes.io/name: coredns
    spec:
      priorityClassName: system-cluster-critical
      serviceAccountName: coredns
      tolerations:
        - key: "CriticalAddonsOnly"
          operator: "Exists"
      nodeSelector:
        kubernetes.io/os: linux
      affinity:
         podAntiAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
           - labelSelector:
               matchExpressions:
               - key: k8s-app
                 operator: In
                 values: ["kube-dns"]
             topologyKey: kubernetes.io/hostname
      containers:
      - name: coredns
        image: coredns/coredns:1.9.4
        imagePullPolicy: IfNotPresent
        resources:
          limits:
            memory: 170Mi
          requests:
            cpu: 100m
            memory: 70Mi
        args: [ "-conf", "/etc/coredns/Corefile" ]
        volumeMounts:
        - name: config-volume
          mountPath: /etc/coredns
          readOnly: true
        ports:
        - containerPort: 53
          name: dns
          protocol: UDP
        - containerPort: 53
          name: dns-tcp
          protocol: TCP
        - containerPort: 9153
          name: metrics
          protocol: TCP
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            add:
            - NET_BIND_SERVICE
            drop:
            - all
          readOnlyRootFilesystem: true
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 60
          timeoutSeconds: 5
          successThreshold: 1
          failureThreshold: 5
        readinessProbe:
          httpGet:
            path: /ready
            port: 8181
            scheme: HTTP
      dnsPolicy: Default
      volumes:
        - name: config-volume
          configMap:
            name: coredns
            items:
            - key: Corefile
              path: Corefile
---
apiVersion: v1
kind: Service
metadata:
  name: kube-dns
  namespace: kube-system
  annotations:
    prometheus.io/port: "9153"
    prometheus.io/scrape: "true"
  labels:
    k8s-app: kube-dns
    kubernetes.io/cluster-service: "true"
    kubernetes.io/name: "CoreDNS"
    app.kubernetes.io/name: coredns
spec:
  selector:
    k8s-app: kube-dns
    app.kubernetes.io/name: coredns
  clusterIP: CLUSTER_DNS_IP
  ports:
  - name: dns
    port: 53
    protocol: UDP
  - name: dns-tcp
    port: 53
    protocol: TCP
  - name: metrics
    port: 9153
    protocol: TCP
