Incorrect DNS handling #2540

Open

skyhhjmk opened this issue Apr 23, 2025 · 3 comments
Assignees
Labels
bug Something isn't working

Comments

@skyhhjmk

What version of KubeKey has the issue?

kk version: &version.Info{Major:"3", Minor:"1", GitVersion:"v3.1.8", GitCommit:"dbb1ee4aa1ecf0586565ff3374427d8a7d9b327b", GitTreeState:"clean", BuildDate:"2025-03-26T04:49:07Z", GoVersion:"go1.19.2", Compiler:"gc", Platform:"linux/amd64"}

What is your OS environment?

Ubuntu 22.04

KubeKey config file

apiVersion: kubekey.kubesphere.io/v1alpha2
kind: Cluster
metadata:
  name: xxx
spec:
  hosts:
  - {name: node1, address: 10.111.0.1, internalAddress: 10.111.0.1, privateKeyPath: "/root/pri-key"}
  - {name: node2, address: 10.111.0.2, internalAddress: 10.111.0.2, privateKeyPath: "/root/pri-key"}
  # - {name: node3, address: 10.111.0.3, internalAddress: 10.111.0.3, privateKeyPath: "/root/pri-key"}
  roleGroups:
    etcd:
    - node1
    control-plane: 
    - node1
    worker:
    - node1
    - node2
    # - node3
  controlPlaneEndpoint:
    ## Internal loadbalancer for apiservers 
    # internalLoadbalancer: haproxy

    domain: lb.kubesphere.local
    address: ""
    port: 6443
  system:
    ntpServers:
      - time1.cloud.tencent.com
      - ntp.aliyun.com
    timezone: "Asia/Shanghai"
  kubernetes:
    version: v1.28.15
    clusterName: xxx.com
    autoRenewCerts: true
    containerManager: containerd
  etcd:
    type: kubekey
  network:
    plugin: calico
    kubePodsCIDR: 10.233.64.0/18
    kubeServiceCIDR: 10.233.0.0/18
    ## multus support. https://github.com/k8snetworkplumbingwg/multus-cni
    multusCNI:
      enabled: false
  registry:
    privateRegistry: ""
    auths:
      "reg.xxx.com":
        username: "xxx"
        password: "xxx"
    namespaceOverride: ""
    registryMirrors: []
    insecureRegistries: []
  addons: []

A clear and concise description of what happened.

After installing the Kubernetes cluster with kk, I performed no further operations, waited until all pods were ready and had been running stably for a while, and then rebooted the server with the reboot command. After the reboot, commands such as kubectl get nodes could no longer reach the cluster. Troubleshooting showed a DNS error: lb.kubesphere.local had not been written to the hosts file, and no local DNS was configured to resolve it. If I edit the kubeconfig file and replace the domain with the control-plane node's IP address, node information is retrieved normally.
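
As a quick manual workaround (a sketch, assuming node1 at 10.111.0.1 is the control-plane endpoint, per the config above; adjust the IP for your cluster):

root@node1:~# echo "10.111.0.1 lb.kubesphere.local" >> /etc/hosts   # restore the missing entry
root@node1:~# kubectl get nodes                                     # should now resolve lb.kubesphere.local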

Relevant log output

root@node1:~# kubectl get nodes
Unable to connect to the server: dial tcp: lookup lb.kubesphere.local on 127.0.0.53:53: server misbehaving
root@node1:~# nslookup lb.kubesphere.local
;; Got SERVFAIL reply from 127.0.0.53
Server:         127.0.0.53
Address:        127.0.0.53#53

** server can't find lb.kubesphere.local: SERVFAIL

root@node1:~# dig lb.kubesphere.local

; <<>> DiG 9.18.30-0ubuntu0.22.04.2-Ubuntu <<>> lb.kubesphere.local
;; global options: +cmd
;; Got answer:
;; WARNING: .local is reserved for Multicast DNS
;; You are currently testing what happens when an mDNS query is leaked to DNS
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 42177
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;lb.kubesphere.local.           IN      A

;; Query time: 0 msec
;; SERVER: 127.0.0.53#53(127.0.0.53) (UDP)
;; WHEN: Wed Apr 23 17:47:59 CST 2025
;; MSG SIZE  rcvd: 48

root@node1:~# cat /etc/hosts
# Your system has configured 'manage_etc_hosts' as True.
# As a result, if you wish for changes to this file to persist
# then you will need to either
# a.) make changes to the master file in /etc/cloud/templates/hosts.debian.tmpl
# b.) change or remove the value of 'manage_etc_hosts' in
#     /etc/cloud/cloud.cfg or cloud-config from user-data
#
127.0.1.1 kube-node1 kube-node1
127.0.0.1 localhost

# The following lines are desirable for IPv6 capable hosts
::1 localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

Additional information

No response

@skyhhjmk skyhhjmk added the bug Something isn't working label Apr 23, 2025
@skyhhjmk
Author

I tried a fresh install again, but this time I used a domain I had registered and added DNS records for it on Cloudflare. When the cluster installation finished and I checked the hosts file, I found the following content:

# Your system has configured 'manage_etc_hosts' as True.
# As a result, if you wish for changes to this file to persist
# then you will need to either
# a.) make changes to the master file in /etc/cloud/templates/hosts.debian.tmpl
# b.) change or remove the value of 'manage_etc_hosts' in
#     /etc/cloud/cloud.cfg or cloud-config from user-data
#
127.0.1.1      node1
127.0.0.1 localhost

# The following lines are desirable for IPv6 capable hosts
::1 localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

# kubekey hosts BEGIN
10.111.0.1  node1.xxx.com node1
10.111.0.2  node2.xxx.com node2
10.111.0.1  lb.kube.xxx.com
# kubekey hosts END

In other words, the hosts file gets reverted after a reboot. I'm not sure whether this is related to kubekey, but I think kubekey should take some measure to keep the entries it writes to the hosts file from being removed.
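
One way to keep those entries across reboots (a sketch of a manual fix, not something kubekey currently does; the drop-in file name 99-kubekey-hosts.cfg is hypothetical) is to tell cloud-init to stop managing /etc/hosts:

root@node1:~# cat > /etc/cloud/cloud.cfg.d/99-kubekey-hosts.cfg <<'EOF'
# Hypothetical drop-in: stop cloud-init from regenerating /etc/hosts on boot,
# so the "# kubekey hosts BEGIN/END" block written by kubekey survives reboots.
manage_etc_hosts: false
EOF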

@skyhhjmk
Author

I hadn't paid attention to the header comments in the hosts file before. When I read them while investigating, I understood the cause: cloud-init regenerates the hosts file on every boot and overwrites the existing configuration. So I suggest that, when kubekey detects cloud-init, it should modify /etc/cloud/templates/hosts.debian.tmpl or /etc/cloud/cloud.cfg in addition to modifying hosts, as sketched below.
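
Roughly what that could look like (a sketch, assuming the template path from the header comment is in use and that appending literal lines to it is safe; the entries are the ones kubekey already writes):

root@node1:~# cat >> /etc/cloud/templates/hosts.debian.tmpl <<'EOF'

# kubekey hosts BEGIN
10.111.0.1  node1.xxx.com node1
10.111.0.2  node2.xxx.com node2
10.111.0.1  lb.kube.xxx.com
# kubekey hosts END
EOF

Since cloud-init renders this template into /etc/hosts on each boot, the block would then be reapplied automatically instead of being wiped.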

@redscholar redscholar self-assigned this Apr 24, 2025
@redscholar
Collaborator

Good idea. We'll consider making the target /etc/hosts file a configurable variable later on.
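
Purely as an illustration of that idea (a hypothetical field; kubekey does not expose this today), the cluster config could grow something like:

spec:
  system:
    # hypothetical: every file the generated hosts block is written to,
    # e.g. also the cloud-init template so the entries survive regeneration
    hostsFiles:
      - /etc/hosts
      - /etc/cloud/templates/hosts.debian.tmpl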
